[hobbit] system log and application log monitoring

Sun Jun 4 10:04:44 CEST 2006

On Fri, Jun 02, 2006 at 11:03:52AM -0500, Jeff Newman wrote:
> 
> Is there a facility already in place, or a way to graph the number of "hits"
> returned by a pattern match for a log file?
> 
> For instance:
> 
> I am checking xyz log file for the word "wrap" It would be *very* useful to 
> have a graph that shows the number of times that word showed up between the 
> previous check and the current check.

No, there isn't.

> This could be very useful to illustrate, say, a disk dying (one blip
> of a bad read or something would be one thing, but looking at a graph
> over time that shows 1 blip one week, 10 the next, and 20 the week
> after that would indicate the disk was almost dead) etc...

Hobbit only looks at log entries from the last 30 minutes, so spotting
a trend over several weeks would mean extending that window
significantly. The counting would therefore have to be done on the
client side rather than on the server. (Not a problem, I'm just
thinking out loud.)

> Right now, the only way I have to do this is with a client side script that
> runs in a constant loop:
> 
> while true; do
>   NUM=`grep "Buffer wrapped" /quotes/env/errlog | wc -l | sed 's/  *//g'`
>   if [ $NUM -gt $INITIALNUM ] ; then
>      WRAP_NUM=`expr $NUM - $INITIALNUM`
>      $BB $BBDISP "status $MACHINE.wraps green `date`
>      `echo "wraps:$WRAP_NUM"`
>      "
>      INITIALNUM=$NUM
>   else
>      OKNUM=0
>      $BB $BBDISP "status $MACHINE.wraps green `date`
>      `echo "wraps:$OKNUM"`
>      "
>   fi

If all that you want is the graph and not alerts, then I wonder if it 
couldn't be done more easily. Just do the "grep" and report the raw 
count, without tracking the previous value yourself. Then send it into 
the NCV handler, with a dataset definition that uses the DERIVE 
datatype (which is the default, btw). Then RRDtool should handle all 
of the "subtract previous value from current value if it's greater, 
else ..." stuff and you needn't worry about it.
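
As a rough sketch of the suggestion above, the client script could shrink
to something like the following. It reuses the log path, pattern, test name
and the BB/BBDISP/MACHINE variables from the quoted script; the helper
names count_matches and wraps_message are made up for illustration, and the
server-side NCV setup for the "wraps" status is assumed to exist and is not
shown here.

```shell
#!/bin/sh
# Sketch only: report the running total of matching lines and leave the
# "difference since last check" arithmetic to the server-side DERIVE dataset.

LOGFILE=${LOGFILE:-/quotes/env/errlog}   # path taken from the quoted script
PATTERN=${PATTERN:-Buffer wrapped}       # pattern taken from the quoted script

# Print the number of lines in file $2 matching pattern $1.
# Prints 0 when nothing matches or the file cannot be read.
count_matches() {
    n=$(grep -c -e "$1" "$2" 2>/dev/null) || true
    echo "${n:-0}"
}

# Build a status message for host $1 carrying count $2 as a "name: value" line.
wraps_message() {
    printf 'status %s.wraps green %s\nwraps: %s\n' "$1" "$(date)" "$2"
}

# Send only when the Hobbit client environment provides these variables.
if [ -n "${BB:-}" ] && [ -n "${BBDISP:-}" ] && [ -n "${MACHINE:-}" ]; then
    "$BB" "$BBDISP" "$(wraps_message "$MACHINE" "$(count_matches "$PATTERN" "$LOGFILE")")"
fi
```

Run once per client cycle rather than in a constant loop. Note that
RRDtool's DERIVE type stores a per-second rate of change, so the graph
would show a rate rather than the raw number of hits per check.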

.....

After thinking a bit more about this, I believe that having a method to
do "grep ...| wc -l" in the client might be a good thing. So I've added
a new type of configuration entry to the client-local.cfg file, so you
can do

    linecount:/var/log/messages
    diskerrors I/O error.*/dev/hd
    badlogins Login failed

and the client will report this data back in the client message:

   diskerrors: 0
   badlogins: 2

which are the number of times these two expressions were found in the
/var/log/messages file.
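
To illustrate what such a linecount section amounts to, here is a small
shell approximation of the equivalent "grep | wc -l" work. It is only a
sketch and not the client's actual implementation: the function name
linecount_report is hypothetical, and it assumes the expressions behave
like basic grep regular expressions.

```shell
#!/bin/sh
# Sketch only: read "keyword pattern" lines on stdin and print
# "keyword: count" for the log file given as $1.
linecount_report() {
    logfile=$1
    # The first word is the keyword; the rest of the line is the pattern.
    while read -r keyword pattern; do
        [ -n "$keyword" ] || continue
        # grep -c counts matching lines; fall back to 0 if the file is unreadable.
        count=$(grep -c -e "$pattern" "$logfile" 2>/dev/null) || true
        echo "$keyword: ${count:-0}"
    done
}
```

Feeding it the two definition lines from the example above on stdin,
with /var/log/messages as the argument, would print one "keyword: count"
line per definition, in the same shape as the client message data.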

Given those data, it will be easy on the server side to feed them into
a graph and do other nice things with them.

Regards,
Henrik