[hobbit] system log and application log monitoring
Henrik Stoerner
henrik at hswn.dk
Sun Jun 4 10:04:44 CEST 2006
On Fri, Jun 02, 2006 at 11:03:52AM -0500, Jeff Newman wrote:
>
> Is there a facility already in place, or a way to graph the number of "hits"
> returned by a pattern match for a log file?
>
> For instance:
>
> I am checking xyz log file for the word "wrap". It would be *very* useful to
> have a graph that shows the number of times that word showed up between the
> previous check and the current check.
No, there isn't.
> This could be very useful to illustrate, say, a disk dying (one blip
> of a bad read or something would be one thing, but looking at a graph
> over time that shows 1 blip one week, 10 the next, and 20 the week
> after that would indicate the disk was almost dead) etc...
Hobbit only looks at log entries from the past 30 minutes, so that window
would have to be extended significantly. This would therefore have to be
done on the client side rather than on the server. (Not a problem, I'm
just thinking out loud.)
> Right now, the only way I have to do this is with a client side script that
> runs in a constant loop:
>
> INITIALNUM=`grep -c "Buffer wrapped" /quotes/env/errlog`
> while true; do
> NUM=`grep -c "Buffer wrapped" /quotes/env/errlog`
> if [ $NUM -gt $INITIALNUM ] ; then
> WRAP_NUM=`expr $NUM - $INITIALNUM`
> else
> # no new matches (or the log was rotated and the count dropped)
> WRAP_NUM=0
> fi
> # remember the current total for the next pass
> INITIALNUM=$NUM
> $BB $BBDISP "status $MACHINE.wraps green `date`
> wraps:$WRAP_NUM
> "
> sleep 300   # check interval (assumed)
> done
If all that you want is the graph and not alerts, then I wonder if it
couldn't be done more easily. Just do the "grep" and report the number
like you do now. Then send it into the NCV handler, with a dataset
definition that uses the DERIVE datatype (which is the default, btw).
Then RRDtool should handle all of the "subtract current value from
previous value if it's greater, else ..." stuff and you needn't
worry about it.
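A minimal sketch of that approach, assuming the same log file and pattern as the script above and the old bb client (`$BB $BBDISP`) for sending. It reports only the running total of matching lines and leaves the subtraction to RRDtool. The helper names `wrap_count` and `build_message` are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Minimal sketch: report the running total of "Buffer wrapped" lines and let
# the server-side RRD (DERIVE data type) turn successive totals into a rate.
# LOGFILE, PATTERN and the helper names are assumptions for illustration.
LOGFILE=${LOGFILE:-/quotes/env/errlog}
PATTERN=${PATTERN:-"Buffer wrapped"}

# Print the number of matching lines; print 0 if the file is missing.
wrap_count() {
    n=$(grep -c -- "$PATTERN" "$LOGFILE" 2>/dev/null)
    echo "${n:-0}"
}

# Build the status message: the status line first, then one "name: value"
# line for the NCV (name-colon-value) handler to pick up.
build_message() {
    printf 'status %s.wraps green %s\nwraps: %s\n' "$MACHINE" "$(date)" "$(wrap_count)"
}

# Send only when the bb client is configured. Run this script periodically
# (e.g. every 5 minutes from cron) instead of looping forever.
if [ -n "$BB" ] && [ -n "$BBDISP" ]; then
    "$BB" "$BBDISP" "$(build_message)"
fi
```

On the server, the column would also have to be routed to the NCV handler. In Hobbit this is presumably done in hobbitserver.cfg by adding `wraps=ncv` to `TEST2RRD` and defining `NCV_wraps="wraps:DERIVE"`; treat those setting names as assumptions and check them against the documentation for your version. Note that DERIVE stores (current - previous) / elapsed seconds, so totals of 10 and 16 taken 300 seconds apart are stored as 0.02 per second rather than as 6.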
.....
After thinking a bit more about this, I believe that having a method to
do "grep ...| wc -l" in the client might be a good thing. So I've added
a new type of configuration to the client-local.cfg file, so you can do
linecount:/var/log/messages
diskerrors I/O error.*/dev/hd
badlogins Login failed
and it will report back the following data in the client message:
diskerrors: 0
badlogins: 2
which are the number of times these two expressions were found in the
/var/log/messages file.
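To make the semantics concrete, here is a rough shell emulation of what a `linecount:` entry does. It is not the Hobbit client code; it assumes that each rule line is `<name> <pattern>`, that the name is the first word and the rest of the line is an extended regular expression, and that the count is the number of matching lines. The function names `count_matches` and `linecount` are hypothetical.

```shell
#!/bin/sh
# Rough emulation of a "linecount:" entry from client-local.cfg.
# Assumptions (not taken from the Hobbit source): each rule line is
# "<name> <pattern>", the rest of the line after the name is an extended
# regular expression, and the result is the number of matching lines.

# count_matches FILE PATTERN: print how many lines of FILE match PATTERN
# (0 if nothing matches or the file cannot be read).
count_matches() {
    n=$(grep -c -E -- "$2" "$1" 2>/dev/null)
    echo "${n:-0}"
}

# linecount FILE: read "<name> <pattern>" rules on stdin and print
# one "name: count" line per rule, like the client message data.
linecount() {
    file=$1
    while read -r name pattern; do
        [ -z "$name" ] && continue      # skip blank lines
        printf '%s: %s\n' "$name" "$(count_matches "$file" "$pattern")"
    done
}
```

For example, `printf 'badlogins Login failed\n' | linecount /var/log/messages` would print a single `badlogins: N` line. Reading the rules with `read -r name pattern` keeps any spaces inside the pattern, which matters for expressions like `I/O error.*/dev/hd`.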
Given those data, on the server side it will be easy to feed them into
a graph and do other nice things with them.
Regards,
Henrik