[Xymon] Tricky one for log file monitoring

henrik at hswn.dk henrik at hswn.dk
Thu Mar 22 12:11:01 CET 2012


On Thu, 22 Mar 2012 10:41:09 -0000, "Neil Simmonds"
<Neil.Simmonds at express-gifts.co.uk> wrote:
> Message appears in log file for failure - from this we want an alert
> that will stay active and not expire after 30 minutes like log file
> alerts usually do.
> 
> We will hopefully then get a message in the log file that tells us of
> completion of the failed process, at this point we want to clear the
> alert.

The Xymon client will not do this automatically, but you can script your
way out of it. I would create a custom test for it, something like this:

#!/bin/sh

# Logfile we monitor
FN="/var/log/mylogfile"
# Message patterns that say "alert" or "OK"
ALERTMSG="Something bad"
OKMSG="All OK"

# Use the data from the "logfetch" status to grab the last 5 minutes of
# log data
FPOS=`cat $XYMONTMP/logfetch.${MACHINEDOTS}.status | grep "^${FN}:" |
      cut -d: -f2`
LASTMSG=`dd if=$FN bs=1 skip=$FPOS 2>/dev/null |
         egrep "$ALERTMSG|$OKMSG" | tail -n 1`

# LASTMSG now holds the last message which is either an alert or an OK
# message
#
# Actually the whole "cat ... grep ... cut ... dd .." thing is not needed,
# since you could just scan the entire logfile and pick out the last
# message which is either OK or alert... you could just do
# LASTMSG=`egrep "$ALERTMSG|$OKMSG" $FN | tail -n 1`

# Determine color
COLOR="green"
if test `echo "$LASTMSG" | grep -c "$ALERTMSG"` -ne 0
then
   COLOR=red
fi

# Send the status with a very long duration so it doesn't go purple.
$XYMON $XYMSRV "status+365d $MACHINE.mylog $COLOR `date`

Last message seen: $LASTMSG
"

exit 0
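
If you want to check the pattern logic before wiring it into the client,
here is a minimal sketch of the color decision on its own. The function
name last_state_color and the throwaway logfile are assumptions made for
illustration, and it uses "grep -E" where the script above uses egrep; it
does not talk to Xymon at all.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xymon): print "red" if the last line
# matching either pattern is an alert, "green" otherwise. A logfile with
# no matching lines at all also yields "green".
last_state_color() {
    fn="$1"; alertmsg="$2"; okmsg="$3"
    lastmsg=`grep -E "$alertmsg|$okmsg" "$fn" | tail -n 1`
    if echo "$lastmsg" | grep -q "$alertmsg"
    then
        echo red
    else
        echo green
    fi
}

# Try it on a throwaway logfile
TMPLOG=`mktemp`
echo "10:00 Something bad happened" > "$TMPLOG"
last_state_color "$TMPLOG" "Something bad" "All OK"    # prints: red
echo "10:20 All OK again" >> "$TMPLOG"
last_state_color "$TMPLOG" "Something bad" "All OK"    # prints: green
rm -f "$TMPLOG"
```

Since only the last matching line counts, an alert followed by an OK
message clears the status, and a new alert after that turns it red again.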


This raises two interesting ideas:

1) We should have status-messages that don't expire (go purple). Using a
very long status lifetime is a kludge, really.
2) The log analysis tool should know how to handle messages that cancel
each other out.
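
To make the second idea concrete, here is a rough sketch of what
"cancel each other out" could look like when several processes can fail
independently. The log format ("FAILED job=<id>" / "COMPLETED job=<id>")
and the function name open_failures are assumptions for illustration
only; nothing like this exists in the Xymon log analysis today.

```shell
#!/bin/sh
# Hypothetical helper: list the job ids whose failure has not yet been
# cancelled by a later completion message.
# Assumed line format: "... FAILED job=<id>" or "... COMPLETED job=<id>".
open_failures() {
    awk '
        match($0, /job=[^ ]+/) {
            id = substr($0, RSTART + 4, RLENGTH - 4)
            if ($0 ~ /FAILED/)    failed[id] = 1
            if ($0 ~ /COMPLETED/) delete failed[id]
        }
        END { for (id in failed) print id }
    ' "$1" | sort
}

# Example run on a throwaway logfile
TMPLOG=`mktemp`
cat > "$TMPLOG" <<EOF
10:00 FAILED job=nightly
10:05 FAILED job=export
10:30 COMPLETED job=nightly
EOF
open_failures "$TMPLOG"    # prints: export
rm -f "$TMPLOG"
```

A status would then be red while the list is non-empty and green once
every failure has been matched by a completion.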


Regards,
Henrik



