[xymon] Managing who gets alerts - shifts and rotations

Tim McCloskey tm at freedom.com
Sat Oct 9 21:49:47 CEST 2010


I forgot to mention that using the SCRIPT directive the way I did will fill the 'info' tab with alerting info that is not really helpful, depending on how you view that sort of thing.  A workaround for me was to just take advantage of the 'notes' dir.


________________________________________
From: Tim McCloskey [tm at freedom.com]
Sent: Saturday, October 09, 2010 12:28 PM
To: xymon at xymon.com
Subject: RE: [xymon] Managing who gets alerts - shifts and rotations

Hi,

Might not be what you were hoping to hear but I'm going to share it just the same.

If you think through your rules and come up with a standard format it will help.  I know it seems endless but once you've set up a standard it's not as bad as you think.  Just tedious in the initial setup, but worth it in the long run.

One of the things that I did was to separate the email/pager addresses from the actual alert rules.  Example below.

On an NN schedule, cron runs a simple perl pie (perl -p -i -e) script that changes the addresses in mail-primary.sh (and page-primary.sh, etc.).  This is an extra layer that you need to come up with yourself, but it's not overly complex.  You may end up with 4 or more perl scripts, but you maintain the variables in those scripts outside of the hobbit system, which avoids the typo in your alerts.cfg that breaks things.  (The variables will likely be a list or array of people/pagers/email addresses.  The people will change from time to time, but the alert for ICMP on your nameserver will always be required.)

 cat hobbit-alerts.cfg
...
$alertdir=/usr/local/tolkien/server/alert-scripts/sys_admin
$alertdir2=/usr/local/tolkien/server/alert-scripts/dev_app
include /usr/local/tolkien/server/etc/inc/alerts/tm-mu
...

 cat tm-mu
...
## ALL OTHER SERVICES :  SERVERS ON WEB SIDE
# mail primary on every red level service failure that has been red for over 6 minutes.
# send mail once an hour and do not send a recovery email.
# this excludes conn, for which they have already been paged.
# mail secondary after an hour and once an hour thereafter.
#
#
PAGE=%^web/(linux|other|windows|solaris) EXSERVICE=conn COLOR=red,purple DURATION>6m
SCRIPT $alertdir/tm-mu/mail-primary.sh mail-web-prim FORMAT=sms REPEAT=1h

PAGE=%^web/(linux|other|windows|solaris) EXSERVICE=conn COLOR=red,purple DURATION>1h
SCRIPT $alertdir/tm-mu/mail-secondary.sh mail-web-sec FORMAT=sms REPEAT=1h
...

cat mail-primary.sh:

#!/bin/bash
/bin/mail -s "$BBHOSTSVC" tm at f...redacted....com < /dev/null
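
A slightly expanded variant of this script might look like the sketch below. It is not the original script: the ONCALL variable, the example.com address, and the build_subject and send_alert function names are all assumptions made for illustration. It relies only on BBHOSTSVC and BBCOLORLEVEL, which Xymon exports to SCRIPT recipients, and keeps the address on a single line so a rotation job has one obvious place to edit.

```shell
#!/bin/bash
# Sketch of an alert script in the style of mail-primary.sh (illustrative only).
# ONCALL is the single line a rotation job would rewrite; the address is a placeholder.
ONCALL="oncall@example.com"

# Build the mail subject from variables Xymon exports to SCRIPT recipients.
# Falls back to "unknown" when run outside of Xymon.
build_subject() {
    printf '%s %s\n' "${BBHOSTSVC:-unknown}" "${BBCOLORLEVEL:-unknown}"
}

# Send an empty-bodied mail whose subject carries the alert, as in the original.
send_alert() {
    /bin/mail -s "$(build_subject)" "$ONCALL" < /dev/null
}

# When installed as the alert script, end the file with a call to: send_alert
```

With the call to send_alert added as the last line, the rule in the included alerts file would point at this script exactly as it points at mail-primary.sh.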


I can't share the complete perl script today but it's fairly simple.  Here is an example stanza:

...
elif $GREP $mailb $wd/$mp > /dev/null
        then
                $PERL -p -i -e "s:$mailb:$maila:" $wd/$mp
                $PERL -p -i -e "s:$maila:$mailb:" $wd/$ms
        else
                $MAIL -s 'failed on call change' $sysadmin < $wd/$msg
fi
...



I know it seems like you're banging your head on the wall looking for simplicity, but that's the part you may need to create yourself.  If something already exists I'm sure someone on the group will let you know.

Good luck.

-t





