[Xymon] Alerts do not seem to be triggering or otherwise executing my script

Chris Allen chris.allen at affinesystems.com
Thu Jul 21 23:39:59 CEST 2011


Seemingly the same problem as this guy:

http://lists.xymon.com/archive/2008-June/019952.html

I've crawled the hell out of the official Xymon documentation as well
as the wikibook, but I can't find anything that tells me what's going on.

There is an alert that should have triggered (it falls within the
time range shown in the log output below). The shutdown message in the
log is me running /etc/init.d/hobbit restart in an attempt to make it
notice the problem. (Xymon is currently red.)

Is the hobbitd_alert test/debug command supposed to actually execute
the script/MAIL rule? Knowing this would help me a little, but not a
whole lot as the alert doesn't seem to be otherwise triggering anyway.

I am quite puzzled and unable to test or debug this intelligently.
It is doubly disappointing because I'm rather fond of this monitoring
system and would rather not be forced to discard it because of a lack
of documentation and an abundance of inexplicable behavior.

I have tested running my script standalone, with the same context it
would have if hobbit executed it, and it worked. I also have debugging
commands in the script to tell me whether it gets executed at all. It
never does, neither during a --test nor naturally, as it should given
the currently red status of one of my tests. Disregard that I am not
using $RCPT; I removed it to reduce the chance that something
unexpected with that variable was the cause. It wasn't; the script just
isn't getting executed.
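
A minimal sketch of such a standalone check, assuming a stub in place of
the real alert script. It runs the stub under env -i so that nothing is
inherited from the login shell, and the stub appends one line per
invocation so repeated runs stay visible. The temporary paths, the MARKER
variable and the sample values are hypothetical; RCPT and BBALPHAMSG are
the only names taken from the script in this post.

```shell
#!/bin/sh
# Hypothetical harness: run an alert script with a near-empty environment
# to check that it does not depend on anything from the login shell.
set -eu

workdir=$(mktemp -d)
script="$workdir/alert.sh"       # stand-in for /etc/hobbit/scripts/alert.sh
marker="$workdir/alert-ran.log"  # stand-in for /tmp/debug-alerts

# Stub script: append (not overwrite) one line each time it is invoked.
cat > "$script" <<'EOF'
#!/bin/sh
printf '%s rcpt=%s msg=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$RCPT" "$BBALPHAMSG" >> "$MARKER"
EOF
chmod +x "$script"

# env -i clears the inherited environment; the values below are made-up samples.
env -i PATH=/usr/bin:/bin \
    MARKER="$marker" \
    RCPT='ops' \
    BBALPHAMSG='live:proc red (sample text)' \
    "$script" 'ops'

# Show what the stub recorded.
cat "$marker"
```

If the marker file gains a line here but never while the status is red,
the script itself is probably fine and the problem lies in whether the
alert module invokes it at all.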

What am I missing?

Log and command output follows this line:

/etc/hobbit/scripts/alert.sh
#!/bin/sh

echo `set` > /tmp/envvars
/data/software/aws/dist/current/sns/bin/sns-publish \
    "arn:aws:sns:us-east-1:871321084716:ops" \
    --subject "Server alert from Hobbit/Xymon" \
    --message "$BBALPHAMSG" > /tmp/debug-alerts 2>&1

hobbit-alerts.cfg

HOST=* COLOR=red
       SCRIPT /etc/hobbit/scripts/alert.sh "ops" REPEAT=1h

tail -10 page.log

2011-07-21 14:15:01 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:17:01 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:17:10 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:19:10 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:20:02 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:20:02 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:20:13 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:22:13 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:22:45 Tried to down BOARDBUSY: Invalid argument
2011-07-21 14:22:45 Got a shutdown message
2011-07-21 14:22:45 Terminated by signal 15
2011-07-21 14:23:09 Peer not up, flushing message queue
2011-07-21 14:23:09 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:25:03 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:25:03 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:26:12 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:27:46 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:28:46 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:29:15 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:29:15 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg

Running bbcmd hobbitd_alert --debug --test live proc (live is a real
hostname and proc is one of the tests we have defined) returns:
dev:/var/log/hobbit# bbcmd hobbitd_alert --debug --test live proc

2011-07-21 14:31:21 Using default environment file
/usr/lib/hobbit/client/etc/hobbitserver.cfg
2011-07-21 14:31:21 Opening file /usr/lib/hobbit/server/etc/bb-hosts
2011-07-21 14:31:21 Opening file /usr/lib/hobbit/server/etc/hobbit-alerts.cfg
2011-07-21 14:31:21 Opening file /usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:31:21 Cannot open configuration file
/usr/lib/hobbit/server/etc/hobbit-holidays.cfg
2011-07-21 14:31:21 send_alert live:proc state 0
00007255 2011-07-21 14:31:21 send_alert live:proc state Paging
00007255 2011-07-21 14:31:21 Matching host:service:page 'live:proc:'
against rule line 122
00007255 2011-07-21 14:31:21 *** Match with 'HOST=* COLOR=red' ***
2011-07-21 14:31:21 Found a first matching rule
00007255 2011-07-21 14:31:21 Matching host:service:page 'live:proc:'
against rule line 122
00007255 2011-07-21 14:31:21 *** Match with 'HOST=* COLOR=red' ***
2011-07-21 14:31:21   repeat live|proc|script|"ops" at 0
2011-07-21 14:31:21   Alert for live:proc to "ops"
00007255 2011-07-21 14:31:21 Script alert with command
'/etc/hobbit/scripts/alert.sh' and recipient "ops"
2011-07-21 14:31:21 No more secondary matching rule

dev:/var/log/hobbit# tail ./notifications.log

Thu Jul 21 12:22:01 2011 nfs-1.procs (75.101.157.144)
chris.allen at affinesystems.com[123] 1311276121 300
Thu Jul 21 12:52:31 2011 live.conn (75.101.158.206)
engineeering at affinesystems.com[122] 1311277951 500
Thu Jul 21 12:52:31 2011 live.conn (75.101.158.206)
chris.allen at affinesystems.com[123] 1311277951 500
Thu Jul 21 12:52:31 2011 nfs-1.procs (75.101.157.144)
engineeering at affinesystems.com[122] 1311277951 300
Thu Jul 21 12:52:31 2011 nfs-1.procs (75.101.157.144)
chris.allen at affinesystems.com[123] 1311277951 300
Thu Jul 21 13:23:01 2011 live.conn (75.101.158.206)
ops at affinesystems.com 1311279781 500
Thu Jul 21 13:23:01 2011 nfs-1.procs (75.101.157.144)
ops at affinesystems.com 1311279781 300
Thu Jul 21 13:50:01 2011 ads2.procs (68.168.100.16)
arn:aws:sns:us-east-1:871321084716:ops 1311281397 300
Thu Jul 21 13:52:09 2011 ads2.procs (68.168.100.16)
"arn:aws:sns:us-east-1:871321084716:ops" 1311281521 300
Thu Jul 21 13:59:58 2011 ads4.procs (68.168.100.18) "ops" 1311281995 300

dev:/var/log/hobbit# tail ./rrd-data.log

2011-07-21 13:42:27 Cache flush completed
2011-07-21 13:42:37 Peer not up, flushing message queue
2011-07-21 14:04:33 Tried to down BOARDBUSY: Invalid argument
2011-07-21 14:04:33 Shutting down, flushing cached updates to disk
2011-07-21 14:04:33 Cache flush completed
2011-07-21 14:04:42 Peer not up, flushing message queue
2011-07-21 14:22:45 Tried to down BOARDBUSY: Invalid argument
2011-07-21 14:22:45 Shutting down, flushing cached updates to disk
2011-07-21 14:22:45 Cache flush completed
2011-07-21 14:22:53 Peer not up, flushing message queue


dev:/var/log/hobbit# tail ./history.log

2011-07-21 14:04:42 Peer not up, flushing message queue
2011-07-21 14:22:51 Peer not up, flushing message queue
2011-07-21 14:32:47 Will not update /var/lib/hobbit/hist/ads2.msgs -
color unchanged (purple)
2011-07-21 14:32:47 Will not update /var/lib/hobbit/hist/ads2.ssh -
color unchanged (purple)
2011-07-21 14:32:47 Will not update /var/lib/hobbit/hist/ads3.msgs -
color unchanged (purple)
2011-07-21 14:32:47 Will not update /var/lib/hobbit/hist/ads4.msgs -
color unchanged (purple)
2011-07-21 14:32:47 Will not update /var/lib/hobbit/hist/dev.files -
color unchanged (purple)
2011-07-21 14:32:47 Will not update /var/lib/hobbit/hist/dev.ports -
color unchanged (purple)
2011-07-21 14:32:47 Will not update /var/lib/hobbit/hist/live.msgs -
color unchanged (clear)
2011-07-21 14:32:47 Will not update /var/lib/hobbit/hist/vcr1.msgs -
color unchanged (purple)


