[hobbit] alerts still not alerting

Henrik Stoerner henrik at hswn.dk
Sun Mar 20 14:23:16 CET 2005


On Sat, Mar 19, 2005 at 10:33:09AM -0600, Daniel J McDonald wrote:
> I'm still flummoxed by hobbit-alerts.  I'm certain I broke something,
> because I am not getting any alerts from the box.

It's probably a config error ... 

> The only logs in /var/log/hobbit/page.log are 
> 2005-03-11 07:49:30 Tried to down BOARDBUSY: Invalid argument
> 2005-03-14 17:24:21 Tried to down BOARDBUSY: Invalid argument

These are harmless, and often occur when Hobbit is shut down or
restarted.

> I see a couple of those in the hobbitlaunch.log file as well, I also see
> the following error:
> 2005-03-19 10:14:21 Task bbdisplay started with PID 7417
> 2005-03-19 10:14:21 Task bbretest started with PID 7418
> 2005-03-19 10:14:29 Our child has failed and will not talk to us
> 2005-03-19 10:14:36 Our child has failed and will not talk to us

That's a first - and you're right, the error message should be more
detailed. I've fixed that. It generally means that one of the
hobbitd helper tasks has stopped responding.

> Here is a sample host that is not paging.  The info page lists:
> Service Recipient 1st Delay Stop after Repeat Time of Day Colors 
> conn dan.mcdonald at austinenergy.com (R) 30m  - 5d  - red 
> telnet dan.mcdonald at austinenergy.com (R) 30m  - 5d  - red
> 
> Both telnet and conn have been down on this host for over two hours.
> 
> The salient rule is:
> HOST=%.
>         MAIL=dan.mcdonald at austinenergy.com REPEAT=140h DURATION>30m
> RECOVERED COLOR="red" UNMATCHED

Your "HOST=" is wrong - "%." is a regular expression that will only
match hostnames that are exactly one character long (do you really
have a host named "a"?) - if you want to match all hosts, then it's
"HOST=%.*" or the simple form "HOST=*"
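
To see the difference between the two expressions, here is a small
sketch in Python. It is not Hobbit code: Hobbit uses PCRE, and this
sketch assumes (as the explanation above implies) that the expression
has to match the whole hostname. The helper name and the sample
hostnames are made up for illustration.

```python
import re


def host_matches(pattern: str, hostname: str) -> bool:
    """Return True if the pattern matches the entire hostname.

    Assumption: whole-hostname matching, modelled with re.fullmatch.
    """
    return re.fullmatch(pattern, hostname) is not None


# Hypothetical hostnames, only for illustration.
hosts = ["a", "www", "db1.example.com"]

# "." matches exactly one character, so only one-character names pass.
one_char = [h for h in hosts if host_matches(r".", h)]

# ".*" matches any number of characters, so every hostname passes.
any_host = [h for h in hosts if host_matches(r".*", h)]

print(one_char)
print(any_host)
```

With these sample names, only "a" gets through the "." pattern, while
".*" lets all three through.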

So some other rule must be generating the info-column output you
have, and therefore even if your HOST entry were correct, the rule
would not trigger because of the UNMATCHED restriction.
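
The effect of UNMATCHED can be modelled roughly like this. It is a toy
model in Python, not how hobbitd_alert is implemented: it assumes that
an UNMATCHED recipient is only used when no other recipient has been
selected for the event. The Recipient class, the select_recipients
function and the addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Recipient:
    address: str
    unmatched: bool = False  # models the UNMATCHED keyword on a recipient


def select_recipients(recipients: List[Recipient]) -> List[str]:
    """Pick who is notified for one event (simplified two-pass model)."""
    # First pass: recipients without the UNMATCHED restriction.
    normal = [r.address for r in recipients if not r.unmatched]
    if normal:
        # Someone else already gets the alert, so UNMATCHED ones are skipped.
        return normal
    # Second pass: nobody else matched, so the UNMATCHED ones are used.
    return [r.address for r in recipients if r.unmatched]


# Hypothetical addresses, only for illustration.
with_other_rule = select_recipients([
    Recipient("oncall@example.com"),
    Recipient("fallback@example.com", unmatched=True),
])
fallback_only = select_recipients([
    Recipient("fallback@example.com", unmatched=True),
])

print(with_other_rule)
print(fallback_only)
```

In the first case the fallback address is skipped because another
recipient already matched; in the second it is the only one left and
gets the alert.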

Could you try running

   exec ~hobbit/server/bin/bbcmd
   hobbitd_alert --test HOSTNAME conn "" 120 red

That should tell you how the alert is handled, and who gets notified
using what rules.


Regards,
Henrik


