[hobbit] alerts still not alerting

McDonald, Dan Dan.McDonald at austinenergy.com
Mon Mar 21 16:27:46 CET 2005


I tried a couple of these, and it says it's sending mail to me, but there is
nothing in the log...

Ah wait, here's something in the log: postfix got munged when an updated
mailman rpm was loaded on the box.  But it should have still queued the
message.

I'll see if anything goes down today.  Probably will...

-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk]
Sent: Sunday, March 20, 2005 7:23 AM
To: hobbit at hswn.dk
Subject: Re: [hobbit] alerts still not alerting


On Sat, Mar 19, 2005 at 10:33:09AM -0600, Daniel J McDonald wrote:
> I'm still flummoxed by hobbit-alerts.  I'm certain I broke something,
> because I am not getting any alerts from the box.

It's probably a config error ... 

> The only logs in /var/log/hobbit/page.log are 
> 2005-03-11 07:49:30 Tried to down BOARDBUSY: Invalid argument
> 2005-03-14 17:24:21 Tried to down BOARDBUSY: Invalid argument

These are harmless, and often occur when Hobbit is shut down or
restarted.

> I see a couple of those in the hobbitlaunch.log file as well, I also see
> the following error:
> 2005-03-19 10:14:21 Task bbdisplay started with PID 7417
> 2005-03-19 10:14:21 Task bbretest started with PID 7418
> 2005-03-19 10:14:29 Our child has failed and will not talk to us
> 2005-03-19 10:14:36 Our child has failed and will not talk to us

That's a first - and you're right that the error message should be
more detailed. I've fixed that. It generally means that one of the
hobbitd helper tasks has stopped responding.

> Here is a sample host that is not paging.  The info page lists:
> Service  Recipient                            1st Delay  Stop after  Repeat  Time of Day  Colors
> conn     dan.mcdonald at austinenergy.com (R)  30m        -           5d      -            red
> telnet   dan.mcdonald at austinenergy.com (R)  30m        -           5d      -            red
> 
> Both telnet and conn have been down on this host for over two hours.
> 
> The salient rule is:
> HOST=%.
>         MAIL=dan.mcdonald at austinenergy.com REPEAT=140h DURATION>30m
> RECOVERED COLOR="red" UNMATCHED

Your "HOST=" is wrong - "%." will only match hostnames that are exactly
one character long (do you really have a host named "a"?). If you want
to match all hosts, use "HOST=%.*" or the simple form "HOST=*".
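
To see the difference between the two patterns, here is a rough Python
sketch. It is not Hobbit's own matching code; it only assumes that the
text after "%" is a regular expression that has to match the whole
hostname, as described above. The function name and the hostnames are
made up for the illustration.

```python
import re

# Assumption: the text after "%" is a regular expression that must match
# the entire hostname. This is an illustration, not Hobbit's real matcher.
def host_matches(pattern: str, hostname: str) -> bool:
    return re.fullmatch(pattern, hostname) is not None

# Made-up hostnames, for illustration only.
hosts = ["a", "router1", "www.example.com"]

for pattern in (".", ".*"):
    matched = [h for h in hosts if host_matches(pattern, h)]
    print(f"HOST=%{pattern} matches {matched}")
```

Under that assumption, "." selects only the one-character name, while
".*" selects every host in the list.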

So some other rule must be generating the info-column output you
have, and therefore even if your HOST entry were correct, the rule
would not trigger because of the UNMATCHED restriction: it only
applies when no other recipient gets the alert.
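
The effect of UNMATCHED can be modelled roughly like this. It is a
simplified Python sketch, not how hobbitd_alert is implemented; the
Recipient class, the select_recipients function and the addresses are
hypothetical, and it assumes every recipient in the list has already
passed its HOST, DURATION and COLOR checks.

```python
from dataclasses import dataclass

@dataclass
class Recipient:
    mail: str
    unmatched: bool = False  # models the UNMATCHED keyword on a rule

def select_recipients(matching):
    """Choose who is notified among recipients whose criteria matched.

    Simplified model: UNMATCHED recipients are only used as a fallback
    when no ordinary recipient would receive the alert.
    """
    normal = [r.mail for r in matching if not r.unmatched]
    if normal:
        return normal
    return [r.mail for r in matching if r.unmatched]

# Hypothetical addresses, for illustration only.
fallback = Recipient("fallback@example.com", unmatched=True)
other = Recipient("oncall@example.com")

print(select_recipients([other, fallback]))  # the other rule wins
print(select_recipients([fallback]))         # fallback is used
```

In this model, as long as some other rule matches the host, the
UNMATCHED recipient never gets the alert.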

Could you try running:

   exec ~hobbit/server/bin/bbcmd
   hobbitd_alert --test HOSTNAME conn "" 120 red

That should tell you how the alert is handled, and who gets notified
using what rules.


Regards,
Henrik




