Re: [hobbit] alerts still not alerting
On Sat, Mar 19, 2005 at 10:33:09AM -0600, Daniel J McDonald wrote:
> I'm still flummoxed by hobbit-alerts. I'm certain I broke something,
> because I am not getting any alerts from the box.
It's probably a config error ...
> The only logs in /var/log/hobbit/page.log are
> 2005-03-11 07:49:30 Tried to down BOARDBUSY: Invalid argument
> 2005-03-14 17:24:21 Tried to down BOARDBUSY: Invalid argument
These are harmless, and often occur when Hobbit is shut down or
restarted.
> I see a couple of those in the hobbitlaunch.log file as well, I also see
> the following error:
> 2005-03-19 10:14:21 Task bbdisplay started with PID 7417
> 2005-03-19 10:14:21 Task bbretest started with PID 7418
> 2005-03-19 10:14:29 Our child has failed and will not talk to us
> 2005-03-19 10:14:36 Our child has failed and will not talk to us
That's a first, and you're right that the error message should be more
detailed. I've fixed that. But it generally means that one of the
hobbitd helper tasks has stopped responding.
> Here is a sample host that is not paging. The info page lists:
> Service Recipient 1st Delay Stop after Repeat Time of Day Colors
> conn dan.mcdonald (at) austinenergy.com (R) 30m - 5d - red
> telnet dan.mcdonald (at) austinenergy.com (R) 30m - 5d - red
>
> Both telnet and conn have been down on this host for over two hours.
>
> The salient rule is:
> HOST=%.
> MAIL=dan.mcdonald (at) austinenergy.com REPEAT=140h DURATION>30m
> RECOVERED COLOR="red" UNMATCHED
Your "HOST=%." is wrong: it will only match hostnames with exactly one
character (do you really have a host named "a"?). If you want to match
all hosts, use "HOST=%.*" or the simple form "HOST=*".
So some other rule must be generating the info-column output you
have, and therefore even if your HOST entry was correct, the rule
would not trigger because of the UNMATCHED restriction.
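To illustrate the HOST pattern difference, here is a rough Python sketch. It is not Hobbit's own matching code: the host_matches function is hypothetical, and it assumes that a pattern starting with "%" is a regular expression that must cover the whole hostname, which is the behaviour described above.

```python
import re


def host_matches(pattern: str, hostname: str) -> bool:
    """Hypothetical, simplified model of a HOST= pattern (not Hobbit code)."""
    if pattern == "*":
        # Simple form: matches every host.
        return True
    if pattern.startswith("%"):
        # Assumption: the regex after "%" must match the entire hostname.
        return re.fullmatch(pattern[1:], hostname) is not None
    # Otherwise treat the pattern as a literal hostname.
    return pattern == hostname


if __name__ == "__main__":
    for pattern in ("%.", "%.*", "*"):
        for hostname in ("a", "www.example.com"):
            print(pattern, hostname, host_matches(pattern, hostname))
```

Under that assumption, "%." accepts only the one-character name, while "%.*" and "*" accept both names.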
Could you try running
exec ~hobbit/server/bin/bbcmd
hobbitd_alert --test HOSTNAME conn "" 120 red
That should tell you how the alert is handled, and who gets notified
using what rules.
Regards,
Henrik