[hobbit] Hobbit 4.0.4 released - Alert Script Issue

Peter Welter peter.welter at gmail.com
Sat Aug 27 12:10:08 CEST 2005


Hello Henrik,

> two things we can do:
> 
> 1) Add "--checkpoint-file=$BBTMP/alert.chk --checkpoint-interval=600" to
>    the hobbitd_alert command in hobbitlaunch.cfg. That way it will
>    remember all active alerts when you restart Hobbit.
I'll do that ASAP (this coming Monday). That will certainly resolve this issue.
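
For reference, a rough sketch of how the hobbitd_alert line in
hobbitlaunch.cfg might look with those options added. Only the two
checkpoint options come from your mail; the rest of the CMD line is
my assumption of a typical setup, and "[existing options]" stands for
whatever is already configured there:

    CMD hobbitd_channel --channel=page [existing options] hobbitd_alert --checkpoint-file=$BBTMP/alert.chk --checkpoint-interval=600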

> 2) When a new alert was first seen (also after a restart of Hobbit), the
>    duration was reset to 0 - instead of using the information Hobbit
>    already had about when the status change occurred. I've changed this
>    in the code, so that it picks up the duration of the alert from the
>    timestamp we keep for when the last status change happened.
OK, but that useful addition is for upcoming releases.

However, I think I found out why the entire problem showed up in the
first place. I had an alert config that first sent a mail when an
event occurred and, if that was not dealt with properly, ran a pager
script 20 minutes later. After an evening of applying (OS) patches, a
reboot, etc., it did not work anymore. Eventually I concluded that it
had to do with an alert config modification, which led to this email
conversation.
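
To illustrate, a setup of that kind would look roughly like this in
hobbit-alerts.cfg. This is only a sketch: the host name, mail address,
script path and recipient are placeholders, not our actual values:

    HOST=www.example.com
        MAIL admin@example.com
        SCRIPT /usr/local/bin/pager.sh oncall DURATION>20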

As suggested, I checked alerttrace.log, but could not find a reason
why this problem happened (I changed the pager script to mail, but
that made no difference). It *does* work fine when *all* the alerts
are processed at the same time!

Having searched the mailing list and the Changes file for each
version, I think it can be traced back to a known bug in Hobbit that
is to be fixed in 4.1.2; see my mail from August 19th, 11:42.

Since we are running 4.0.4, I am wondering what the wise thing to do
is. The workaround works fine for now (we are a 24x7 university), so
I am thinking of waiting until 4.1.2 reaches Stable status, since
4.1.1 does not fix this particular bug.

Regards, Peter


