[xymon] xymon_4.3.0-RC1: possible lost alerts
Henrik Størner
henrik at hswn.dk
Mon Feb 14 11:00:38 CET 2011
In <4D556C14.5060207 at unil.ch> Dominique Frise <dominique.frise at unil.ch> writes:
>I think I found a bug in xymond_alert.c.
>Let's say there is a page msg for hostA.serviceA, and this alert is not
>processed immediately because of this part of the code:
>	/*
>	 * When a burst of alerts happen, we get lots of alert messages
>	 * coming in quickly. So lets handle them in bunches and only
>	 * do the full alert handling once every 10 secs - that lets us
>	 * combine a bunch of alerts into one transmission process.
>	 */
>	if (nowtimer < (lastxmit+10)) continue;
>	lastxmit = nowtimer;
>The main loop will then wait for a new msg from xymond (Want msg <num>,
>startpos... etc).
>Now if the next msg is a page recovery from the same hostA.serviceA,
>the next processing of the active alerts (for loop) will then clean up
>the alert for hostA.serviceA without sending any alert.
I haven't tested your diagnosis, but it is probably correct
(as far as I remember how this code works).
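
To make the scenario concrete, here is a minimal stand-alone sketch
of the throttled loop. It is NOT the xymond_alert.c code: the types,
the replay() function and the two scenario helpers are hypothetical
names made up for illustration, and the model tracks only a single
host.service pair. Only the throttle test itself is taken from the
quoted snippet.

```c
#include <stddef.h>

/* Hypothetical message kinds; the real xymond_alert.c uses its own types. */
enum msgkind { MSG_PAGE, MSG_RECOVER, MSG_OTHER };

/* Hypothetical alert states for a single host.service pair. */
enum alertstate { ST_NONE, ST_PAGING, ST_RECOVERED };

struct msg {
    long when;              /* arrival time in seconds */
    enum msgkind kind;
};

/*
 * Replay messages through a loop that records every message but only
 * does the full alert handling once every `interval` seconds.
 * `lastxmit` is the time of the previous full handling pass.
 * Returns the number of page alerts that were sent.
 */
int replay(const struct msg *msgs, size_t n, long lastxmit, long interval)
{
    enum alertstate state = ST_NONE;
    int notified = 0;       /* has a page been sent for this alert? */
    int sent = 0;

    for (size_t i = 0; i < n; i++) {
        long nowtimer = msgs[i].when;

        /* Message intake: the stored state is always updated. */
        if (msgs[i].kind == MSG_PAGE) {
            state = ST_PAGING;
        } else if (msgs[i].kind == MSG_RECOVER && state == ST_PAGING) {
            state = ST_RECOVERED;
        }

        /* Same throttle test as in the quoted code. */
        if (nowtimer < (lastxmit + interval)) continue;
        lastxmit = nowtimer;

        /* Full alert handling pass. */
        if (state == ST_PAGING && !notified) {
            sent++;
            notified = 1;
        } else if (state == ST_RECOVERED) {
            /* The alert is cleaned up; no page is sent at this point. */
            state = ST_NONE;
            notified = 0;
        }
    }
    return sent;
}

/* Page at t=103, recovery at t=106, next full pass at t=112. */
int recovery_inside_window(void)
{
    const struct msg msgs[] = {
        { 103, MSG_PAGE }, { 106, MSG_RECOVER }, { 112, MSG_OTHER },
    };
    return replay(msgs, sizeof msgs / sizeof msgs[0], 100, 10);
}

/* Page at t=103, full pass at t=112, recovery only at t=115. */
int recovery_after_window(void)
{
    const struct msg msgs[] = {
        { 103, MSG_PAGE }, { 112, MSG_OTHER }, { 115, MSG_RECOVER },
    };
    return replay(msgs, sizeof msgs / sizeof msgs[0], 100, 10);
}
```

In this model recovery_inside_window() returns 0 (the page is never
sent) and recovery_after_window() returns 1, which matches the
behaviour you describe.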
But is it a problem?
If you get an alert that clears a few seconds later (that is why there
is a recovery message), then what is the point of sending an alert?
The notification would be for data that is no longer valid, and
personally I would rather NOT be alerted at 3 AM if the problem no
longer exists.
So I am tempted to invoke the old "this is not a bug, it's a feature!"
meme :-)
Regards,
Henrik