xymon_4.3.0-RC1: possible lost alerts

Dominique Frise dominique.frise at unil.ch
Fri Feb 11 18:04:20 CET 2011


Hi,

I think I found a bug in xymond_alert.c.

Let's say there is a page msg for hostA.serviceA, and this alert is not
processed immediately because of this part of the code:

    (xymond_alert.c, lines 816-823)

        /*
         * When a burst of alerts happen, we get lots of alert messages
         * coming in quickly. So lets handle them in bunches and only
         * do the full alert handling once every 10 secs - that lets us
         * combine a bunch of alerts into one transmission process.
         */
        if (nowtimer < (lastxmit+10)) continue;
        lastxmit = nowtimer;


Because of the "continue", the main loop then goes back to waiting for a
new msg from xymond (Want msg <num>, startpos... etc).

Now, if the next msg is a page recovery for the same hostA.serviceA, the
next pass over the active alerts (the for loop) will clean up the alert
for hostA.serviceA without ever sending an alert.
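
To make the sequence easier to follow, here is a small stand-alone sketch of
what I think happens. It is not the real xymond_alert.c code: the names
msg_kind, alert_state, message and replay() are made up for illustration, it
tracks a single alert only, and it assumes that a recovery notice is sent only
when a page actually went out earlier. Only the 10-second check is taken from
the code quoted above.

```c
#include <stddef.h>

/* Hypothetical message and alert types, not the real Xymon structures. */
enum msg_kind { MSG_PAGE, MSG_RECOVERY, MSG_OTHER };
enum alert_state { A_NONE, A_PAGING, A_RECOVERED };

struct message {
    long when;            /* arrival time in seconds */
    enum msg_kind kind;   /* MSG_OTHER = a message about some other host */
};

/*
 * Replay a list of messages for one alert (hostA.serviceA) and return
 * the number of notifications that would have been sent.
 * lastxmit is the time of the previous full alert-handling pass.
 */
int replay(const struct message *msgs, size_t nmsgs, long lastxmit)
{
    enum alert_state state = A_NONE;
    int notified = 0;     /* set once a page has really been sent */
    int sent = 0;

    for (size_t i = 0; i < nmsgs; i++) {
        long nowtimer = msgs[i].when;

        /* The incoming message updates the alert state right away. */
        if (msgs[i].kind == MSG_PAGE) {
            state = A_PAGING;
        } else if (msgs[i].kind == MSG_RECOVERY && state == A_PAGING) {
            state = A_RECOVERED;
        }

        /* Same throttle as in the quoted code. */
        if (nowtimer < (lastxmit + 10)) continue;
        lastxmit = nowtimer;

        /* Simplified "full alert handling" for the single alert. */
        if (state == A_PAGING && !notified) {
            sent++;                 /* page goes out */
            notified = 1;
        } else if (state == A_RECOVERED) {
            if (notified) sent++;   /* assumed: recovery only after a page */
            state = A_NONE;         /* cleanup */
            notified = 0;
        }
    }
    return sent;
}
```

With lastxmit = 100, a page at 103, a recovery at 106 and any other message at
111, replay() returns 0: both messages arrive inside the 10-second window, so
the alert is cleaned up without a notification. If the recovery instead
arrives at 115, after the pass at 111 has sent the page, and another message
arrives at 122, it returns 2.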


Dominique


