[Xymon] False http alerts and delay in notifications

Johan Sjöberg johan.sjoberg at deltait.se
Wed May 20 10:49:17 CEST 2015


Hi,

We are having some problems with our Xymon server. It has happened a number of times, but at long intervals (more than a month apart). It seems to happen while the backup is running on the server, and I am trying to find out how this affects Xymon.
It starts with all http tests going red and recovering after a few minutes. The strangest part is that notifications are not sent out until several minutes later. SMS notifications are also sent, even though SMS has a delay of 7 minutes and the alerts were only active for about 2 minutes. I can see in notifications.log that the delay is in Xymon and not in the mail or SMS applications.
I do not have any deep knowledge of Xymon's internals, but it seems like the "alert engine" gets stuck in a state with a number of active alerts, and then sends out alerts for tests that have already been OK for several minutes.
I also need to find out what causes all the http tests to go red at the same time. I have ruled out external factors such as the network, but it might be related to load on the Xymon server.

The last time this happened was last night. The http tests went red at 00:08 and then green at 00:10, but no notifications were sent out until 00:19.
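
To put numbers on that timeline, here is a small Python sketch. The timestamps and the 7-minute SMS delay come from the description above; the variable names are made up for illustration, and the assumption that the SMS delay is measured from the moment the test went red is only a guess at how the delay is configured. Nothing here is part of Xymon itself.

```python
from datetime import datetime, timedelta

# Times from the incident on 2015-05-20 (local time, as reported above).
went_red = datetime(2015, 5, 20, 0, 8)
went_green = datetime(2015, 5, 20, 0, 10)
first_notification = datetime(2015, 5, 20, 0, 19)

# Configured SMS delay mentioned above (assumed to count from went_red).
sms_delay = timedelta(minutes=7)

alert_duration = went_green - went_red
notification_lag = first_notification - went_red
lag_after_recovery = first_notification - went_green

# An SMS should only be due if the alert outlives the configured delay.
sms_expected = alert_duration >= sms_delay

print("alert active for:", alert_duration)
print("first notification after red:", notification_lag)
print("first notification after recovery:", lag_after_recovery)
print("SMS expected:", sms_expected)
```

With these figures the alert was shorter than the SMS delay, so no SMS would be expected, and the first notification only went out well after the tests had already recovered.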

This entry in xymongen.log might be related:
2015-05-20 00:10:45 WARNING: Runtime 107 longer than TASKSLEEP (60)
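
To check whether such overruns line up with the backup window and with earlier incidents, the log could be scanned for this warning. The following is a minimal Python sketch; the pattern is inferred from the single line quoted above, so other variants of the message may not match, and the function names are hypothetical helpers, not Xymon tools. It also assumes that both numbers are in the same unit (presumably seconds).

```python
import re
from datetime import datetime

# Pattern for the overrun warning quoted above; inferred from that one line only.
OVERRUN_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) WARNING: "
    r"Runtime (\d+) longer than TASKSLEEP \((\d+)\)"
)


def parse_overrun(line):
    """Return (timestamp, runtime, tasksleep), or None if the line does not match."""
    m = OVERRUN_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return ts, int(m.group(2)), int(m.group(3))


def find_overruns(lines):
    """Collect all overrun warnings from an iterable of log lines."""
    return [hit for hit in map(parse_overrun, lines) if hit is not None]


sample = "2015-05-20 00:10:45 WARNING: Runtime 107 longer than TASKSLEEP (60)"
ts, runtime, tasksleep = parse_overrun(sample)
print(ts, "overran by", runtime - tasksleep)
```

To scan a real log, pass an open file to find_overruns(); the location of xymongen.log depends on the installation.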

Another strange thing is that when I generate an event log report from the GUI, I only see the recoveries from red to green, not the alerts from green to red.

Regards,
Johan


