[Xymon] Alert REPEAT not working in 4.3.15.

henrik at hswn.dk
Mon Feb 10 10:22:03 CET 2014


On 2014-02-10 8:18, Johan Sjöberg wrote:

> A while ago, we upgraded to 4.3.15. It seems like the alert repeat
> setting isn't working, only the first alert is sent. We have an on-call
> person that receives the first alert via SMS after 7 minutes. It should
> then repeat every 15 minutes. The rest of the team gets their first
> alert after 22 minutes.

[snip config]

> From the notification log:
>
> Mon Feb 10 05:43:15 2014 web01.apache2 (123.123.123.123) alarms at domain.tld 1392007395 0
> Mon Feb 10 05:51:15 2014 web01.apache2 (123.123.123.123) 111111 1392007875 0
> Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 222222 1392008717 0
> Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 333333 1392008717 0
> Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 444444 1392008717 0
>
> Strangely though, it seems like it was working on Feb 5, which was also
> after the upgrade. The only change done since then is the patch for
> xymonnet, and I don't see how this could affect the alerts?

There are no changes to how alerts work in either 4.3.15 or 4.3.16.

I copied your configuration into a 4.3.16 system, and REPEAT is working 
fine here:

$ tail -f notifications.log
Mon Feb 10 09:39:58 2014 webmail.hswn.dk.conn (0.0.0.0) root[3] 1392021598 500
Mon Feb 10 09:46:16 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392021976 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392022917 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-2[5] 1392022917 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-3[6] 1392022917 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-4[7] 1392022917 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392023826 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-2[5] 1392023826 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-3[6] 1392023826 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-4[7] 1392023826 500

(My "root" recipient is your first recipient; the "root-X" recipients
are your "111111", "222222" etc. recipients.)

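To check the repeat intervals without reading the timestamps by eye, the
gaps between successive notifications can be computed from the epoch
field in each log line. The following is only a rough sketch: the regular
expression is inferred from the log lines shown in this thread, not from
any format specification, the function name repeat_gaps is made up for
the example, and it assumes each entry sits on a single line (as in the
file itself, before any mail wrapping).

```python
import re
from collections import defaultdict

# Pattern inferred from the lines quoted in this thread (an assumption):
# "<date> <host.service> (<ip>) <recipient> <epoch> <number>"
LINE_RE = re.compile(
    r"^(?P<date>\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}) "
    r"(?P<target>\S+) \((?P<ip>[^)]*)\) "
    r"(?P<recipient>.+?) (?P<epoch>\d{9,}) (?P<number>\d+)$"
)

def repeat_gaps(lines):
    """Return {(host.service, recipient): [seconds between notifications]}."""
    sent = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that does not look like a log entry
        sent[(m["target"], m["recipient"])].append(int(m["epoch"]))
    gaps = {}
    for key, stamps in sent.items():
        stamps.sort()
        gaps[key] = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return gaps

if __name__ == "__main__":
    sample = [
        "Mon Feb 10 09:39:58 2014 webmail.hswn.dk.conn (0.0.0.0) root[3] 1392021598 500",
        "Mon Feb 10 09:46:16 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392021976 500",
        "Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392022917 500",
        "Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392023826 500",
    ]
    for (target, recipient), seconds in repeat_gaps(sample).items():
        print(target, recipient, seconds)
```

For the root-1[4] lines in my log this gives gaps of 941 and 909
seconds, i.e. roughly the 15 minute REPEAT setting. A recipient with
only one entry gets an empty list.
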
You didn't list the history log for the web01.apache2 service. Are you
sure that it was red all of the time? Any green status will reset the
REPEAT interval, which could explain why you don't see the repeats.
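
One way to test this against a status history is to split it into
stretches with no green status in between and see whether any stretch
lasted at least as long as the REPEAT interval. The sketch below is a
toy helper, not how xymond_alert is implemented: the (epoch, color)
input format, the function names and the sample data are all
hypothetical, and it ignores any initial delay configured for the
recipient.

```python
def streaks_without_green(events, now):
    """Split (epoch, color) status events into (start, end) periods
    that contain no green status. 'now' closes a still-open period."""
    streaks = []
    start = None
    for epoch, color in sorted(events):
        if color == "green":
            if start is not None:
                streaks.append((start, epoch))  # green ends the period
                start = None
        elif start is None:
            start = epoch  # first non-green status opens a period
    if start is not None:
        streaks.append((start, now))
    return streaks

def long_enough_for_repeat(streaks, repeat_seconds):
    """Keep only the periods in which a repeat could have come due."""
    return [(s, e) for s, e in streaks if e - s >= repeat_seconds]

if __name__ == "__main__":
    # Hypothetical history: red, a brief green after 10 minutes, red again.
    history = [(0, "red"), (600, "green"), (660, "red"), (2000, "red")]
    periods = streaks_without_green(history, now=2400)
    print(periods)
    print(long_enough_for_repeat(periods, repeat_seconds=15 * 60))
```

If none of the periods is long enough, a missing repeat would be
expected rather than a bug.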

Running xymond_alert with the "--debug" option will log a lot of data
about how alert messages are handled. It would be nice to have this
output if the problem recurs.


Regards,
Henrik



