[Xymon] possible xymon 4.3.21 holiday alerting bug?

Gavin Stone-Tolcher g.stone-tolcher at its.uq.edu.au
Fri Oct 9 03:15:52 CEST 2015


> Hmm. Does the REPEAT value work with a smaller interval (such as 1d or 1h)? And what type of system are you running on?
> I'm curious if there's a REPEAT over/underflow going on instead of something specific to the TIME exclusion back and forth.

All the alerting rules we use have "REPEAT=1w", and they do seem to work as intended outside of holidays.
I simulated "REPEAT=1h" on our alternate production server (same config/polling/clients but no alerting), and it exhibited the same behaviour.
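
As a rough illustration of the behaviour expected from REPEAT, the sketch below models a test that stays red across successive polls and only re-alerts once the repeat interval has elapsed. This is a hypothetical model, not Xymon's actual code: the names interval_seconds and alerts_sent are made up for the example, and it assumes the usual m/h/d/w suffixes with bare numbers taken as minutes.

```python
# Hypothetical model of the expected REPEAT behaviour; NOT Xymon's real code.
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400, "w": 604800}


def interval_seconds(spec):
    """Convert a spec such as '1w' or '1h' to seconds.

    Assumption: a bare number is interpreted as minutes.
    """
    if spec[-1] in UNIT_SECONDS:
        return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]
    return int(spec) * 60


def alerts_sent(poll_times, repeat_spec):
    """Return the poll times (in seconds) at which an alert would go out
    for a test that is red at every poll."""
    repeat = interval_seconds(repeat_spec)
    sent = []
    next_allowed = None  # no alert has been sent yet
    for now in poll_times:
        if next_allowed is None or now >= next_allowed:
            sent.append(now)
            next_allowed = now + repeat
    return sent


if __name__ == "__main__":
    polls = range(0, 2 * 3600, 60)  # one poll per minute for two hours
    print(alerts_sent(polls, "1h"))
    print(alerts_sent(polls, "1w"))
```

Under this model, a one-minute poll cycle over two hours yields two alerts with REPEAT=1h and a single alert with REPEAT=1w. What we observe on holidays is instead one alert per poll.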

The system is Oracle Linux 6, which, as I understand it, is essentially a RHEL 6 variant. We are running vanilla 4.3.21 compiled from source, not the RPM version.

# uname -a
Linux xx.yy.edu.au 2.6.32-504.30.3.el6.x86_64 #1 SMP Tue Jul 14 08:51:44 PDT 2015 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.6 (Santiago)
# cat /etc/oracle-release 
Oracle Linux Server release 6.6

> Is the test persistently red with no spurious recoveries being generated during the period in question?

Yes, the test shows solid red in its history for the whole period.


Cheers,
Gavin Stone-Tolcher, IT Support Officer, Network Operations and Incident Response
Information Technology Services
The University of Queensland
Level 4, Prentice Building, St Lucia 4072
T: +61 7 334 66645, M: +61 401 140 838
E: g.stone-tolcher at its.uq.edu.au W: www.its.uq.edu.au


-----Original Message-----
From: J.C. Cleaver [mailto:cleaver at terabithia.org] 
Sent: Friday, 9 October 2015 1:14 AM
To: Gavin Stone-Tolcher <g.stone-tolcher at its.uq.edu.au>
Cc: xymon at xymon.com
Subject: Re: [Xymon] possible xymon 4.3.21 holiday alerting bug?



On Wed, October 7, 2015 11:58 pm, Gavin Stone-Tolcher wrote:
> Hi, we are seeing unusual alerting behaviour with a Xymon 4.3.21 server
> using a "holidays.cfg" with HOLIDAYLIKEWEEKDAY=0.
>
> We have a network operations team (uqnoc-sms) that gets alerts during
> business hours (TIME=W:0800:1700), and a data networks team (dn-sms)
> that gets out-of-business-hours alerts in certain windows
> (TIME=W:0600:0759,W:1701:2200,60:0600:2200).
>
> Rules are like:
>
> PAGE=$UNSMSREGEX EXHOST=$UNEXCLUDE
>         MAIL uqnoc-sms at xx.yy.edu.au SERVICE=$UNSMSSVCS DURATION>6m TIME=W:0800:1700 COLOR=red REPEAT=1w FORMAT=SMS RECOVERED
>         MAIL dn-sms at xx.yy.edu.au SERVICE=$UNSMSSVCS DURATION>6m TIME=W:0600:0759,W:1701:2200,60:0600:2200 COLOR=red REPEAT=1w FORMAT=SMS RECOVERED
>
> For a "red" conn test covered by the rule on a weekday public holiday,
> Xymon correctly sends no alert to "uqnoc-sms" (TIME=W:0800:1700) and
> correctly generates an alert to "dn-sms" (the TIME=60:0600:2200
> component), but it then keeps sending the same alert approximately
> every minute (my xymonnet poll cycle). Is it ignoring REPEAT=1w?
>
> Before I try to debug much further, I thought I would ask whether
> anyone else has seen similar behaviour.

Hmm. Does the REPEAT value work with a smaller interval (such as 1d or 1h)? And what type of system are you running on?

I'm curious if there's a REPEAT over/underflow going on instead of something specific to the TIME exclusion back and forth.

Is the test persistently red with no spurious recoveries being generated during the period in question?


-jc




