[Xymon] Ongoing problem with multiple recovery notices

Larry Bonham larry at fni-stl.com
Wed Mar 2 19:20:01 CET 2016


I finally figured out what I was doing wrong: RECOVERED was not part of the $STOP macro definition.  The macro therefore worked fine on alerts but sometimes failed to stop on the recovery, and Xymon would continue to search for matching rules.

Both the script method and IGNORE work for me as they should now.

# $STOP=SCRIPT /paging/bin/xymon-ignore.sh none FORMAT=SCRIPT RECOVERED STOP

# Preferred since it is minimal.
$STOP=IGNORE RECOVERED

#---------------------------------------------------------------------------------------

HOST=* SERVICE=files
        $STOP

HOST=%test-server*
        MAIL admin1 at fni-stl.com REPEAT=60 RECOVERED COLOR=yellow,red
        $STOP
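
A rough sketch of how the fix would apply to the first rule from the original message below, assuming nothing else in alerts.cfg changes.  The rule and recipients are copied from that message; the line numbers are dropped since they will differ from file to file, and the comments only restate the explanation above.

# Old macro: no RECOVERED, so (per the explanation above) the stop
# was not reliably applied to recovery messages.
# $STOP=SCRIPT xymon-ignore.sh none FORMAT=SCRIPT STOP

# New macro: RECOVERED is included, so the stop applies on recovery too.
$STOP=IGNORE RECOVERED

PAGE=%url/CCS HOST=%(qa|test|launch)_ccsic_ccs.*_(redalert|red_alert) EXSERVICE=sslcert
        MAIL admin1 at fni-stl.com DURATION>2 REPEAT=60 RECOVERED
        MAIL admin3 at fni-stl.com DURATION>2 REPEAT=60 RECOVERED
        $STOP

With the macro defined this way, the later catch-all rules should no longer be reached when this rule handles a recovery.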

A question for J.C.: I know that STOP and REPEAT=30 are automatically included in an IGNORE line.  Is there any reason that RECOVERED shouldn't automatically be there as well?  At least the way I'm using it, there would never be an exception to that, but maybe for others there would be.

Thanks.

Larry

RHEL 6.7
Xymon 4.3.26


-----Original Message-----
From: Larry Bonham
Sent: Friday, September 11, 2015 2:49 PM
To: xymon at xymon.com
Subject: Ongoing problem with multiple recovery notices with on a few tests

I apologize if this mostly duplicates my message from earlier today.  That one was too large because of an attachment, and the example configuration I provided contained some incorrect information.  This one is smaller and correct.

--------------------------------------------------------------------------------------------------------------------

I'm having an ongoing problem with multiple recovery notices.  I was really hoping that 4.3.21 would fix it, but it did not.

The alert picks up the correct rule and stops at line 467.  The recovery hits that rule, then continues on down the list: it duplicates on a secondary matching rule at line 482 and then on my catch-all default rule at line 618.  This does not happen on all alert recoveries.  It mostly appears to happen only on "server" names that contain an underscore or dash, though that may just be a coincidence.

Here is an example of recent email notices showing the problem.  Only a small percentage of my total alerts do this, but that subset does it consistently.  There has to be something different about that group.

From        Subject                                                               Received

Xymon STL   Xymon [839283] qa_ccsic_ccs_red_alert:http CRITICAL (RED) [cfid:467]  7:11 AM

Xymon STL   Xymon qa_ccsic_ccs_red_alert:http recovered [cfid:467]                7:40 AM
Xymon STL   Xymon qa_ccsic_ccs_red_alert:http recovered [cfid:482]                7:40 AM
Xymon STL   Xymon qa_ccsic_ccs_red_alert:http recovered [cfid:618]                7:40 AM

I got a response from J.C. on this around 3/4/15.  He and I agreed that the problem is most likely in lib/loadalerts.c, which is a pretty complicated piece of code.

Anyway, I wanted to see whether anyone else is experiencing this and, if so, whether you were able to adjust or work around it.  It isn't a major problem, just an annoyance.

RHEL 6.6
Xymon 4.3.21

From hosts.cfg

0.0.0.0 qa_ccsic_ccs_red_alert # noconn nosslcert https://qa.ccsic.fni-stl.com/cgi-bin/xxx_alert.pl

Relevant section from alerts.cfg

### My email results represent admin3 at fni-stl.com

### macro to stop further rule checking.  Also tried IGNORE.  Same results.

   180  $STOP=SCRIPT xymon-ignore.sh none FORMAT=SCRIPT STOP

   465  PAGE=%url/CCS HOST=%(qa|test|launch)_ccsic_ccs.*_(redalert|red_alert) EXSERVICE=sslcert
   466         MAIL admin1 at fni-stl.com DURATION>2 REPEAT=60 RECOVERED
   467         MAIL admin3 at fni-stl.com DURATION>2 REPEAT=60 RECOVERED
   468         $STOP             <-- when alerting on fail it always stops here.  But recovery notices keep going.

### catch all rule for the url page and not handled above.

   480  PAGE=%url/.* EXSERVICE=sslcert
   481        MAIL admin2 at fni-stl.com DURATION>2 REPEAT=60 RECOVERED COLOR=yellow,red
   482        MAIL admin3 at fni-stl.com DURATION>2 REPEAT=60 RECOVERED COLOR=yellow,red
                  ...  Email other users.  Line format identical to above.
   490        SCRIPT xymon-page.sh grp1 DURATION>2 FORMAT=SCRIPT REPEAT=60 RECOVERED COLOR=red
   491        SCRIPT xymon-page.sh grp3 DURATION>2 FORMAT=SCRIPT REPEAT=60 RECOVERED COLOR=red,purple
   492        $STOP

### catch all rule for anything not handled above.

   616  HOST=*
   617        MAIL admin1 at fni-stl.com REPEAT=1440 RECOVERED COLOR=yellow,red
   618        MAIL admin3 at fni-stl.com REPEAT=1440 RECOVERED COLOR=yellow,red
   619        SCRIPT xymon-page.sh grp3 FORMAT=SCRIPT REPEAT=60 RECOVERED COLOR=red
   620        $STOP


Larry B.




