RECOVERED flag in hobbit-alerts.cfg question

Brodie, Kent brodie at mcw.edu
Wed Jul 26 03:09:55 CEST 2006


Hi all!

As I expand my alerting ruleset to do cool things like only page me
about "printers" during the day and so on, I had a weird occurrence
today.

Basically, I have rules that list less-critical things first, and what
I'm trying to do is something like this:

Rules for printers:
Send emails and pages when printers go offline and when they recover.
STOP HERE, because I do not need any other rules to apply (like
"connectivity" below).


Rules for "connectivity" for anything:
Send emails and pages.


What is happening is this:

When the device (printer) went offline, I got alerted. Yay! I then
got alerted again exactly two hours later. Exactly what I want. Yay again!

Then the printer recovered. I got *two* emails and *two* pages,
presumably because the printers rule *AND* the "connectivity" rule
both applied, even though I only want the printers rule to apply.

Am I missing some intended behavior, or is the RECOVERED flag ignoring
the way I want my rules to "stop" once certain conditions are met?

My notifications log:

Printer goes offline:
======================
Tue Jul 25 16:50:29 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153864229 500
Tue Jul 25 16:50:29 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153864229 500

Same message sent two hours later (yay!):
=========================================
Tue Jul 25 18:50:32 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153871432 500
Tue Jul 25 18:50:32 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153871432 500


*DOUBLE* messages sent when the recovered state occurs...???
============================================================
Tue Jul 25 19:49:21 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153874961 500 10792
Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153874961 500 10792
Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[175] 1153874961 500 10792
Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[176] 1153874961 500 10792
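A quick way to spot this kind of duplication in a longer notifications log is to group entries by host.service, event time and recipient name. The Python sketch below does that under a few assumptions: one log entry per line, and the first number after `recipient[...]` appears to be the Unix time of the event (1153874961 does correspond to the 19:49:21 US Central time printed on the same entry). The bracketed numbers and the trailing numeric fields are captured but deliberately not interpreted. `duplicate_notices` is a hypothetical helper, not part of Xymon.

```python
import re
from collections import Counter

# Assumed layout of one notifications-log entry, one entry per line:
#   <date> <host>.<service> (<ip>) <recipient>[<n>] <epoch> <number> [<number>]
# The epoch field appears to be the Unix time of the event; the bracketed
# number and the trailing numbers are captured but not interpreted.
ENTRY = re.compile(
    r"^(?P<date>\w{3} \w{3} +\d+ [\d:]+ \d{4}) "
    r"(?P<hostsvc>\S+) \((?P<ip>[\d.]+)\) "
    r"(?P<recipient>[^\[\s]+)\[(?P<n>\d+)\] "
    r"(?P<epoch>\d+)(?P<rest>(?: \d+)*)$"
)


def duplicate_notices(lines):
    """Hypothetical helper: return {(hostsvc, epoch, recipient): count}
    for every recipient notified more than once about the same event."""
    counts = Counter()
    for line in lines:
        m = ENTRY.match(line.strip())
        if m is None:
            continue  # skip headers, underlines and blank lines
        counts[(m["hostsvc"], m["epoch"], m["recipient"])] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Entries copied from the log in this message, one per line.
SAMPLE_LOG = [
    "Tue Jul 25 16:50:29 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153864229 500",
    "Tue Jul 25 16:50:29 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153864229 500",
    "Tue Jul 25 19:49:21 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153874961 500 10792",
    "Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153874961 500 10792",
    "Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[175] 1153874961 500 10792",
    "Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[176] 1153874961 500 10792",
]

if __name__ == "__main__":
    for (hostsvc, epoch, who), n in duplicate_notices(SAMPLE_LOG).items():
        print(f"{who}: {n} notices for {hostsvc} at {epoch}")
```

Only the recovery event is flagged: both sysadmins and pagers show up twice for the same event time, while the offline notices appear once each.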



My ruleset for alerts:

# These rules change the defaults for printer warnings/alerts (only email or
# page every 2 hours).
# Use of the IGNORE rule means NO OTHER RULE matches printers after this
# rule.

HOST=%^pr.*mcw\.edu
        MAIL sysadmins REPEAT=120 RECOVERED FORMAT=TEXT
        MAIL pagers REPEAT=120 RECOVERED FORMAT=SMS
        IGNORE

HOST=%^hp.*mcw\.edu
        MAIL sysadmins REPEAT=120 RECOVERED FORMAT=TEXT
        MAIL pagers REPEAT=120 RECOVERED FORMAT=SMS
        IGNORE

HOST=%^lp.*mcw\.edu
        MAIL sysadmins REPEAT=120 RECOVERED FORMAT=TEXT
        MAIL pagers REPEAT=120 RECOVERED FORMAT=SMS
        IGNORE

# Anything that loses connectivity, email/page every 30 minutes
SERVICE=conn
        MAIL sysadmins REPEAT=30 RECOVERED FORMAT=TEXT
        MAIL pagers REPEAT=30 RECOVERED FORMAT=SMS
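The comment in this ruleset describes a first-match-then-stop evaluation. Below is a minimal Python model of that *intended* behavior, not of Xymon's actual alert code. `Rule` and `recipients_for` are hypothetical names, the leading `%` regex marker is dropped from the HOST patterns, IGNORE is modeled simply as "check no later rules", and RECOVERED/FORMAT are not modeled at all. Under those assumptions it shows what the ruleset is meant to produce for lp4.phys.mcw.edu.conn: one notice per recipient, from the lp rule only, with REPEAT=120.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Rule:
    """Hypothetical model of one rule block from the ruleset above."""
    host_regex: Optional[str] = None  # HOST=%<regex>, without the leading '%'
    service: Optional[str] = None     # SERVICE=<name>
    recipients: List[Tuple[str, int]] = field(default_factory=list)  # (MAIL target, REPEAT minutes)
    stop: bool = False                # IGNORE modeled as "check no later rules"

    def matches(self, host: str, service: str) -> bool:
        # Every criterion that is set must match; unset criteria match anything.
        if self.host_regex is not None and re.search(self.host_regex, host) is None:
            return False
        if self.service is not None and service != self.service:
            return False
        return True


PRINTER_MAIL = [("sysadmins", 120), ("pagers", 120)]

RULES = [
    Rule(host_regex=r"^pr.*mcw\.edu", recipients=PRINTER_MAIL, stop=True),
    Rule(host_regex=r"^hp.*mcw\.edu", recipients=PRINTER_MAIL, stop=True),
    Rule(host_regex=r"^lp.*mcw\.edu", recipients=PRINTER_MAIL, stop=True),
    Rule(service="conn", recipients=[("sysadmins", 30), ("pagers", 30)]),
]


def recipients_for(host: str, service: str, rules: List[Rule] = RULES) -> List[Tuple[str, int]]:
    """Collect recipients from matching rules, in order, until a stop rule matches."""
    selected: List[Tuple[str, int]] = []
    for rule in rules:
        if rule.matches(host, service):
            selected.extend(rule.recipients)
            if rule.stop:
                break
    return selected


if __name__ == "__main__":
    print(recipients_for("lp4.phys.mcw.edu", "conn"))
```

In this model a non-printer host such as a hypothetical web1.mcw.edu falls through to the SERVICE=conn rule and gets REPEAT=30. Whether Xymon applies IGNORE the same way to recovery messages is exactly the open question in this message.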





More information about the Xymon mailing list