RECOVERED flag in hobbit-alerts.cfg question
Brodie, Kent
brodie at mcw.edu
Wed Jul 26 03:09:55 CEST 2006
Hi all!
As I expand my alerting ruleset to do cool things like only page me
about "printers" during the day and so on, I had a weird occurrence
today.
Basically, I have rules that list less-critical things first, and what
I'm trying to do is something like this:
Rules for printers:
Send emails and pages when printers go offline and when they recover
STOP HERE, because I do not need other rules to apply (like
"connectivity" below)
Rules for "connectivity" for anything:
Send emails, pages.
What is happening is this:
When the device (printer) went offline, I got alerted. Yay! I then
got alerted exactly two hours later. Exactly what I want. Yay again!
Then, the printer recovered. I got *two* emails and *two* pages,
because presumably the printers rule *AND* the "connectivity" rule
applied, even though I only want the one rule to apply.
Am I missing some intended behavior, or is the "recovered" flag ignoring
how I want my rules to "stop" when certain conditions are met?
My notifications log:
Printer goes offline:
======================
Tue Jul 25 16:50:29 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153864229 500
Tue Jul 25 16:50:29 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153864229 500
Same message sent two hours later (yay!)
========================================
Tue Jul 25 18:50:32 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153871432 500
Tue Jul 25 18:50:32 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153871432 500
*DOUBLE* messages sent when recovered state occurs...???
===============================================
Tue Jul 25 19:49:21 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153874961 500 10792
Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153874961 500 10792
Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[175] 1153874961 500 10792
Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[176] 1153874961 500 10792
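
For anyone who wants to spot this kind of duplication in a longer notifications log, here is a rough Python sketch. It is only an illustration, not part of Hobbit: the function name `duplicate_notifications` is made up, the log entries are written one per line (assuming the wrapping above is a mail artifact), and it assumes the bracketed number identifies the configuration entry that produced the notification. It groups entries by test, recipient, and event timestamp, and reports any group that was notified from more than one entry.

```python
import re
from collections import defaultdict

# Log entries copied from the message above, one entry per line.
LOG = """\
Tue Jul 25 16:50:29 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153864229 500
Tue Jul 25 16:50:29 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153864229 500
Tue Jul 25 18:50:32 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153871432 500
Tue Jul 25 18:50:32 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153871432 500
Tue Jul 25 19:49:21 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153874961 500 10792
Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153874961 500 10792
Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[175] 1153874961 500 10792
Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[176] 1153874961 500 10792
"""

# Date, host.test, (ip), recipient[number], event timestamp.
# Any trailing fields are ignored because their meaning is not needed here.
LINE_RE = re.compile(
    r"^(?P<when>\w{3} \w{3} +\d+ [\d:]+ \d{4}) "
    r"(?P<target>\S+) \((?P<ip>[^)]+)\) "
    r"(?P<recipient>[^\[\s]+)\[(?P<ref>\d+)\] "
    r"(?P<event>\d+)"
)


def duplicate_notifications(log_text):
    """Return {(target, recipient, event): [ref, ...]} for groups with >1 entry."""
    seen = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a log entry
        key = (m["target"], m["recipient"], m["event"])
        seen[key].append(int(m["ref"]))
    return {key: refs for key, refs in seen.items() if len(refs) > 1}


for (target, recipient, event), refs in duplicate_notifications(LOG).items():
    print(f"{target}: {recipient} notified {len(refs)} times for event {event} via {refs}")
```

On the log above, only the recovery event shows up, once for each recipient; the two earlier alerts each came from a single entry.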
My ruleset for alerts:
# These rules change defaults for printers warnings/alerts (only email or
# page every 2 hours)
# use of IGNORE rule means NO OTHER RULE matches printers after this rule....
HOST=%^pr.*mcw\.edu
MAIL sysadmins REPEAT=120 RECOVERED FORMAT=TEXT
MAIL pagers REPEAT=120 RECOVERED FORMAT=SMS
IGNORE
HOST=%^hp.*mcw\.edu
MAIL sysadmins REPEAT=120 RECOVERED FORMAT=TEXT
MAIL pagers REPEAT=120 RECOVERED FORMAT=SMS
IGNORE
HOST=%^lp.*mcw\.edu
MAIL sysadmins REPEAT=120 RECOVERED FORMAT=TEXT
MAIL pagers REPEAT=120 RECOVERED FORMAT=SMS
IGNORE
# Anything that loses connectivity, email/page every 30 minutes
SERVICE=conn
MAIL sysadmins REPEAT=30 RECOVERED FORMAT=TEXT
MAIL pagers REPEAT=30 RECOVERED FORMAT=SMS
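
As a sanity check that the printer rules really do cover the host in the log, here is a small Python sketch that tests the three HOST patterns against lp4.phys.mcw.edu. Python's `re` module stands in for the regex engine Hobbit uses, which should behave the same for patterns this simple; the helper name `matching_patterns` is made up for the example.

```python
import re

# Host patterns copied from the three printer rules above, without the
# leading % that precedes each expression in the config file.
PRINTER_PATTERNS = [r"^pr.*mcw\.edu", r"^hp.*mcw\.edu", r"^lp.*mcw\.edu"]


def matching_patterns(hostname):
    """Return the printer patterns that match the given hostname."""
    return [p for p in PRINTER_PATTERNS if re.search(p, hostname)]


print(matching_patterns("lp4.phys.mcw.edu"))
```

Only the `^lp` pattern matches, so the host falls under the third printer rule, whose IGNORE was meant to keep the SERVICE=conn rule from applying.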