Re: [hobbit] EXHOST usage
Henrik Stoerner wrote:
On Wed, Oct 26, 2005 at 04:12:00AM -0700, Charles Jones wrote:
Perhaps it's because I'm working on this at 4am, but I'm having a
problem with the EXHOST option: according to hobbitd_alert --test it
isn't working. I'm also not sure how to do a particular host/service
exclusion.
Here's basically what my alert config below is meant to accomplish.
1. For any alerts on any servers, send alerts to an alert email address.
2. For 2 particular web servers (web5.mydomain.com and
web6.mydomain.com), send an alert to one person, but *not* the alert alias.
3. For a set of oracle servers, send an extra alert message to an
alternate email address/cellphone.
One way of doing these would be:
# 2 special webservers, that ONLY get this alert (2)
HOST=$WEB_SERVERS SERVICE=msgs COLOR=red
MAIL webdev (at) mydomain.com STOP
# Oracle alerts (3)
HOST=$ORACLE_SERVERS SERVICE=msgs,oradb,orasys COLOR=red FORMAT=sms
MAIL dbacell (at) cellphone.com
# Default rule (1)
HOST=$ALL_HOSTS SERVICE=* COLOR=red
MAIL alert (at) mydomain.com
4. After hours (from 5pm until 8am), only send alerts to an alternate
email address (but still need the separate alert for the web5 and web6
hosts described in #2).
5. After hours (from 5pm until 8am), send an alert to my cellphone for
any hosts and services being red for more than 30 mins.
For these, modify the default rule marked (1) to use different alerts
based on time. E.g.
# Default rule (1)
HOST=$ALL_HOSTS SERVICE=* COLOR=red
MAIL alert (at) mydomain.com TIME=*:0800:1700
# Outside office hours, mail alerts to a different address (4)
MAIL alternate (at) mydomain.com TIME=*:1700:0800
# Outside office hours, send to my cell phone (5)
MAIL mycell (at) cellphone.com FORMAT=sms DURATION>30 TIME=*:1700:0800
Ahh! I didn't realize you could make multiple TIME
specifications...that's the main thing I was missing.
6. Do not alert for high load average on a particular server from 6-10am.
There's no really elegant way of doing that ... it makes me think that
perhaps there should be some way of defining a "no-action" rule: "For
these conditions, do NOT send any alerts, and stop looking for more
alert recipients".
That would be nice. I hereby dub it the BLACKHOLE option ;-)
But for now, you'll have to modify the default rule
to exclude that host, then set up specific rules for that host. So your
default rule becomes
# Default rule (1)
HOST=$ALL_HOSTS SERVICE=* COLOR=red EXHOST=dataproc1.mydomain.com
MAIL alert (at) mydomain.com TIME=*:0800:1700
# Outside office hours, mail alerts to a different address (4)
MAIL alternate (at) mydomain.com TIME=*:1700:0800
# Outside office hours, send to my cell phone (5)
MAIL mycell (at) cellphone.com FORMAT=sms DURATION>30 TIME=*:1700:0800
and the specific rules for that host:
# Load avg alerts only from 10am -> 6am
HOST=dataproc1.mydomain.com SERVICE=la TIME=*:1000:0600
MAIL alert (at) mydomain.com TIME=*:0800:1700
MAIL alternate (at) mydomain.com TIME=*:1700:0800
MAIL mycell (at) cellphone.com FORMAT=sms DURATION>30 TIME=*:1700:0800
# All other services alert like the normal default rule.
HOST=dataproc1.mydomain.com EXSERVICE=la
MAIL alert (at) mydomain.com TIME=*:0800:1700
MAIL alternate (at) mydomain.com TIME=*:1700:0800
MAIL mycell (at) cellphone.com FORMAT=sms DURATION>30 TIME=*:1700:0800
This has me a bit confused. The default rule I understand, as it's the
normal rule except that it excludes the dataproc1 host. As for the specific
rules, the first one has a TIME specification in the HOST= line,
indicating from 6am-10am, but then the MAIL lines following it specify
times outside that window... is that basically a way to trick hobbit into
not sending a mail at all?
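One possible reading of the la rule above, sketched in Python rather than
taken from Hobbit itself: the TIME on the HOST= line (1000:0600, which the
comment describes as 10am -> 6am) gates the whole rule, and the TIME on each
MAIL line then picks among the recipients. The wrap-past-midnight handling,
the half-open window boundaries, and the names in_window and la_recipients
are assumptions made for illustration; the DURATION>30 cell phone recipient
is left out.

```python
def in_window(hhmm: int, start: int, end: int) -> bool:
    """True if hhmm is in [start, end); start > end means the window wraps past midnight."""
    if start <= end:
        return start <= hhmm < end
    return hhmm >= start or hhmm < end

# Times are plain integers, so 0600 is written as 600.
RULE_WINDOW = (1000, 600)            # TIME=*:1000:0600 on the HOST= line
MAIL_WINDOWS = [
    ("alert", (800, 1700)),          # MAIL alert ... TIME=*:0800:1700
    ("alternate", (1700, 800)),      # MAIL alternate ... TIME=*:1700:0800
]

def la_recipients(hhmm: int) -> list:
    """Recipients for a load-average alert on dataproc1 at the given time of day."""
    # Outside the rule-level window the rule does not match at all.
    if not in_window(hhmm, *RULE_WINDOW):
        return []
    # Inside it, each MAIL line applies only during its own window.
    return [name for name, window in MAIL_WINDOWS if in_window(hhmm, *window)]
```

Under these assumptions la_recipients(700) is empty, because 7am falls in the
6am-10am gap left by the rule-level TIME, while la_recipients(1200) and
la_recipients(2300) select the office-hours and after-hours addresses
respectively.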
Note: the way I handle this in BigBrother is via an exclude rule,
basically when you define a rule with a ! in front of it, it removes
that host/service from the FINAL match list. Hopefully you can
implement something in Hobbit for a similar effect.
# Don't wake OnCall person every morning about dataproc1 cpu/load being high
!dataproc1.mydomain.com;;cpu;;*;0600-1000;alert (at) mydomain.com
I also use the same technique on BigBrother to remove alerts during
certain hours:
# Don't send alerts about web errors during non-working hours.
!web*.mydomain.com;;msgs;;*;0000-0800;alert (at) mydomain.com
!web*.mydomain.com;;msgs;;*;1700-0000;alert (at) mydomain.com
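To make the "remove from the FINAL match list" idea concrete, here is a
minimal Python sketch of that behaviour as described in this message. It is
not BigBrother's or Hobbit's implementation: the rule tuples, the SEND and
EXCLUDE lists, and final_recipients are hypothetical names, and the
time-of-day field is left out to keep the example short.

```python
from fnmatch import fnmatch

# (host pattern, service pattern, recipient) triples; illustrative only.
SEND = [("*", "*", "alert")]
EXCLUDE = [
    ("dataproc1.mydomain.com", "cpu", "alert"),   # mirrors the "!" rule for dataproc1
    ("web*.mydomain.com", "msgs", "alert"),       # mirrors the "!" rules for web*
]

def _hits(rules, host, service):
    """Set of recipients whose rule matches this host and service."""
    return {rcpt for pattern, svc, rcpt in rules
            if fnmatch(host, pattern) and fnmatch(service, svc)}

def final_recipients(host, service):
    """Collect every normal match first, then subtract the excluded recipients."""
    return _hits(SEND, host, service) - _hits(EXCLUDE, host, service)
```

The point of doing the subtraction last is that an exclusion does not depend
on where it appears relative to the normal rules, which is what makes it
different from ordering rules and stopping at the first match.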
-Charles