[hobbit] Is there a way to "quietly" disable hosts that have NOTICE set?
Francesco Duranti
fduranti at q8.it
Mon Sep 25 18:55:51 CEST 2006
At the moment I don't think it's possible to do anything like that.
The only three alternatives I see right now are:
1) Write a script that changes the DOWNTIME setting in the bb-hosts
file at the start and at the end of the backup. Hobbit should pick up
the change for the next test and not send an alert or a notice.
2) Enable the NOTICE alert only outside the backup window. For example,
if you do backups between 02:00 and 05:00, enable NOTICE only from 0-2
and from 5-24, and send a disable via the bb command at the start of
the backup. This way you won't know if someone disables the test during
the backup hours, but at least you get the alert if the host is down
(probably better than having hobbit not alert on a fault).
3) At least for the "oracle" test, if it runs locally on the oracle
database host, you can change it to check for the existence of a file
named something like /oracle/backup_now; if that file exists, the
script skips the checks and sends a clear status to hobbit. To do the
same for procs I think you'd have to modify the hobbit client too, and
I don't know how simple that would be.
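A rough sketch of alternative 3, in case it helps. It assumes the
"oracle" test is a local shell script that reports with the bb command,
and that MACHINE holds the host name in the form the client normally
uses for status messages. The function name oracle_backup_status and
the defaults for BB, BBDISP, MACHINE and FLAG are made up for the
example; only the flag file name /oracle/backup_now comes from the
suggestion above.

```shell
#!/bin/sh
# Sketch only: build a "clear" status for the oracle test while a
# backup flag file exists. All defaults below are assumptions.
BB="${BB:-bb}"                       # path to the bb client command
BBDISP="${BBDISP:-hobbitserver}"     # hobbit server to report to
MACHINE="${MACHINE:-dbserver}"       # host name as used in status messages
FLAG="${FLAG:-/oracle/backup_now}"   # created by the backup script

# Print a clear-status message and return 0 if the flag file exists;
# print nothing and return 1 if the normal checks should run.
oracle_backup_status() {
    if [ -e "$FLAG" ]; then
        echo "status $MACHINE.oracle clear $(date) backup in progress, checks skipped"
        return 0
    fi
    return 1
}

# Intended use inside the existing test script:
#   if msg=$(oracle_backup_status); then
#       "$BB" "$BBDISP" "$msg"
#   else
#       ... run the normal oracle checks and report their result ...
#   fi
```

The backup script would create the flag file before shutting the
database down and remove it as its last step, so the normal checks
resume as soon as the backup ends, however long it took.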
Francesco
________________________________
From: Charles Jones [mailto:jonescr at cisco.com]
Sent: Monday, September 25, 2006 6:28 PM
To: hobbit at hswn.dk
Subject: Re: [hobbit] Is there a way to "quietly" disable hosts that have NOTICE set?
Okay, let me explain this again :)
If I ignore pages for a specific time interval, say 2 hours
while backups are being done, then every minute that the backups
complete before that interval is over, is a minute that the database is
essentially not monitored, which is not acceptable.
Example:
via DOWNTIME in bb-hosts, or via TIME specifications in
hobbit-alerts.cfg as you mention,
[----------window where no pages are sent for red status of "oracle" or "procs"------------]
[-----------------database backups--------------][------ something can break here -------]
My problem is that something can break during that post-backup
interval, and nobody will be notified. Making the interval shorter is
not an option because the time a backup takes fluctuates from day to
day. It could take 30 minutes Monday night, an hour Tuesday night, 10
minutes Wednesday night, 73 minutes Thursday. Using the bb disable
command from a script is not a good solution because that causes NOTICE
alerts to be sent out, which wakes people up for no reason.
The DOWNTIME option in bb-hosts has the ability to set the
status of services blue without sending NOTICE alerts. I would like this
same ability on an interactive basis (via a message to hobbit using the
bb command).
-Charles
Scheblein, Adam wrote:
We have put in the following because every night we have batch jobs
running that use almost all the processor power in our server:
SERVICE=cpu
MAIL [e-mail address] COLOR=red DURATION=15
TIME=*:0900:1700 HOST=[hostname]
By changing the TIME field to suit you, you will not get paged; you
just have to put an EXHOST=[hostname] in your normal alert rules.
Adam
________________________________________
From: Charles Jones
Sent: Monday, September 25, 2006 7:12 AM
To: hobbit at hswn.dk
Subject: Re: [hobbit] Is there a way to "quietly" disable hosts that have NOTICE set?
That sends out NOTICE alerts though. If you are on call, do you want to
get paged at 3am every night when those services are put into MAINT
mode? That's why I asked if there was a way to do a "silent" disable. I
basically need something that works exactly like the DOWNTIME option in
bb-hosts, except interactively.
-Charles
Francesco Duranti wrote:
Directly from the man bb command :D
disable HOSTNAME.TESTNAME DURATION <additional text>
    Disables a specific test for DURATION minutes. This will
    cause the status of this test to be listed as "blue" on the
    BBDISPLAY server, and no alerts for this host/test will be
    generated. If DURATION is given as a number followed by
    s/m/h/d, it is interpreted as being in
    seconds/minutes/hours/days respectively. To disable all
    tests for a host, use an asterisk "*" for TESTNAME.
enable HOSTNAME.TESTNAME
    Re-enables a test that had been disabled.
So you can execute
bb hobbitserver "disable dbservername.* 2h Backup time"
before shutting down the database and
bb hobbitserver "enable dbservername.*"
just at the end.
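The disable/enable pair described above could be wrapped around the
backup roughly like this. It is only a sketch: the function name
run_with_tests_disabled and the defaults for BB, BBDISP, DBHOST and
MAXTIME are placeholders, and the backup command is whatever you
already run. Note that it uses the ordinary disable message, so it does
nothing special about NOTICE.

```shell
#!/bin/sh
# Sketch only: disable all tests for a host, run a command, then
# re-enable the tests whether or not the command succeeded.
BB="${BB:-bb}"                     # path to the bb client command
BBDISP="${BBDISP:-hobbitserver}"   # hobbit server to send messages to
DBHOST="${DBHOST:-dbserver}"       # host whose tests are disabled
MAXTIME="${MAXTIME:-2h}"           # upper bound; enable is sent earlier

run_with_tests_disabled() {
    # Disable every test for the host for at most MAXTIME.
    "$BB" "$BBDISP" "disable $DBHOST.* $MAXTIME Backup time"
    # Run the command given as arguments and remember its exit code.
    "$@"
    backup_rc=$?
    # Re-enable right away so the remaining window is monitored again.
    "$BB" "$BBDISP" "enable $DBHOST.*"
    return $backup_rc
}

# Intended use from the backup job (the path is a placeholder):
#   run_with_tests_disabled /path/to/db_backup_script
```

Because the enable is sent as soon as the command returns, the 2h value
only matters if the script dies before reaching it.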
Francesco
-----Original Message-----
From: Charles Jones [mailto:jonescr at cisco.com]
Sent: Monday, September 25, 2006 7:56 AM
To: hobbit at hswn.dk
Subject: [hobbit] Is there a way to "quietly" disable hosts that have NOTICE set?
I have the NOTICE flag set for all of my production hosts - I want
pages to go out if someone disables or enables any of them. However,
when backups are done, the oracle databases are brought down, which
triggers an alert. If they are manually disabled, the NOTICE message
goes out, which also wakes people up for no reason.
If I use DOWNTIME in bb-hosts, then I have to specify a window which is
guaranteed to be longer than the time it could possibly take to back up
the databases (which varies, so any fixed window will surely be wrong
from time to time). So what ends up happening is, for example, I would
specify an hour of DOWNTIME, but the backups sometimes only take 30
minutes. That means there is a 30 minute window where a real alert
would be masked, which is unacceptable in a production environment.
I guess what I'm looking for is a way to send a command to Hobbit from
a shell script (called from the db backup script) that would put a
host/service in maint mode (disabled - blue dot), and NOT send a NOTICE
page.
-Charles
To unsubscribe from the hobbit list,
send an e-mail to
hobbit-unsubscribe at hswn.dk