alert storm / intelligent extra mailscript
Martin Flemming
martin.flemming at desy.de
Fri Apr 17 11:16:17 CEST 2009
Hi!
I've got a problem with my colleagues and the alert storm
that hits us when a whole batch farm is rebooted for a kernel upgrade etc.
.. and the person who did it doesn't disable the alerts or set an acknowledged downtime beforehand.
Don't ask me why ... he hates web GUIs and wants to run only one command on the console ...
I know, I asked something similar before:
http://www.hswn.dk/hobbiton/2009/01/msg00398.html
Re: [hobbit] remote/commandline Acknowledge Alerts
and Henrik answered quite right, as always :-)
But that works only if I know the ID of the event;
in our situation I would need it before the event(s) start .. :-(
They don't want to get 5 or more mails for just one machine
(with about 50 or more machines) ...
So we've played around a bit with Duration, Recovered ..
Now I get two mails for conn (red & recovered)
and one for cpu (yellow for the reboot) ... we can of course reduce that to only two
mails (disable the recovered mail for conn or set a longer
Duration for the cpu reboot mail) ...
My question is whether there already exists an intelligent extra mail script or
something else which looks at the conn condition and, if it's bad, doesn't send any
alarm for the other services, only for conn ....
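
To make the idea concrete, here is a rough sketch of what such a script
might look like. This is not an existing tool. It assumes the script is
called as a SCRIPT recipient from the alert configuration, that the alert
module passes BBHOSTNAME, BBSVCNAME, RCPT and BBALPHAMSG in the environment,
that the bb client accepts a "query host.test" command, and that a local
"mail" command exists; all of that should be checked against the installed
version. The function names are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of an alert mail filter: mail non-conn alerts only if conn is OK."""
import os
import subprocess
import sys

# Assumption: only a red conn status counts as "host is down".
BAD_COLORS = {"red"}


def should_send(service, conn_color):
    """Return True if an alert for `service` should be mailed."""
    if service == "conn":
        return True  # the conn alert itself is always delivered
    return conn_color not in BAD_COLORS


def query_conn_color(host):
    """Ask the display server for the conn color of `host`.

    Assumes `bb <server> "query <host>.conn"` prints the color as the
    first word. Returns "unknown" on any failure, so that a broken
    lookup never swallows an alert.
    """
    bb = os.environ.get("BB", "bb")
    server = os.environ.get("BBDISP", "127.0.0.1")
    try:
        out = subprocess.run(
            [bb, server, "query %s.conn" % host],
            capture_output=True, text=True, timeout=10,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return "unknown"
    words = out.split()
    return words[0].lower() if words else "unknown"


def main():
    host = os.environ.get("BBHOSTNAME", "")
    service = os.environ.get("BBSVCNAME", "")
    rcpt = os.environ.get("RCPT", "")
    message = os.environ.get("BBALPHAMSG", "")
    if not should_send(service, query_conn_color(host)):
        return 0  # conn is down: stay quiet about the other services
    subject = "Hobbit alert %s.%s" % (host, service)
    subprocess.run(["mail", "-s", subject, rcpt], input=message, text=True)
    return 0


# Run only when the alert module has actually set up the environment.
if __name__ == "__main__" and os.environ.get("BBHOSTNAME"):
    sys.exit(main())
```

Such a filter would only help while conn is still red. The yellow cpu
alert that arrives after the machine is back up would still get through,
so a Duration setting would still be needed for that one.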
Thanks & cheers
Martin