alert storm / intelligent extra mailscript

Martin Flemming martin.flemming at desy.de
Fri Apr 17 11:16:17 CEST 2009


Hi!

I have a problem with my colleagues and the alert storm that occurs
when a whole batch farm is rebooted for a kernel upgrade etc.
  .. and the person who did it doesn't disable the alerts or set an acknowledgement/downtime beforehand.
Don't ask me why ... he hates web GUIs and wants to run just one command on the console ...

I know I asked something similar before:
http://www.hswn.dk/hobbiton/2009/01/msg00398.html
Re: [hobbit] remote/commandline Acknowledge Alerts

Henrik answered quite rightly, as always :-)
but that only works if I know the ID of the event,
and in our situation I would need it before the event(s) start .. :-(


They don't want to get 5 or more mails for a single machine
(with about 50 or more machines) ...

So we've played around a bit with Duration, Recovered ..

Now I get two mails for conn (red and recovered)
and one for cpu (yellow because of the reboot) ... we could of course
reduce that to only two mails (disable the Recovered mail for conn or
set a higher Duration for the cpu reboot mail) ...

My question is whether an intelligent extra mail script (or something
else) already exists that looks at the conn status and, if it is bad,
sends no alarm for any other service, only for conn ....
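
To make the idea concrete, here is a minimal sketch of such a filter in Python, not an existing tool. It assumes the script is called as a SCRIPT recipient and gets BBHOSTNAME, BBSVCNAME, BBALPHAMSG and RCPT in its environment, that $BB and $BBDISP point to the client binary and the display server, that "query host.conn" returns a status line starting with the colour, and that a local "mail" command is available. The function names are made up for illustration. It only suppresses alerts raised while conn is red, so a cpu reboot mail that arrives after the host is back would still get through.

```python
import os
import subprocess


def conn_color(status_line):
    """Return the first word of a status line in lower case, or "" if empty."""
    words = status_line.split()
    return words[0].lower() if words else ""


def should_send(service, conn_status_color):
    """Always send conn alerts; drop other services while conn is red."""
    if service == "conn":
        return True
    return conn_status_color != "red"


def query_conn(host):
    # Assumption: $BB is the client binary and $BBDISP the display server.
    result = subprocess.run(
        [os.environ["BB"], os.environ["BBDISP"], f"query {host}.conn"],
        capture_output=True, text=True, check=False,
    )
    return conn_color(result.stdout)


def main():
    host = os.environ.get("BBHOSTNAME")
    service = os.environ.get("BBSVCNAME")
    if not host or not service:
        return  # not called as an alert script, nothing to do
    # Only ask the server about conn when the alert is for another service.
    if service != "conn" and not should_send(service, query_conn(host)):
        return  # conn is red: stay quiet for this service
    # Assumption: a local "mail" command delivers to the address in $RCPT.
    subprocess.run(
        ["mail", "-s", f"Hobbit {host}.{service}", os.environ["RCPT"]],
        input=os.environ.get("BBALPHAMSG", ""), text=True, check=False,
    )


if __name__ == "__main__":
    main()
```

The decision is kept in should_send() so it can be checked without a running server, e.g. should_send("cpu", "red") is False while should_send("conn", "red") is True.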

Thanks & cheers


        Martin


