alert storm / intelligent extra mailscript
    Martin Flemming 
    martin.flemming at desy.de
       
    Fri Apr 17 11:16:17 CEST 2009
    
    
  
Hi!
I've got a problem with my colleagues and the alert storm
that occurs when a whole batch farm is rebooted for a kernel upgrade etc.
  .. and the person who did it doesn't disable the alerts or set an acknowledgement/downtime first.
Don't ask me why ... he hates web GUIs and wants to run just one command on the console ...
I know I asked something similar before
http://www.hswn.dk/hobbiton/2009/01/msg00398.html
Re: [hobbit] remote/commandline Acknowledge Alerts
and Henrik's answer was right, as always :-)
but that only works if I know the ID of the event;
in our situation I would need it before the event(s) start .. :-(
They don't want to get 5 or more mails for a single machine
(multiplied by about 50 or more machines) ...
So we've played around a bit with DURATION, RECOVERED ..
Now I get two mails for conn (red and recovered)
and one for cpu (yellow for the reboot) ... we can of course reduce that to only two
mails (by disabling the recovered mail for conn or setting a longer
DURATION for the cpu reboot mail) ...
My question is whether there already exists an intelligent extra mail script, or
something else, that looks at the conn status and, if it is bad, sends no
alarms for any other service on that host, only for conn ....
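
To make the idea concrete, here is a minimal sketch of such a wrapper, written in Python and meant to be called through a SCRIPT recipient in hobbit-alerts.cfg. It is not an existing Hobbit tool. It assumes the alert module passes BBHOSTNAME, BBSVCNAME, BBCOLORLEVEL, BBALPHAMSG, RCPT and RECOVERED in the environment, that the bb client and display address are available as $BB and $BBDISP, and that the server answers a "hobbitdboard" query with the current conn color. The function names and the use of a local "mail" command are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch: mail an alert only if the host's conn status is not red."""
import os
import subprocess


def should_send(service, conn_color):
    """Decide whether the alert for `service` should be mailed."""
    if service == "conn":
        return True                # conn alerts always go out
    return conn_color != "red"     # mute everything else while conn is red


def query_conn_color(host):
    """Ask the Hobbit server for the current conn color of `host`.

    Assumption: BB and BBDISP are set in the environment of the alert
    script; the fallbacks below are placeholders.
    """
    bb = os.environ.get("BB", "bb")
    bbdisp = os.environ.get("BBDISP", "127.0.0.1")
    query = "hobbitdboard host=%s test=conn fields=color" % host
    try:
        result = subprocess.run([bb, bbdisp, query], capture_output=True,
                                text=True, timeout=10, check=True)
    except (OSError, subprocess.SubprocessError):
        return "unknown"           # on any error, do not suppress alerts
    return result.stdout.strip() or "unknown"


def main():
    host = os.environ.get("BBHOSTNAME")
    service = os.environ.get("BBSVCNAME")
    rcpt = os.environ.get("RCPT")
    if not (host and service and rcpt):
        return                     # not started by the alert module
    if not should_send(service, query_conn_color(host)):
        return                     # conn is red: drop this alert silently
    state = "recovered" if os.environ.get("RECOVERED") == "1" \
        else os.environ.get("BBCOLORLEVEL", "")
    subject = "Hobbit %s:%s %s" % (host, service, state)
    # Assumption: a local "mail" command is available for delivery.
    subprocess.run(["mail", "-s", subject, rcpt],
                   input=os.environ.get("BBALPHAMSG", ""), text=True)


if __name__ == "__main__":
    main()
```

Note that this only mutes alerts that fire while conn is currently red. A cpu "rebooted" warning normally arrives after conn is green again, so it would probably still need the DURATION setting mentioned above.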
Thanks & cheers
        Martin
    
    