[hobbit] hobbit-alerts.cfg - DURATION

Johann Eggers Johann.Eggers at teleatlas.com
Thu Mar 15 10:03:58 CET 2007


> -----Original Message-----
> From: Cortes, Manny [mailto:Manuel.Cortes at ORHS.ORG]
> Sent: Wednesday, 14 March 2007 23:30
> To: hobbit at hswn.dk
> Subject: RE: [hobbit] hobbit-alerts.cfg - DURATION
> 
> We use DURATION in our case as a way to escalate notifications to another
> group of recipients 15 minutes after the initial event occurred in hobbit.
> The initial alert goes to our onsite Operations folks then after 15
> minutes, a custom script fires off that informs all in that particular
> recipient group that the event is still ongoing and it is being escalated.
> 
>     so: qpage pages Operations as soon as the event occurs and they
> monitor the event
>          DURATION>15: the second script fires off.
> 
> Working pretty well so far....
> 
> Could REPEAT be used for further escalation? Or will another DURATION>30
> suffice?
> 

This is from the hobbit-alerts.cfg man-page:

Rule matching an alert if the event has lasted longer/shorter than the given duration. E.g. DURATION>1h (lasted longer than 1 hour) or DURATION<30 (only sends alerts the first 30 minutes).

That's exactly how we use the DURATION tag.
We've specified DURATION>5 on most of our alert rules, because a test often fails and goes back to green after e.g. 2 minutes. In that case we don't want to be alerted by mail or SMS.
If the RED condition still holds after more than 5 minutes, an alarm is sent out.
We also use the REPEAT tag: depending on the importance of the system, the alarm is resent to make the appropriate people aware that the problem is still not fixed.
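
A minimal sketch of what such rules could look like in hobbit-alerts.cfg. The host name, mail address, script path and recipient name are made up for illustration, and the intervals are only examples; the second recipient shows how a later DURATION threshold might be used for escalation:

```
# Hypothetical host, address and script path; adjust for your site.
# Recipient 1: alert only once the test has been red for more than 5 minutes,
#              then repeat every 30 minutes while it stays red.
# Recipient 2: escalate to a second group after more than 15 minutes.
HOST=www.example.com SERVICE=http COLOR=red
    MAIL ops@example.com DURATION>5 REPEAT=30
    SCRIPT /usr/local/bin/escalate.sh oncall-group DURATION>15 REPEAT=60
```

Times without a unit are taken as minutes, so DURATION>5 means "longer than 5 minutes". A further escalation step would, under the same assumptions, be another recipient line with a larger threshold such as DURATION>30.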

Regards
Johann


