Some thoughts about alerts, acks and escalations

Henrik Stoerner henrik at hswn.dk
Wed Apr 13 07:49:07 CEST 2005


I'm beginning to look at the issue of escalating alerts. And I've had
an idea that I'd like to get some feedback on before I go ahead and
implement it.

Right now, Hobbit doesn't handle escalating an alert. When someone
receives an alert message, they can ack it - when they do, all alerts
stop and the item disappears from the "Critical systems" page (the NK
page).

BB has the concept of escalating an alert, meaning that some
recipients of an alert will get the alert message even if the alert
has been acknowledged.


What I'd like to have is the BB system with a finer granularity: each
recipient in the hobbit-alerts.cfg file would have an associated
"level", defaulting to 1.

I want our NOC guys who do nothing but stare at the NK page 24x7 to be
able to acknowledge an alert with a "level 0" acknowledgment. That
just gets it off their monitor and logs that a trouble ticket has been
raised for the issue; it doesn't stop alerts from going out.

A technician (who is a "level 1" recipient) can acknowledge the alert
he receives - this will stop alert messages from going out to other
"level 1" recipients, so all of the engineers can concentrate on
doing what needs to be done.

Alerts will still be sent to recipients who are "level 2" and above -
these are the equivalent of the BB "escalation" alerts. They can ack
the alert if they'd like to turn off more alert messages, of course.

You can have even higher levels if you like, probably going up the
hierarchy of managers. I don't think we'll be using more than the 3
levels I've described, but there is no reason to impose any limit.
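
To make the rule concrete, here is a rough sketch in Python. It is not
Hobbit code and nothing like it exists yet; the names Recipient and
recipients_to_alert are made up for illustration. It also assumes that
an ack at level N silences every recipient at level N or below, which
is my reading of the description above rather than a settled design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recipient:
    """Hypothetical stand-in for one recipient in hobbit-alerts.cfg."""
    name: str
    level: int = 1  # the proposal's default level


def recipients_to_alert(recipients: List[Recipient],
                        ack_level: Optional[int] = None) -> List[Recipient]:
    """Return the recipients who should still receive alert messages.

    ack_level is the highest level at which the alert has been
    acknowledged, or None if nobody has acked it yet.
    Assumption: an ack at level N silences recipients at level <= N.
    """
    if ack_level is None:
        return list(recipients)
    return [r for r in recipients if r.level > ack_level]


# Two technicians at the default level 1 and one "escalation" recipient.
example = [Recipient("tech-a"), Recipient("tech-b"), Recipient("manager", 2)]
```

With this rule a level 0 ack from the NOC leaves all three recipients
on the list, a level 1 ack leaves only "manager", and a level 2 ack
silences everyone.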


Does that sound like it would be useful?


Regards,
Henrik


