[hobbit] Escalated alerts - necessary ?

brent.mccrackin at bell.ca brent.mccrackin at bell.ca
Mon Feb 14 17:50:59 CET 2005


It's a requirement as far as my setup is concerned.

We have the regular first-line pagers that get the alerts and can
acknowledge as they start working on the issue.  Some hosts and services
are of higher priority within the company, and if they're down for an
extended period of time, upper management wants to get a notification
regardless of the ACK status.  It's usually expected that the person with
the pager has already done this before BB sends an alert to management,
but when troubleshooting a difficult problem time can fly quite quickly.
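
To make the policy concrete, here is a rough Python sketch of the rule
described above.  It is not BB or Hobbit code; the Recipient class, its
fields and the pager names are all made up for illustration.  The idea
is only that an ACK silences the first-line pagers, while an escalation
recipient is still paged once the problem has lasted long enough.

```python
from dataclasses import dataclass


@dataclass
class Recipient:
    # All fields are hypothetical; they do not mirror any BB/Hobbit config.
    address: str
    escalation: bool = False   # True for management-style recipients
    min_duration: int = 0      # seconds the problem must last before paging


def recipients_to_page(recipients, down_seconds, acked):
    """Return the addresses that should be paged right now."""
    selected = []
    for r in recipients:
        if down_seconds < r.min_duration:
            continue           # problem has not lasted long enough for this one
        if acked and not r.escalation:
            continue           # an ACK silences first-line pagers only
        selected.append(r.address)
    return selected


# Example setup: one first-line pager, one manager paged after an hour.
recips = [
    Recipient("oncall-pager"),
    Recipient("mgmt-pager", escalation=True, min_duration=3600),
]
```

With this setup, an acked problem that has been down for ten minutes
pages nobody, and the same problem after more than an hour pages only
mgmt-pager.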

The management can then go to the webpage, view the acked alert and see
who acked it (based on the ID code) and contact them for a status
report.  This saves management the hassle of contacting all the pagers
to find out who is working on a downed service, because they've been
identified by the ACK code pager ID.  With several different pagers in
the group, management either needs to know which pager a particular host
and service alert is sent to, or otherwise be able to easily identify
who got paged and is working on the issue.

I also use an extension to the bb-ack page which lists all current
alerts with their codes and which pager address or phone number the
alert was sent to.  If an alert hasn't been acked, management can go
there and see who should have been responding to the issue.  This has
also been a handy place to send ACKs to multiple alerts at once.
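
For anyone wanting something similar, the bookkeeping behind such a page
can be sketched roughly like this in Python.  This is not the actual
extension; the alert table, the codes, the host names and the function
names are invented, and it assumes each code was sent to exactly one
pager so that the code identifies who acked.

```python
# Hypothetical table of outstanding alerts, keyed by ACK code.
alerts = {
    "17": {"host": "www1", "service": "http", "sent_to": "pager-a", "acked_by": None},
    "42": {"host": "db1", "service": "conn", "sent_to": "pager-b", "acked_by": None},
    "63": {"host": "mail1", "service": "smtp", "sent_to": "pager-a", "acked_by": None},
}


def ack_many(alerts, codes):
    """Acknowledge several alerts at once; unknown codes are ignored."""
    acked = []
    for code in codes:
        alert = alerts.get(code)
        if alert is None:
            continue
        # Assumption: the code was sent to one pager, so it identifies the acker.
        alert["acked_by"] = alert["sent_to"]
        acked.append(code)
    return acked


def unacked_report(alerts):
    """List (code, host, service, sent_to) for alerts nobody has acked yet."""
    return sorted(
        (code, a["host"], a["service"], a["sent_to"])
        for code, a in alerts.items()
        if a["acked_by"] is None
    )
```

Calling ack_many(alerts, ["17", "63"]) marks both alerts as acked by
pager-a, and unacked_report(alerts) then shows only the db1 alert and
the pager that should be responding to it.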

The addition of the INFO column in BBGEN nicely displays what pagers a
host and its services will send to, and this has also been a help to
management.

---
Brent B McCrackin
UNIX Systems Specialist - Bell Sympatico
Brent.McCrackin at Bell.ca   PH: 416-353-0692
"Serenity through viciousness."
 


-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk] 
Sent: February 14, 2005 11:37 AM
To: hobbit at hswn.dk
Subject: [hobbit] Escalated alerts - necessary ?


I changed the subject, because this is a somewhat different issue than
the rest of the mail Brent wrote:


On Mon, Feb 14, 2005 at 09:39:52AM -0500, brent.mccrackin at bell.ca wrote:

> A feature I'd like to see is the ability to allow an identified
> acknowledge of an alert based on the two-digit code, that stops alerts
> for all recipients except escalation recipients (those being the people
> that need to be alerted if a downed service is not fixed after a
> specific time period regardless of someone working on it).  This would
> do away with the need for a '99' acknowledge to stop alerts for
> everyone, and let the person responding to the alert work on fixing it
> faster (at least until the escalation person starts asking for status
> reports).


Hobbit does not have the concept of "escalating" an alert that BB
has.

I didn't fully understand what BB's idea of "escalating" an alert
meant, until I read Brent's message. I see that it could be useful,
but also that it will be somewhat tricky to implement with the current
design of Hobbit's alert-module.

So - how much do you use it ? Do you need to have alerts going out for
problems that have been acknowledged ?


Regards,
Henrik
