Re: [hobbit] RECOVERED alerts red->yellow
- To: hobbit (at) hswn.dk
- Subject: Re: [hobbit] RECOVERED alerts red->yellow
- From: Mark Hinkle <hinkman (at) hinkman.com>
- Date: Wed, 09 Jul 2008 16:25:51 -0700
- References: <1F7B01020EC4D04DA17703634B9E888E0561A012 (at) ULPGCTMVMAI003.EU.COLT> <4874B9EF.3030205 (at) doublesparks.net>
- User-agent: Thunderbird 2.0.0.14 (Macintosh/20080421)
Yes, I see the same thing as Alan and maybe that is why his description
makes sense to me.
The real questions are: what triggers a recovery message to be sent and
who gets them? Is it when a test goes from any color to green? Or is it
any "downgrade" in alert state (e.g. red->yellow or yellow->green)? It
appears to be the former - any color to green. And that makes sense -
"recovery" means everything is ok, and that is what "green" means.
But that does leave an open question about that state change from
red->yellow. In my environment, different notification methods are used
for "red" than are used for "yellow", specifically sms text for red vs.
emails for yellow.
*And that is where the problem comes in*: if a "red" failed test first
goes to "yellow" before then going to "green", the recovery message
(upon going green) is only sent to the notification destinations
configured for the *yellow state*, not the red state.
I certainly understand how this logically occurs - red->yellow is not a
recovery so nothing would be sent there at all. But hobbit does not seem
to save a complete list of who has been notified for each "event", so it
basically forgets about those folks sent notifications at the red level
as soon as it transitions to yellow. When the test finally goes green,
hobbit checks the alerts config for who would have been notified at *the
state just before green* (in this case yellow) and sends recovery
messages to those destinations. By then it has lost the fact that the
test was red before it was yellow, so the red-level destinations, which
should also have received the recovery, are skipped.
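To make the behavior I am describing concrete, here is a minimal Python sketch of it. This is not hobbit's actual code; the RULES table, the destination strings, and the function name are all made up for illustration, and the logic is only my guess based on what I observe.

```python
# Hypothetical alert rules: which destinations are notified at each color.
# These names are illustrative only, not taken from any real config.
RULES = {
    "red": {"sms:oncall"},
    "yellow": {"mail:admins"},
}


def recovery_recipients_last_color(history):
    """Return who gets the recovery message if only the color seen
    just before green is consulted (the behavior that appears to occur)."""
    if len(history) < 2 or history[-1] != "green":
        return set()  # no recovery unless the test has just gone green
    return set(RULES.get(history[-2], set()))


# red -> yellow -> green: only the yellow-level destinations are told.
lossy = recovery_recipients_last_color(["red", "yellow", "green"])

# red -> green directly: the red-level destinations are told, as expected.
direct = recovery_recipients_last_color(["red", "green"])
```

In the first case the sms destination was alerted at red but never hears about the recovery.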
I believe that BB keeps track of who has been notified for each event
via the "np_user (at) host.com_host1.disk" type of entries in the tmp dir.
This allows it to have a complete list of notification destinations that
it could/can use for recoveries. I am not saying hobbit should use the
same mechanism, but hobbit does *appear* to be losing some rather
important state info.
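As a rough sketch of the kind of state I mean, the following Python keeps a per-event record of every destination notified and uses it when the test goes green. It is an assumption-laden illustration, not BB's or hobbit's implementation: the NotificationTracker class, its change() method, and the rule table are hypothetical, and it keeps the record in memory rather than in files in a tmp dir.

```python
from collections import defaultdict

# Hypothetical alert rules, for illustration only.
RULES = {
    "red": {"sms:oncall"},
    "yellow": {"mail:admins"},
}


class NotificationTracker:
    """Remember every destination notified during one event (host + test)."""

    def __init__(self, rules):
        self.rules = rules
        # (host, test) -> set of destinations notified so far in this event
        self.notified = defaultdict(set)

    def change(self, host, test, color):
        """Record a color change; return the recovery recipients, if any."""
        key = (host, test)
        if color == "green":
            # Event is over: everyone notified so far gets the recovery,
            # and the record for this event is cleared.
            return self.notified.pop(key, set())
        # Non-green: accumulate the destinations alerted at this color.
        self.notified[key] |= self.rules.get(color, set())
        return set()


tracker = NotificationTracker(RULES)
tracker.change("host1", "disk", "red")
tracker.change("host1", "disk", "yellow")
recovered = tracker.change("host1", "disk", "green")
```

Here the recovery goes to both the sms and the email destinations, because the red-level notification was remembered across the red->yellow transition.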
--
Mark L. Hinkle
hinkman (at) hinkman.com