[hobbit] RECOVERED alerts red->yellow

Mark Hinkle hinkman at hinkman.com
Thu Jul 10 01:25:51 CEST 2008


Yes, I see the same thing as Alan and maybe that is why his description 
makes sense to me.

The real questions are: what triggers a recovery message, and who 
receives it? Is it sent when a test goes from any color to green? Or on 
any "downgrade" in alert state (e.g. red->yellow or yellow->green)? It 
appears to be the former - any color to green. And that makes sense - 
"recovery" means everything is ok, and that is what "green" means.

But that does leave an open question about the state change from 
red->yellow. In my environment, "red" and "yellow" use different 
notification methods: SMS text messages for red and email for 
yellow.

*And that is where the problem comes in*: if a failed "red" test first 
goes to "yellow" and only then to "green", the recovery message 
(sent upon going green) goes only to the notification destinations 
configured for the *yellow state*, not those for the red state.

I certainly understand how this logically occurs - red->yellow is not a 
recovery, so nothing is sent at that point. But hobbit does not seem 
to save a complete list of who has been notified for each "event", so it 
basically forgets about the people who were notified at the red level 
as soon as the test transitions to yellow. When the test finally goes 
green, hobbit checks the alerts config for who would have been notified 
at *the state just before green* (in this case yellow) and sends 
recovery messages to those destinations. But it has lost the fact that 
the test was at the red level before the yellow, and that recovery 
messages should have gone to the red destinations as well.
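
To make the sequence concrete, here is a small Python sketch of the 
behavior as I observe it. It is only a model of the symptom, not 
hobbit's actual code: the RULES table, the recipient names and the 
function recovery_targets_last_state are all made up for illustration.

```python
# Hypothetical model of the observed behavior; not hobbit's real logic.
# RULES maps an alert color to the destinations configured for it.
RULES = {
    "red": {"sms:oncall"},      # placeholder SMS destination
    "yellow": {"mail:admins"},  # placeholder email destination
}


def recovery_targets_last_state(colors):
    """Return who gets the recovery message when only the color
    seen just before green is looked up in the rules."""
    previous = None
    for color in colors:
        if color == "green" and previous in RULES:
            return set(RULES[previous])
        previous = color
    return set()


# red -> yellow -> green: only the yellow destinations are told.
print(sorted(recovery_targets_last_state(["red", "yellow", "green"])))
# red -> green: the red destinations are told, as expected.
print(sorted(recovery_targets_last_state(["red", "green"])))
```

In the first case the SMS destination that received the red alert never 
hears about the recovery, which matches what I see.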

I believe that BB keeps track of who has been notified for each event 
via the "np_user@host.com_host1.disk" type of entries in the tmp dir. 
This allows it to have a complete list of notification destinations that 
it could/can use for recoveries. I am not saying hobbit should use the 
same mechanism, but hobbit does *appear* to be losing some rather 
important state info.
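
As a rough illustration of the kind of state I mean, the following 
Python sketch remembers every destination alerted during one event and 
uses that list when the test goes green. This is an assumption about how 
such tracking could work, not a description of BB's or hobbit's 
implementation; EventTracker, RULES and the recipient names are 
hypothetical.

```python
# Hypothetical sketch: remember every destination alerted during an event.
RULES = {
    "red": {"sms:oncall"},      # placeholder SMS destination
    "yellow": {"mail:admins"},  # placeholder email destination
}


class EventTracker:
    """Tracks notified destinations for one host/test pair."""

    def __init__(self, rules):
        self.rules = rules
        self.notified = set()  # everyone alerted since the last green

    def update(self, color):
        """Record a new color. Returns the destinations that should
        receive a recovery message (empty unless the color is green)."""
        if color == "green":
            targets, self.notified = self.notified, set()
            return targets
        # Non-green: add this color's destinations to the event's list.
        self.notified |= self.rules.get(color, set())
        return set()


tracker = EventTracker(RULES)
targets = set()
for color in ["red", "yellow", "green"]:
    targets = tracker.update(color)

# Both the red and the yellow destinations are included.
print(sorted(targets))
```

With the list kept per event, a red->yellow->green sequence sends the 
recovery to both the SMS and the email destinations.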

-- 
Mark L. Hinkle
hinkman at hinkman.com



