[hobbit] RECOVERED alerts red->yellow

Hubbard, Greg L greg.hubbard at eds.com
Thu Jul 24 00:27:28 CEST 2008


You might try having a separate rule for each color.  Then maybe the
rule would fire when the test transitions into that color.  It may not
fire when it transitions from one color to another in the same rule.
But I am just guessing!

GLH

-----Original Message-----
From: Alan Sparks [mailto:asparks at doublesparks.net] 
Sent: Wednesday, July 23, 2008 4:40 PM
To: hobbit at hswn.dk
Subject: Re: [hobbit] RECOVERED alerts red->yellow

Anyone have any other ideas how to fix this bug?  Thanks...
-Alan

Alan Sparks wrote:
> After a day of running in trace and debug modes on the alerts module, 
> I think I understand how this is broken.  But I'm unsure anything but 
> hacking the code can fix the issue.  It appears to be caused by 
> unfortunate interactions among some of the features, including the 
> "flap detection" stuff.
>
> So: If I have the rule:
> MAIL me at whereever.com TEST=disk COLOR=RED RECOVERED and 
> ALERTCOLORS="red,yellow,purple"
>
> The traces show Hobbit going through the following "thought process":
> * Say the disk goes yellow.  That's in Hobbit's alert color list, so 
> it triggers alert processing.  But, no rule matches that color, so no 
> alert is sent.
> * Say the disk now goes red.  Now, Hobbit sees that as a transition 
> from an alert state to another alert state.  Normally, it would 
> suppress this, but there is logic to special-case going red, and the 
> alert processing is triggered.  This time, a rule matches, and an 
> alert is sent.
> * Say now the disk goes yellow.  This is seen by Hobbit as a 
> transition from an alert state to another alert state (due to both 
> colors in ALERTCOLORS).  No alert processing is done -- it is 
> suppressed since it is NOT a recovery (it's flapping between two alert
> states).  BUT, Hobbit now remembers the current color (alert state) as
> yellow.
> * Finally, the disk goes green.  This is a recovery, since it is a 
> transition from the ALERTCOLORS to the OKCOLORS.  And, this triggers 
> alert rule processing.  HOWEVER, now, the alert code scans for a rule 
> for the last state of the alert -- yellow.  And, of course, no such 
> rule exists, and the rule that would trigger the recovery page is not 
> used, and no recovery page is sent.
>
> The RECOVERED keyword is only a flag on the rule that says if you 
> match this rule during recovery processing, this recipient does want a 
> recovery page.  But, Hobbit keeps no memory about which rule triggered
> an alert, it seems.  It has to go back through the ruleset during 
> recovery processing to find a rule to use.  And because the colors 
> change, no such rule can exist.
>
> So I think you can call it a bug, or an unfortunate side effect of 
> adding yellow to the ALERTCOLORS list.  If you do, you'll compromise 
> your recovery paging.  If you don't, you can't send alerts on warning
> (yellow) conditions.  Short of changing the code to eliminate the 
> alert state suppression (i.e., flap detection), I'm not certain how
> this can be fixed or worked around.
> -Alan
>
>
> Mark Hinkle wrote:
>> Yes, I see the same thing as Alan and maybe that is why his 
>> description makes sense to me.
>>
>> The real questions are: what triggers a recovery message to be sent 
>> and who gets them? Is it when a test goes from any color to green? Or
>> is it any "down-grade" in alert state (i.e. red->yellow, or
>> yellow->green)? It appears to be the former - any color to green. And
>> that makes sense - "recovery" means everything is ok, and that is 
>> what "green" means.
>>
>> But that does leave an open question about that state change from
>> red->yellow. In my environment, different notification methods are
>> used for "red" than are used for "yellow", specifically sms text for 
>> red vs. emails for yellow.
>>
>> *And that is where the problem comes in*: if a "red" failed test 
>> first goes to "yellow" before then going to "green", the recovery 
>> message (upon going green) is only sent to the notification 
>> destinations configured for the *yellow state*, not the red state.
>>
>> I certainly understand how this logically occurs - red->yellow is not
>> a recovery so nothing would be sent there at all. But hobbit does not
>> seem to save a complete list of who has been notified for each 
>> "event", so it basically forgets about those folks sent notifications
>> at the red level as soon as it transitions to yellow. When the test 
>> finally goes green, hobbit checks the alerts config for who would 
>> have been notified at *the state just before green* (in this case
>> yellow) and sends recovery messages to those destinations. But it has
>> lost the fact that it was actually at a red level previous to the 
>> yellow and should have sent recovery to those destinations as well.
>>
>> I believe that BB keeps track of who has been notified for each event
>> via the "np_user at host.com_host1.disk" type of entries in the tmp dir.
>> This allows it to have a complete list of notification destinations 
>> that it could/can use for recoveries. I am not saying hobbit should 
>> use the same mechanism, but hobbit does *appear* to be losing some 
>> rather important state info.
>>
>
>
>
> To unsubscribe from the hobbit list, send an e-mail to 
> hobbit-unsubscribe at hswn.dk
>
>
>
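A rough Python model of the sequence Alan traced may make the failure easier to see. It is a sketch under stated assumptions, not Hobbit's alert code: the Rule class, the matching() and simulate() helpers, and the OKCOLORS set are hypothetical. Only the behaviors described in the thread are modeled: a change into an alert color triggers processing, alert-to-alert changes are suppressed except when going red, the latest color is always remembered, and recovery looks up rules using that remembered color.

```python
from dataclasses import dataclass

# ALERTCOLORS matches the setting quoted in the thread; OKCOLORS is assumed.
ALERTCOLORS = {"red", "yellow", "purple"}
OKCOLORS = {"green", "clear", "blue"}  # hypothetical for this sketch


@dataclass
class Rule:
    recipient: str
    test: str
    colors: frozenset
    recovered: bool  # models the RECOVERED flag on the rule


# Mirrors: MAIL me at whereever.com TEST=disk COLOR=RED RECOVERED
RULES = [Rule("me at whereever.com", "disk", frozenset({"red"}), True)]


def matching(test, color):
    """Re-scan the rules on every lookup; nothing records which rule fired."""
    return [r for r in RULES if r.test == test and color in r.colors]


def simulate(test, colors):
    """Return the (kind, recipient) messages sent for a sequence of colors."""
    sent = []
    prev = "green"
    for color in colors:
        if color in ALERTCOLORS:
            entering_alert = prev not in ALERTCOLORS
            going_red = color == "red" and prev != "red"  # the red special case
            if entering_alert or going_red:
                sent += [("ALERT", r.recipient) for r in matching(test, color)]
            # Otherwise the change is suppressed as flapping, but the new
            # color still replaces the remembered one (prev = color below).
        elif color in OKCOLORS and prev in ALERTCOLORS:
            # Recovery: rules are matched against the LAST alert color only.
            sent += [("RECOVERED", r.recipient)
                     for r in matching(test, prev) if r.recovered]
        prev = color
    return sent


print(simulate("disk", ["yellow", "red", "yellow", "green"]))
# [('ALERT', 'me at whereever.com')]   (no recovery message)
print(simulate("disk", ["red", "green"]))
# [('ALERT', 'me at whereever.com'), ('RECOVERED', 'me at whereever.com')]
```

In this model, once yellow is in ALERTCOLORS the red rule is never consulted at recovery time after a red->yellow step, which is consistent with Alan's conclusion; a plain red->green sequence still produces the recovery message.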

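Mark's scenario, and the per-event bookkeeping he believes BB does, can be sketched the same way. This is a hypothetical model rather than Hobbit or BB code: the RECIPIENTS table (SMS for red, e-mail for yellow) and both functions are made up for illustration, and the second function only approximates the idea of remembering everyone notified during an event, not the actual np_* file mechanism.

```python
ALERTCOLORS = {"red", "yellow", "purple"}

# Hypothetical destinations, after Mark's setup: SMS for red, e-mail for yellow.
RECIPIENTS = {"red": {"sms:oncall"}, "yellow": {"mail:ops"}}


def recovery_by_last_color(colors):
    """Pick recovery recipients from the color seen just before green."""
    prev = "green"
    for color in colors:
        if color == "green" and prev in ALERTCOLORS:
            return sorted(RECIPIENTS.get(prev, set()))
        prev = color
    return []


def recovery_by_event_memory(colors):
    """Pick recovery recipients from every alert color seen during the event."""
    notified = set()  # simplification: each alert color adds its recipients
    prev = "green"
    for color in colors:
        if color in ALERTCOLORS:
            notified |= RECIPIENTS.get(color, set())
        elif color == "green" and prev in ALERTCOLORS:
            return sorted(notified)  # event is over: report everyone involved
        prev = color
    return []


history = ["red", "yellow", "green"]
print(recovery_by_last_color(history))    # ['mail:ops']
print(recovery_by_event_memory(history))  # ['mail:ops', 'sms:oncall']
```

The trade-off is that the alert module would have to keep per-event state (and clear it on recovery) instead of re-deriving recipients from the rules at recovery time.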