[Xymon] Yellow->red escalation, bug or feature?

Mark Hinkle hinkman at hinkman.com
Wed Jan 11 21:16:24 CET 2012


> I think there's a counter that isn't reset.

Just guessing, but I would say you are close, although it seems more like a counter is missing than one that isn't reset. As mentioned in the old discussion included in a previous email, there is a single alert duration clock when there really need to be separate yellow and red clocks. Is this the alert state issue again, maybe? See my comments at the bottom about another long-standing "lack of alert state" issue.
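To make the single-clock versus two-clock difference concrete, here is a toy Python sketch. It is not Xymon's code; the class names, the red_alert_due() helper, and the 10-minute red duration are all made up for illustration, and times are plain minutes.

```python
from dataclasses import dataclass, field
from typing import Optional

SEVERITY = {"green": 0, "yellow": 1, "red": 2}


@dataclass
class SingleClock:
    """One duration clock, started when the test first leaves green."""
    started: Optional[float] = None

    def update(self, color, now):
        if color == "green":
            self.started = None
        elif self.started is None:
            self.started = now

    def duration(self, color, now):
        # The color is ignored: yellow time and red time share one clock.
        return 0.0 if self.started is None else now - self.started


@dataclass
class PerColorClocks:
    """Separate clocks: one for 'yellow or worse', one for 'red'."""
    started: dict = field(default_factory=dict)

    def update(self, color, now):
        level = SEVERITY[color]
        for name in ("yellow", "red"):
            if level >= SEVERITY[name]:
                self.started.setdefault(name, now)  # keep the earliest start
            else:
                self.started.pop(name, None)        # below this level: reset

    def duration(self, color, now):
        start = self.started.get(color)
        return 0.0 if start is None else now - start


def red_alert_due(clock, color, now, min_red_minutes=10):
    """Hypothetical rule: page only after the test has been red long enough."""
    return color == "red" and clock.duration(color, now) > min_red_minutes


# Yellow at t=0, red at t=30, evaluated one minute after going red.
single, split = SingleClock(), PerColorClocks()
for clock in (single, split):
    clock.update("yellow", 0)
    clock.update("red", 30)

print(red_alert_due(single, "red", 31))  # True: the yellow time counts as red time
print(red_alert_due(split, "red", 31))   # False: red for only 1 minute
print(red_alert_due(split, "red", 41))   # True: red for 11 minutes
```

With the single clock, the 30 minutes spent yellow satisfy the red duration immediately, which is the false positive under discussion.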

One possible non-pretty, non-scalable work-around for your issue would be to create a separate "red" test, e.g. diskred, that has only red-level thresholds and alerts config, and to remove the red alerts config from the non-red test (but leave its red threshold). This would give you the correct red duration for your red-level paging alerts. You could use bb-hosts tricks like NOPROPRED, etc. to keep this "red" test off the web pages if you wanted to. The non-red test would still go yellow and red, so you would see it on the web; it just wouldn't be doing the red paging. Like I said, not pretty, but possibly better than the false positives you are getting. Possibly.

If the powers-that-be are willing to open the question of "alert state", then please, please also look into the long-standing recovery message issue. Specifically, if you are emailing on yellow and paging on red, a test that goes green->yellow->red->yellow->green will result in a red page but only an email recovery; the pager never hears that the problem cleared. See http://lists.xymon.com/archive/2008-July/020107.html and http://lists.xymon.com/archive/2008-July/020152.html. Apologies if this seems like a thread hijack. That is not the intent at all; rather, these issues seem very closely related with respect to maintaining alert state and to what degree.
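A small Python sketch of the recovery problem, again a toy model and not how Xymon is implemented. The CHANNEL mapping, the notifications() function, and the remember_alerts flag are hypothetical; the only thing taken from the scenario above is "email on yellow, page on red".

```python
# Hypothetical routing: email on yellow, page on red.
CHANNEL = {"yellow": "email", "red": "pager"}


def notifications(colors, remember_alerts):
    """Return (channel, message) pairs for a sequence of status colors.

    remember_alerts=False: recovery goes only to the channel of the last
    non-green color. remember_alerts=True: recovery goes to every channel
    that was alerted during the incident.
    """
    sent = []
    alerted = set()       # channels notified during the current incident
    previous = "green"
    for color in colors:
        if color == "green":
            if previous != "green":
                targets = alerted if remember_alerts else {CHANNEL[previous]}
                sent.extend((ch, "recovered") for ch in sorted(targets))
            alerted = set()   # incident over, forget who was alerted
        else:
            channel = CHANNEL[color]
            if channel not in alerted:
                alerted.add(channel)
                sent.append((channel, "alert " + color))
        previous = color
    return sent


sequence = ["green", "yellow", "red", "yellow", "green"]
print(notifications(sequence, remember_alerts=False))
print(notifications(sequence, remember_alerts=True))
```

Without the remembered set, the pager gets the red alert but no recovery; with it, both the email and the pager receive a recovery message.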

-- 
Mark L. Hinkle
hinkman at hinkman.com



