[hobbit] only alert if X number of hosts are already in error

Daniel J McDonald dan.mcdonald at austinenergy.com
Mon Jun 20 15:14:59 CEST 2005


On Fri, 2005-06-17 at 08:01 +0200, Henrik Stoerner wrote:

> Something like
> 
>    HOST=%(www.*).foo.com TEST=http COLOR=red COUNT>=5
>       MAIL someone at foo.com
> 
> The "COUNT>=5" would then cause this rule to trigger only if there
> were 5 or more hosts named www.*.foo.com, whose http tests are red.
> You could even combine this with other criteria, say have a threshold of
> 5 during the daytime, and 10 during off-hours.
> 
> I can foresee a problem in handling recovery notifications for this kind
> of alert, but that's something I'll have to think about.
> 
> Would that be useful?

The main place I would use it is NTP alerts.  If one router loses
NTP, I'm not terribly worried.  If 10-20 of them all fail at once, then I
know something really bad is happening: maybe both GPS clocks lost sync
and all 4 cesium backups failed, or NTP locked up on a core router and I
need to make fewer downstream nodes dependent on that one.


I would also consider using it for purple alerts.  I don't want
individual purples for most of my stuff, but if there are a lot of them
(>100), then I know I killed MRTG and I should page on that.
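
To make the two cases above concrete, the counting logic behind a rule
like the proposed COUNT>=5 could be sketched roughly as follows.  This is
only an illustration in Python, not how Hobbit implements (or would
implement) it; the Status record, the function names, and the
rtrN.example.com hostnames are all made up for the example.

```python
import re
from typing import Iterable, NamedTuple


class Status(NamedTuple):
    """One host/test status record (hypothetical shape, for illustration)."""
    host: str
    test: str
    color: str


def count_matching(statuses: Iterable[Status], host_pattern: str,
                   test: str, color: str) -> int:
    """Count records whose host fully matches the regex and whose
    test name and color are equal to the given values."""
    host_re = re.compile(host_pattern)
    return sum(
        1 for s in statuses
        if s.test == test and s.color == color and host_re.fullmatch(s.host)
    )


def should_alert(statuses: Iterable[Status], host_pattern: str,
                 test: str, color: str, min_count: int) -> bool:
    """Alert only when at least min_count matching hosts are in that color."""
    return count_matching(statuses, host_pattern, test, color) >= min_count


# Made-up data: 20 routers, the first 12 of which have a red ntp test.
statuses = [
    Status(f"rtr{i}.example.com", "ntp", "red" if i < 12 else "green")
    for i in range(20)
]

# One or two routers losing NTP would stay quiet; 12 at once crosses
# a threshold of 10.
ntp_alert = should_alert(statuses, r"rtr\d+\.example\.com", "ntp", "red", 10)
```

The purple case would be the same check with a pattern that matches every
host, color "purple", and a threshold just above 100.  The sketch does not
address the recovery-notification question raised above.
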
-- 
Daniel J McDonald, CCIE # 2495, CNX
Austin Energy

dan.mcdonald at austinenergy.com



