[hobbit] only alert if X number of hosts are already in error

Henrik Stoerner henrik at hswn.dk
Fri Jun 17 08:01:36 CEST 2005


On Thu, Jun 16, 2005 at 02:28:53PM -0700, Bruce Lysik wrote:
> 
> > My best suggestion would be to use the bbcombotest tool to define
> > a pseudo "host" with the combined status of your host "pool".
> > 
> > E.g. if you're monitoring http on 5 hosts, you could define a
> > combination test like this:
> > 
> > Pool1.http=(hostA.http+hostB.http+hostC.http+hostD.http+hostE.http)>3
> > 
> > That would give you a red alert if 3 or fewer hosts in the pool were
> > green. And you could then trigger an alert based on that test result.
> 
> Pretty unwieldy when you have large pools of servers, however.  

Could be, yes.
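
For a large pool, the combination-test line could be generated rather
than typed by hand. A rough Python sketch follows; the function name and
its parameters are made up for illustration and are not part of Hobbit
or bbcombotest. It only builds a string in the same form as the example
quoted above.

```python
def combo_definition(pool_name, test, hosts, more_than):
    """Build a combination-test line of the form shown in the quoted example.

    pool_name: name of the pseudo host (hypothetical, e.g. "Pool1")
    test:      test/column name shared by all hosts (e.g. "http")
    hosts:     list of hostnames in the pool
    more_than: the expression is true only if more than this many
               host terms are true
    """
    # One "<host>.<test>" term per pool member, joined with "+".
    terms = "+".join(f"{host}.{test}" for host in hosts)
    return f"{pool_name}.{test}=({terms})>{more_than}"


if __name__ == "__main__":
    pool = ["hostA", "hostB", "hostC", "hostD", "hostE"]
    print(combo_definition("Pool1", "http", pool, 3))
```

The host list could come from wherever the pool is already defined, so
the line only has to be regenerated when the pool changes.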

> I just started writing a smart paging script which will keep track of 
> downed hosts and decide whether or not to page.  

I'm interested to know if this kind of alerting is generally useful.
I suspect it might be ... if so, then we should devise a way of defining
such alerts directly in Hobbit instead of forcing you to come up with
scripts that work around this.

Perhaps one solution could be to implement a new kind of rule for the
hobbit-alerts file. Currently all of the rules are matched against a
specific host+test combination; we could define a type of rule that
could be matched against all of the host+test statuses that are in an
alerting state, and then have the rule trigger based on some criterion
for how many matches we get.

Something like

   HOST=%(www.*).foo.com TEST=http COLOR=red COUNT>=5
      MAIL someone at foo.com

The "COUNT>=5" would then cause this rule to trigger only if there
were 5 or more hosts named www.*.foo.com, whose http tests are red.
You could even combine this with other criteria, say have a threshold of
5 during the daytime, and 10 during off-hours.
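
To make the intended semantics concrete, here is a minimal Python sketch
of how such a rule might be evaluated. This is not Hobbit code: the
function names, the layout of the status table, the anchored regex match
and the 08:00-18:00 daytime window are all assumptions chosen for
illustration.

```python
import re


def count_matches(statuses, host_pattern, test, color):
    """Count hosts matching host_pattern whose `test` status is `color`.

    statuses maps (hostname, testname) -> color. This layout is
    hypothetical and only serves the example.
    """
    rx = re.compile(host_pattern)
    return sum(
        1
        for (host, testname), current in statuses.items()
        if testname == test and current == color and rx.fullmatch(host)
    )


def threshold_for_hour(hour, daytime=5, off_hours=10, day_start=8, day_end=18):
    """Return the count threshold for the given hour (0-23).

    The thresholds 5 and 10 come from the proposal; the daytime window
    is an assumption.
    """
    return daytime if day_start <= hour < day_end else off_hours


def rule_triggers(statuses, host_pattern, test, color, hour):
    """True if enough matching hosts are in the given color at this hour."""
    matched = count_matches(statuses, host_pattern, test, color)
    return matched >= threshold_for_hour(hour)


if __name__ == "__main__":
    # Six web servers with a red http status, plus one unrelated host.
    statuses = {(f"www{i}.foo.com", "http"): "red" for i in range(1, 7)}
    statuses[("db1.foo.com", "http")] = "red"
    pattern = r"www.*\.foo\.com"
    print(rule_triggers(statuses, pattern, "http", "red", hour=12))
    print(rule_triggers(statuses, pattern, "http", "red", hour=2))
```

With six matching red hosts, the rule would fire against the daytime
threshold of 5 but stay quiet against the off-hours threshold of 10.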

I can foresee a problem in handling recovery notifications for this kind
of alert, but that's something I'll have to think about.

Would that be useful?


> One question I have so far is: Does hobbit wait for an alerting script 
> to return before continuing to evaluate other rules?  

Paging scripts are serialized, yes - Hobbit will wait for a paging
script to complete before continuing down the list of alert rules.


Regards,
Henrik



