[hobbit] only alert if X number of hosts are already in error
Henrik Stoerner
henrik at hswn.dk
Fri Jun 17 08:01:36 CEST 2005
On Thu, Jun 16, 2005 at 02:28:53PM -0700, Bruce Lysik wrote:
>
> > My best suggestion would be to use the bbcombotest tool to define
> > a pseudo "host" with the combined status of your host "pool".
> >
> > E.g. if you're monitoring http on 5 hosts, you could define a
> > combination test like this:
> >
> > Pool1.http=(hostA.http+hostB.http+hostC.http+hostD.http+hostE.http)>3
> >
> > That would give you a red alert if 3 or fewer hosts in the pool were
> > green. And you could then trigger an alert based on that test result.
>
> Pretty unwieldy when you have large pools of servers, however.
Could be, yes.
> I just started writing a smart paging script which will keep track of
> downed hosts and decide whether or not to page.
I'm interested to know if this kind of alerting is generally useful.
I suspect it might be ... if so, then we should devise a way of defining
such alerts directly in Hobbit instead of forcing you to come up with
scripts that work around this.

Perhaps one solution could be to implement a new kind of rule for the
hobbit-alerts file. Currently all of the rules are matched against a
specific host+test combination; we could define a type of rule that
could be matched against all of the host+test statuses that are in an
alerting stage, and then have the rule trigger based on some criteria
for how many matches we get.

Something like
    HOST=%(www.*).foo.com TEST=http COLOR=red COUNT>=5
        MAIL someone@foo.com
The "COUNT>=5" would then cause this rule to trigger only if 5 or more
hosts matching www.*.foo.com have a red http status.
You could even combine this with other criteria, say have a threshold of
5 during the daytime, and 10 during off-hours.
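
To make the proposed semantics concrete, here is a minimal Python sketch of
how such a count-based rule might be evaluated. It is not Hobbit code: the
status snapshot, the function names, the 08:00-18:00 "daytime" window and the
use of a full-string regex match are all assumptions made for illustration.

```python
import re
from datetime import time

# Hypothetical snapshot of current statuses: (hostname, test) -> color.
statuses = {(f"www{i}.foo.com", "http"): "red" for i in range(1, 7)}
statuses[("www7.foo.com", "http")] = "green"
statuses[("db1.foo.com", "http")] = "red"


def count_matches(statuses, host_pattern, test, color):
    """Count statuses whose host matches the pattern and whose test/color match."""
    host_re = re.compile(host_pattern)
    return sum(
        1
        for (host, t), c in statuses.items()
        if t == test and c == color and host_re.fullmatch(host)
    )


def threshold_for(now, day=5, night=10, day_start=time(8, 0), day_end=time(18, 0)):
    """Pick the COUNT threshold by time of day (assumed daytime window)."""
    return day if day_start <= now < day_end else night


def rule_triggers(statuses, host_pattern, test, color, now):
    """True if enough matching statuses exist to satisfy the COUNT criterion."""
    return count_matches(statuses, host_pattern, test, color) >= threshold_for(now)


if __name__ == "__main__":
    pattern = "(www.*).foo.com"  # the regex from the example rule, without the % prefix
    print(rule_triggers(statuses, pattern, "http", "red", time(12, 0)))
    print(rule_triggers(statuses, pattern, "http", "red", time(23, 0)))
```

With six red www hosts in the snapshot, the rule would fire against the
daytime threshold of 5 but stay quiet against the off-hours threshold of 10.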
I can foresee a problem in handling recovery notifications for this kind
of alert, but that's something I'll have to think about.

Would that be useful?
> One question I have so far is: Does hobbit wait for an alerting script
> to return before continuing to evaluate other rules?
Paging scripts are serialized, yes - Hobbit will wait for a paging
script to complete before continuing down the list of alert rules.
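
For a script author, the practical consequence of serialized execution can be
illustrated with a short Python sketch. This is only an illustration of the
behaviour described above, not Hobbit's implementation; the function name and
the stand-in commands are hypothetical.

```python
import subprocess
import sys


def run_alert_scripts(commands):
    """Run each command to completion before starting the next.

    Returns the exit codes in the order the commands were run.
    """
    exit_codes = []
    for cmd in commands:
        # subprocess.run() blocks until the child process has exited,
        # so a slow script delays every command that follows it.
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
    return exit_codes


# Hypothetical stand-ins for paging scripts: one slow, one exiting with status 3.
commands = [
    [sys.executable, "-c", "import time; time.sleep(0.2)"],
    [sys.executable, "-c", "raise SystemExit(3)"],
]

if __name__ == "__main__":
    print(run_alert_scripts(commands))
```

In a loop like this, a paging script that hangs would hold up every rule after
it, so a smart paging script should make its decision and return quickly.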
Regards,
Henrik