[hobbit] the all or nothing nature of hobbit

Henrik Stoerner henrik at hswn.dk
Thu Dec 7 22:05:47 CET 2006


On Thu, Dec 07, 2006 at 11:59:30AM -0800, Dan Simoes wrote:
> I love hobbit and have been using it (and BB) for many years, so take this
> as constructive criticism.
> 
> One of my biggest headaches with BB (and now hobbit) has been the
> all-or-nothing nature of alerts.
> By this I mean that if your main network link is down, everything goes red
> for network status.
> 
> Something happened on my monitoring box (probably DNS) that caused a cascade
> of http errors.  http was not truly down on all these N hosts on various
> networks; it was the network test that was failing on the monitoring box.

It's a valid point - but it is also very, very difficult to handle. Not
so much because it is difficult to suppress alerts; the $1bn question is
how to decide when to suppress an alert, and which issue is the root
cause of all the problems we're seeing.

Heck, sometimes it can be difficult even for intelligent humans to
figure out what is really going on ...

I think what this really boils down to is some form of event correlation
mechanism, on top of which you then apply some heuristics (that's a
fancy word for "guessing") to decide what is the core issue. E.g. if we
have 200 tests reporting a failure because of a DNS lookup that timed
out, then we probably have an issue with the DNS server we used. But it
could also be a firewall misconfiguration that blocks our outbound DNS
queries, or an IP address conflict that causes our DNS lookups to go to
a server which doesn't handle DNS - it is really hard for any machine to
figure that out by itself.
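
To make the counting heuristic concrete, here is a minimal sketch in
Python. It is not how Hobbit works: the function name, the
(host, test, reason) tuple layout and both thresholds are invented for
illustration. It only reports the failure reason shared by most of the
red tests. It does not suppress anything, and it cannot tell a dead DNS
server from a firewall rule or an IP conflict that produces the same
symptom.

```python
from collections import Counter


def guess_common_cause(failures, min_count=20, min_share=0.8):
    """Return the dominant failure reason, or None if no reason dominates.

    failures: list of (host, test, reason) tuples for tests currently red.
    min_count and min_share are arbitrary, hypothetical thresholds.
    """
    if not failures:
        return None
    reasons = Counter(reason for _host, _test, reason in failures)
    reason, count = reasons.most_common(1)[0]
    # Only call it a common cause when the same reason accounts for many
    # failures AND for most of the failures seen in this test round.
    if count >= min_count and count / len(failures) >= min_share:
        return reason
    return None


# 200 http tests failing on a DNS timeout, plus one unrelated failure.
failures = [(f"host{i}", "http", "dns-timeout") for i in range(200)]
failures.append(("db1", "disk", "disk-full"))

suspect = guess_common_cause(failures)
if suspect is not None:
    print(f"suspected common cause: {suspect}")
```

A result like this is probably safer used as an extra line on top of
the normal alerts than as a reason to drop them, since the guess can be
wrong.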

The current implementation is not ideal, I'll be the first to admit
that. Any ideas for improving it are welcome, but please consider the
possibility of the system making wrong decisions. I'd rather send out
one alert too many than one too few.


> I'm unaware of a solution to this issue, and I'm considering moving to
> another product because of it.

If you know of any products that are really good at handling this, I'd
be interested to hear about them.

> Lastly, who is maintaining the debian package for hobbit?  Both the server
> and client packages still have the same bugs I reported months ago.

Since there haven't been any Hobbit releases since August, that really
shouldn't come as a surprise.


Regards,
Henrik



