[hobbit] grouping methods
joe at tmsusa.com
Mon Jun 16 19:45:40 CEST 2008
Josh Luthman wrote:
> Not sure what the real reasoning is behind this but if you have 1000
> servers monitored behind 3 hobbit servers each, figure one Hobbit
> server goes down you lost 1000/3000 being monitored. If you have 3000
> servers being monitored behind 1 hobbit server, that one point of
> failure leaves you blind of all 3000 servers.
We do it with redundancy. Each server in our various data centers is
monitored by two bb servers, with one of the two set up to send
notifications, but in all other aspects the monitoring is active/active,
and we get only one notification for alerts, rather than a pair.

We've not had a bb server go down in all the years we've been using it,
but sometimes WAN connectivity goes away due to circumstances beyond our
control, and a bb server in Arizona can't talk to the corresponding bb
server in California, so the normally passive monitoring server goes
into failover mode, and begins sending notifications for alerts, since it
can't verify that the other bb server is alive.
Thus, we always receive notifications for all alerts, and in the worst
case, a split-brain situation, we may get redundant notifications,
which is the lesser of the evils.
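
For illustration only, the notification failover rule described above can be
sketched in a few lines of Python. This is not how bb implements it; the
Monitor class, the peer_is_reachable check, and the server names "primary"
and "secondary" are all hypothetical, and a real setup would need an actual
health check between the two servers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Monitor:
    """One of two servers monitoring the same hosts (hypothetical model)."""
    name: str
    is_designated_notifier: bool            # the one server set up to notify
    peer_is_reachable: Callable[[], bool]   # assumed health check of the peer

    def should_notify(self) -> bool:
        # The designated notifier always sends notifications.
        if self.is_designated_notifier:
            return True
        # The normally passive server sends only when it cannot verify
        # that its peer is alive (failover mode).
        return not self.peer_is_reachable()


def notifiers(monitors: List[Monitor]) -> List[str]:
    """Return the names of the servers that would notify for an alert."""
    return [m.name for m in monitors if m.should_notify()]


# Normal operation: both servers see each other, one notification.
normal = [
    Monitor("primary", True, lambda: True),
    Monitor("secondary", False, lambda: True),
]

# Split brain: the WAN link is down, so neither server can see the other.
split_brain = [
    Monitor("primary", True, lambda: False),
    Monitor("secondary", False, lambda: False),
]

print(notifiers(normal))       # ['primary']
print(notifiers(split_brain))  # ['primary', 'secondary']
```

In the split-brain case both servers notify, which is the redundant but safe
outcome described above; an alert is never lost just because the link between
the two monitoring servers is down.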
Once this notification failover capability makes it into hobbit, we can
finally switch from bb to hobbit.