[hobbit] grouping methods

Sloan joe at tmsusa.com
Mon Jun 16 19:45:40 CEST 2008

Josh Luthman wrote:
> Not sure what the real reasoning is behind this but if you have 1000
> servers monitored behind 3 hobbit servers each, figure one Hobbit
> server goes down you lost 1000/3000 being monitored.  If you have 3000
> servers being monitored behind 1 hobbit server, that one point of
> failure leaves you blind of all 3000 servers.

We do it with redundancy. Each server in our various data centers is 
monitored by two bb servers. The monitoring is active/active in every 
respect except notifications: only one of the two is set up to send 
them, so we get a single notification per alert rather than a redundant 
pair.

We've never had a bb server go down in all the years we've been using 
it, but WAN connectivity sometimes fails for reasons beyond our control. 
When a bb server in Arizona can't talk to its counterpart in California, 
the normally passive server goes into failover mode and begins sending 
alert notifications, since it can't verify that the other bb server is 
alive.
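
The failover rule above can be modeled roughly as follows. This is a 
minimal Python sketch, not bb or Hobbit code: the Monitor class, the 
heartbeat mechanism, the 60-second heartbeat_timeout, and the choice of 
Arizona as the designated notifier are all assumptions for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Monitor:
    """One of a pair of monitoring servers (hypothetical model)."""
    name: str
    is_primary_notifier: bool
    # Assumed value: seconds without a peer heartbeat before failover.
    heartbeat_timeout: float = 60.0
    last_peer_heartbeat: float = field(default_factory=time.monotonic)

    def receive_peer_heartbeat(self, now: float) -> None:
        # Record that the peer was reachable at time `now`.
        self.last_peer_heartbeat = now

    def peer_alive(self, now: float) -> bool:
        return now - self.last_peer_heartbeat <= self.heartbeat_timeout

    def should_notify(self, now: float) -> bool:
        # The designated notifier always sends; the passive server only
        # sends once it can no longer verify that its peer is alive.
        return self.is_primary_notifier or not self.peer_alive(now)


def notifications_for_alert(reachable_monitors, now):
    """Names of the monitors that would send a notification for one alert."""
    return [m.name for m in reachable_monitors if m.should_notify(now)]


az = Monitor("arizona", is_primary_notifier=True, last_peer_heartbeat=0.0)
ca = Monitor("california", is_primary_notifier=False, last_peer_heartbeat=0.0)

print(notifications_for_alert([az, ca], now=30.0))   # peer seen recently
print(notifications_for_alert([az, ca], now=120.0))  # WAN split: both notify
print(notifications_for_alert([ca], now=120.0))      # Arizona really down
```

The design choice is to err toward sending: the passive server treats 
"can't reach the peer" the same as "peer is dead", so a WAN split yields 
duplicate notifications instead of none.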

Thus, we always receive notifications for every alert. In the worst 
case, a split-brain situation, we may get duplicate notifications, which 
is the lesser of the evils.

Once this notification failover capability makes it into hobbit, we can 
finally switch from bb to hobbit.

