
Re: [hobbit] grouping methods



Josh Luthman wrote:
Not sure what the real reasoning is behind this, but if you have 3000
servers split across 3 hobbit servers, 1000 behind each, and one Hobbit
server goes down, you lose monitoring for 1000 of the 3000.  If you have
3000 servers being monitored behind 1 hobbit server, that single point of
failure leaves you blind to all 3000 servers.

We do it with redundancy. Each server in our various data centers is monitored by two bb servers. The monitoring is active/active in every respect except one: only one of the two is set up to send notifications, so we get a single notification per alert rather than a pair of redundant notifications.

We've not had a bb server go down in all the years we've been using it, but sometimes WAN connectivity goes away due to circumstances beyond our control, and a bb server in Arizona can't talk to the corresponding bb server in California. When that happens, the normally passive monitoring server can't verify that the other bb server is alive, so it goes into failover mode and begins sending notifications for alerts.

Thus, we always receive notifications for all alerts. In the worst case, a split-brain situation, we get redundant notifications, which is the lesser of the evils.
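
For anyone who wants the failover rule spelled out, here is a rough sketch in Python. It is not how bb implements it, only the decision logic described above. The Monitor class, the peer_reachable check, the link dictionary, and the choice of California as the notifying server are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Monitor:
    name: str
    is_primary: bool                    # the server normally set up to notify
    peer_reachable: Callable[[], bool]  # hypothetical "is the other server alive?" check

    def should_notify(self) -> bool:
        # The primary always notifies. The passive server notifies only
        # when it cannot verify that its peer is alive (failover mode).
        if self.is_primary:
            return True
        return not self.peer_reachable()


def notifiers(monitors: List[Monitor]) -> List[str]:
    """Names of the servers that would send a notification for an alert."""
    return [m.name for m in monitors if m.should_notify()]


# Hypothetical stand-in for the WAN link between the two sites.
link = {"up": True}

# Assumption: California is the notifying server, Arizona the passive one.
california = Monitor("california", True, lambda: link["up"])
arizona = Monitor("arizona", False, lambda: link["up"])
```

With the link up, only the primary notifies. With the link down, both servers notify, which is the split-brain case: duplicate notifications, but never a missed one.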

Once this notification failover capability makes it into hobbit, we can finally switch from bb to hobbit.

Joe