
grouping methods



Looking for some thoughts and experiences on how folks have configured their systems, mainly in regard to classification/grouping of servers for alerting purposes. I'll try to keep this short.

Currently I'm running a total of 3 Hobbit servers, each in a different data center. Each server monitors the clients local to its network, in addition to each of its partner servers' SMTP box, etc. This all works fine. However, our alerting system, which also works fine, is overly complex and contains too many opportunities for bugs.

In a nutshell, we have 3 groups of sysadmins that rotate on call every nn interval. Each group may be involved with a number of systems in each location, and some of the admins work on multiple operating systems.

I'm looking for a way to avoid having specific alert rules for each server (lots of text, even with regex macros/vars). More to the point, I want to categorize the servers by sysadmin group, so that the rules can be considerably less complex.
Dividing the alerting by OS category does not work well, as some of the admins are cross-platform folks.
Dividing the alerting by page does not work well either, as the same 'page' may contain servers belonging to more than one sysadmin group. The 'Class' statement for bb-hosts seems like a possibility; however, I think its intended purpose is more related to whatever logs are defined in client-local, so I don't think it will work beyond log files.

Ideally I'd like to define the sysadmin group in the bb-hosts file but I don't think this is possible.
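
To illustrate what I'm after, here is a rough sketch in Python. It is not Hobbit configuration syntax, and every host name, group name, and address in it is made up; it only shows the shape of the lookup I'd like: tag each host once with its owning group, then keep one alert rule per group.

```python
# Hypothetical illustration only: this is NOT Hobbit configuration syntax.
# Host names, group names, and addresses are invented for the example.

# Each host is tagged once with the sysadmin group that owns it.
HOST_GROUP = {
    "web01.dc1.example.com": "group-a",
    "db01.dc1.example.com": "group-b",
    "mail01.dc2.example.com": "group-a",
    "app01.dc3.example.com": "group-c",
}

# One alert rule per sysadmin group, instead of one per server.
GROUP_ONCALL = {
    "group-a": "oncall-a@example.com",
    "group-b": "oncall-b@example.com",
    "group-c": "oncall-c@example.com",
}


def alert_recipient(host):
    """Return the on-call address for the group owning `host`, or None."""
    group = HOST_GROUP.get(host)
    return GROUP_ONCALL.get(group)
```

With something like this, adding a server means adding one tag, and changing who is on call means touching one rule, regardless of OS or page.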

In summary, if I maintain immense configuration files with somewhat repetitive data, Hobbit works quite well. I'd like to reduce the complexity but maintain the functionality. Maybe it's not in the cards, or maybe (and I am hoping this is the case) I missed some cool flag or config setting.

Thoughts?