[hobbit] Windows Cluster Monitoring Advice
AaronZink at eharmony.com
Fri Jun 13 02:05:48 CEST 2008
I too am using bbwin extensively to monitor our Windows environment, and we have several clusters. In a Windows cluster, monitoring some services per client will work (especially for TCP monitors), but it is not an ideal solution for several reasons:
1. Active-passive clusters have services and ports that will be running on one node but not the other, making these impossible to monitor. Bbcombotest can sort of be used, but it does not work very well for this.
2. I have yet to get the file checks to work, but even if they did, checking a file on a shared drive would not work, because the drive is only visible on the node that currently owns it.
3. Using externals.exe to monitor the cluster is nice, but there are times when the cluster is "fine" according to cluster manager while a shared disk is not accessible.
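To illustrate point 3, here is a minimal sketch of a check that tests whether a shared disk is actually readable, rather than trusting the cluster manager's view. It is only an assumption about how such an external script might look: the function name shared_disk_status and the idea of mapping the result to a green/red colour are mine, and the code for handing the result to bbwin is deliberately left out.

```python
import os


def shared_disk_status(path):
    """Return "green" if the directory at `path` can be listed, else "red".

    Hypothetical helper: `path` would be the root of the shared disk,
    for example "F:\\" on a Windows cluster node.
    """
    try:
        # Listing the directory forces a real access to the volume,
        # so a disk that is online but unreadable is still caught.
        os.listdir(path)
    except OSError:
        return "red"
    return "green"


if __name__ == "__main__":
    # Example path only; adjust to the shared drive in question.
    print(shared_disk_status("F:\\"))
```

Note that on the passive node this check would also return red, so on its own it runs into the same limitation as point 1.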
I had an idea for monitoring clusters, and was wondering about the feasibility: re-add the HOSTNAME configuration entry to the bbwin.cfg file, and run two instances of bbwin.exe on the client. One would be the default (reading the hostname from the machine), and the other would be manually configured in its .cfg to report as the cluster name. This is currently not possible because the hostname can only be overridden in the registry, which both bbwin instances read.
I don't have a development environment set up to test this myself, but in theory it should work.
Corporate IT Manager
- Aaron Zink
From: Lennon, Padraig [mailto:Padraig.Lennon at pioneerinvestments.com]
Sent: Thursday, June 12, 2008 03:50
To: hobbit at hswn.dk
Subject: RE: [hobbit] Windows Cluster Monitoring Advice
Senior Systems Engineer
Pioneer Global Investments (Dublin)
5th Floor Georges Quay Plaza, Dublin 2
Direct dial: 00353 1 480 2081
From: Etienne Grignon [mailto:etienne.grignon at gmail.com]
Sent: 11 June 2008 10:09
To: hobbit at hswn.dk
Subject: Re: [hobbit] Windows Cluster Monitoring Advice
2008/6/5 Lennon, Padraig <Padraig.Lennon at pioneerinvestments.com>:
> Thanks Etienne,
> I have implemented those changes.. All look good. How do you deal with
> log errors? I was thinking a combo test would work..
> A few other issues:
> Say I wanted to monitor a shared disk F: on the cluster. The shared
> disk is 1TB in size. For the moment the disk is on node1 of the cluster.
> Now I want to alert only when the disk gets to 50gb left. This is easy
> to do in the bbwin.cfg file on node1.
> Suppose now we have a failover of the resource to node2. It has no
> knowledge about the 50gb limit and back on node1 it is in an alert
> status because it can't find the F: drive.
> How do I get around this?
You will have to comment out the line in bbwin.cfg on node 2 until the second node becomes the active node. I know a manual action like this is not a good idea, but there is no other alternative for that.
However, you can try removing the F:-specific rule and changing the default percentage rule so that it guarantees you will always have 50 GB left on your F: drive. That way you won't get alerts on the second node because the F: drive is missing.
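As a rough worked example of that conversion, the sketch below turns "at least 50 GB free" into a "percent used" threshold. The function name and the choice to round down to a whole percent are my assumptions, not something bbwin prescribes; check how your default disk rule expresses its threshold before using the number.

```python
import math


def used_percent_threshold(total_gb, min_free_gb):
    """Return the largest whole 'percent used' value that still leaves
    at least `min_free_gb` free on a disk of `total_gb`.

    Hypothetical helper for illustration only.
    """
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    if not 0 <= min_free_gb <= total_gb:
        raise ValueError("min_free_gb must be between 0 and total_gb")
    exact = 100.0 * (total_gb - min_free_gb) / total_gb
    # Round down so the alert fires slightly early rather than late.
    return math.floor(exact)


if __name__ == "__main__":
    # 1 TB disk (taken as 1024 GB), alert when 50 GB or less remains.
    print(used_percent_threshold(1024, 50))
```

Keep in mind that a default percentage rule applies to every disk without a specific rule, so the same percentage means a different number of free GB on smaller volumes.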