RE: [hobbit] Windows Cluster Monitoring Advice

To: "hobbit (at) hswn.dk" <hobbit (at) hswn.dk>
Subject: RE: [hobbit] Windows Cluster Monitoring Advice
From: Aaron Zink <AaronZink (at) eharmony.com>
Date: Mon, 7 Jul 2008 17:43:35 -0700

Has anyone had any thoughts on this?  It is really the only thing lacking in our Windows monitoring environment.

Simply reintroducing the optional hostname directive in BBWin and running two instances on each host seems like it would work.  BBWin would first check the .cfg, then the registry, and finally fall back to the machine hostname.
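
To make the proposed lookup order concrete, here is a minimal sketch in Python. It is not BBWin code: the function name and its parameters are hypothetical, and it only illustrates the precedence described above (.cfg value, then registry value, then the machine hostname).

```python
import socket
from typing import Optional


def resolve_hostname(cfg_value: Optional[str],
                     registry_value: Optional[str],
                     machine_name: Optional[str] = None) -> str:
    """Pick the name an agent instance would report under.

    Hypothetical helper: cfg_value stands for a hostname entry read from the
    .cfg file, registry_value for the registry override. Both may be missing.
    """
    # First non-empty value wins: .cfg before registry.
    for candidate in (cfg_value, registry_value):
        if candidate and candidate.strip():
            return candidate.strip()
    # Nothing configured: fall back to the machine's own hostname.
    return machine_name if machine_name is not None else socket.gethostname()


# Instance 1 has no .cfg entry and reports as the node itself.
print(resolve_hostname(None, None, "NODE1"))
# Instance 2 sets the cluster name in its own .cfg and reports as the cluster.
print(resolve_hostname("SQLCLUSTER1", None, "NODE1"))
```

With this order, two instances sharing the same registry could still report under different names, because each instance reads its own .cfg first.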


Aaron Zink
Manager, Corporate IT
eHarmony.com
626.795.4814

-----Original Message-----
From: Aaron Zink [mailto:AaronZink (at) eharmony.com]
Sent: Thursday, June 12, 2008 17:06
To: hobbit (at) hswn.dk
Subject: RE: RE: [hobbit] Windows Cluster Monitoring Advice

Hello,

I too am using bbwin extensively to monitor our Windows environment, and we have several clusters.  In a Windows cluster, monitoring some services per client will work (especially for TCP monitors), but it is not an ideal solution for several reasons:

1. Active-passive clusters have services and ports that run on one node but not the other, which makes them impossible to monitor per node.  Bbcombotest can be used as a workaround, but it does not work very well for this.

2. Checking a file on a shared drive wouldn't work, because the drive is only available on the active node.

3. Using externals.exe to monitor the cluster is nice, but there are times when the cluster is "fine" according to the cluster manager while a shared disk is not accessible.

I had an idea for monitoring clusters, and was wondering about the feasibility:  Re-add the HOSTNAME configuration entry to the bbwin.cfg file, and run two instances of bbwin.exe on the client.  One would be the default (reading the hostname from the machine), and the other would be manually configured in its .cfg to use the cluster name.  This is currently not possible because the hostname can only be overridden in the registry, which both bbwin instances read.

I don't have a development environment set up to test this myself, but in theory it should work.


Aaron Zink
Corporate IT Manager
eHarmony.com
626.795.4814



-----Original Message-----
From: Lennon, Padraig [mailto:Padraig.Lennon (at) pioneerinvestments.com]
Sent: Thursday, June 12, 2008 03:50
To: hobbit (at) hswn.dk
Subject: RE: [hobbit] Windows Cluster Monitoring Advice

Thanks Etienne,




Padraig Lennon
Senior Systems Engineer
Production Services
Pioneer Global Investments (Dublin)
5th Floor Georges Quay Plaza, Dublin 2
ext: 2081
Direct dial: 00353 1 480 2081

-----Original Message-----
From: Etienne Grignon [mailto:etienne.grignon (at) gmail.com]
Sent: 11 June 2008 10:09
To: hobbit (at) hswn.dk
Subject: Re: [hobbit] Windows Cluster Monitoring Advice

Hi Padraig,

2008/6/5 Lennon, Padraig <Padraig.Lennon (at) pioneerinvestments.com>:
> Thanks Etienne,
>
> I have implemented those changes.. All look good. How do you deal with
> event log errors? I was thinking a combo test would work..
>
> A few other issues:
>
> Say I wanted to monitor a shared disk F: on the cluster. The shared
> drive is 1TB in size. For the moment the disk is on node1 of the
> cluster. Now I want to alert only when the disk gets to 50gb left.
> This is easy to do in the bbwin.cfg file on node1.
>
> Suppose now we have a failover of the resource to node2. It has no
> idea about the 50gb limit and back on node1 it is in an alert status
> because it can't find the F: drive.
>
> How do I get around this?
>

You will have to comment out the F: line in bbwin.cfg on node 2 until the second node becomes the active node. I know a manual action is not a good idea, but there is no other alternative for that.
However, you can try removing the specific F: rule and changing the default rules, which are expressed in %, so that you are always alerted before F: drops below 50 GB free. That way you won't get alerts on the second node because the F: drive is missing.
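
As a rough worked example of that conversion, the sketch below turns "alert at 50 GB free on a 1 TB disk" into a percentage. The function name is hypothetical, and it assumes the default disk rules are expressed as percent used; the arithmetic is the only point here.

```python
def used_percent_threshold(disk_size_gb: float, min_free_gb: float) -> float:
    """Percent-used level at which only min_free_gb remains free."""
    if disk_size_gb <= 0:
        raise ValueError("disk size must be positive")
    if not 0 <= min_free_gb <= disk_size_gb:
        raise ValueError("free space must be between 0 and the disk size")
    return 100.0 * (disk_size_gb - min_free_gb) / disk_size_gb


# 1 TB taken as 1000 GB: alert at 95% used.
print(used_percent_threshold(1000, 50))
# 1 TB taken as 1024 GB: alert at roughly 95.1% used.
print(round(used_percent_threshold(1024, 50), 1))
```

Keep in mind that a default rule applies to every drive on the node, so a 95% threshold would also apply to much smaller local disks.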

Regards,


--
Etienne GRIGNON

To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe (at) hswn.dk
