[hobbit] DNS reboot causes purples

Henrik Stoerner henrik at hswn.dk
Thu Jan 12 07:40:05 CET 2006

On Tue, Jan 10, 2006 at 10:58:10AM -0500, Bill Perez wrote:
> I'm hoping someone might be able to help me.  I'm running Hobbit 4.1.2 on a
> Fedora Core 4, monitoring approximately 500 servers.  I have been running
> Hobbit for a few months and a few times our DNS server has been rebooted for
> patching.  When this happens it causes some servers to go purple and the
> only way I've been able to fix this is to restart the Hobbit service but it
> has generated a ton of alerts and not a lot of happy alert recipients.  My
> /etc/resolv.conf file has primary and secondary DNS servers, so I would have
> thought if one wasn't available it would use the other, but this doesn't
> seem to be the case.

Which tests are going purple? The network tests (conn, smtp, http, etc.)
or the client-side tests (cpu, disk, memory, ...)?

If it's the network tests, then the problem is probably that the DNS
lookups take too long and Hobbit times them out. The query probably
goes first to the server which is down, and then times out waiting for
the response. But that would normally cause your network tests to go
red - with a DNS error status - not purple. Still, setting up a caching
DNS server on the Hobbit server might help with that (and is generally
a good idea when testing many servers).
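
To see whether slow lookups are really the issue, you could time a few
name resolutions from the Hobbit server while the primary DNS server is
down. The following is only a rough sketch and not part of Hobbit; the
function name timed_lookup and the host list are made up for
illustration, and it uses the system resolver through Python's standard
socket module.

```python
import socket
import time


def timed_lookup(hostname, port=80):
    """Resolve hostname via the system resolver.

    Returns (addresses, seconds_elapsed); addresses is an empty list
    if the lookup fails.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
        # Each entry ends with a sockaddr tuple whose first item is the address.
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addresses = []
    elapsed = time.monotonic() - start
    return addresses, elapsed


if __name__ == "__main__":
    # Placeholder list: substitute a few of the hosts you monitor.
    for name in ["localhost"]:
        addrs, secs = timed_lookup(name)
        print(f"{name}: {addrs or 'lookup failed'} in {secs:.2f}s")
```

If each lookup takes several seconds while the primary server is down,
that points at resolver timeouts rather than at Hobbit itself.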

So I think it's your client-side tests that go purple. Which doesn't
really make sense, since the communication between the clients and
Hobbit normally uses the IP address directly. But you should check the
BBDISP setting in your clients' etc/hobbitclient.cfg and make sure it is
set to the IP of your Hobbit server, not the hostname.
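
With many clients it may be easier to script the BBDISP check than to
read each file by hand. This is a minimal sketch which assumes the
setting appears as a shell-style BBDISP="value" line; the helper names
bbdisp_value and is_ip_literal are invented here. A value that refers
to another variable will be reported as "not an IP" and needs a manual
look.

```python
import ipaddress
import re


def bbdisp_value(cfg_text):
    """Return the last BBDISP value assigned in cfg_text, or None."""
    value = None
    for line in cfg_text.splitlines():
        # Assumed format: BBDISP="value"; commented-out lines do not match.
        match = re.match(r'\s*BBDISP\s*=\s*"?([^"#\s]*)"?', line)
        if match:
            value = match.group(1)
    return value


def is_ip_literal(value):
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    # Example text only; read the real file contents from each client instead.
    sample = 'BBDISP="hobbit.example.com"\n'
    value = bbdisp_value(sample)
    print(value, "is an IP" if is_ip_literal(value) else "is not an IP")
```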


