Re: [hobbit] Inexplicable purple on running services



Since ssh, ldaps, and dns are tests run from the server side (cpu etc. remaining green indicates the clients are running and communicating OK, right?), I ran

./bbtest-net --concurrency=50 --checkresponse --no-update --timing --debug

Now, I can ping and ssh to all clients from the server just fine. But I see this:

---
2005-11-01 14:14:20 Adding to combo msg: status brassai.conn red <!-- [flags:ordAstILe] --> Tue Nov 1 14:14:20 2005 conn NOT ok
status brassai.conn red <!-- [flags:ordAstILe] --> Tue Nov 1 14:14:20 2005 conn NOT ok


Service conn on brassai is not OK : Host does not respond to ping

System unreachable for 3 poll periods (56 seconds)
---

Aha. Since the ping test fails, why test other net services? So now it makes sense; the net tests are not being run, hence the purple.
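
To make that reasoning concrete, here is a toy model in Python. It is only a sketch of the idea above, not Hobbit's actual code: the names poll, is_purple and VALIDITY are made up for illustration, and the 30-minute lifetime is an assumed value. It shows how a failing ping test could leave the other net statuses without fresh reports, so that they eventually go stale.

```python
from datetime import datetime, timedelta

# Assumed lifetime of a status report before it counts as stale (hypothetical value).
VALIDITY = timedelta(minutes=30)

def poll(ping_ok, services, now, last_report):
    """Record a report time for every test that actually ran in this poll."""
    last_report["conn"] = now  # the ping test itself always reports
    if ping_ok:
        for name in services:
            last_report[name] = now
    # If the ping failed, the service tests are skipped and keep their old timestamps.

def is_purple(name, now, last_report):
    """A status is 'purple' in this model when its last report is older than VALIDITY."""
    return now - last_report[name] > VALIDITY

services = ["ssh", "ldaps", "dns"]
last_report = {}

start = datetime(2005, 11, 1, 13, 0, 0)
poll(True, services, start, last_report)    # everything reports once

later = start + timedelta(minutes=45)
poll(False, services, later, last_report)   # ping fails, services are skipped

for name in ["conn"] + services:
    print(name, "purple" if is_purple(name, later, last_report) else "fresh")
```

In this model conn keeps getting fresh (red) reports while ssh, ldaps and dns age past the assumed lifetime, which matches what I'm seeing.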

Of course, I don't know why the net test is suddenly unable to ping anything. It is getting the right IPs internally:

---
2005-11-01 14:14:20 Got DNS result for host doisneau : 10.x.x.x
2005-11-01 14:14:20 Got DNS result for host brassai : 10.x.x.x
2005-11-01 14:14:20 Got DNS result for host moadib : 10.x.x.x
---

I thought cranking the concurrency way down might help, but apparently it doesn't.

So, I'm glad I found the cause... now I just need to find out the cause's cause. o_O

Rob Munsch wrote:

There are no entries in the network log since 10/28. Hobbit is running on the server, and the Hobbit client is running on each of the monitored hosts.

CPU, Memory, Disk and Procs all remain green!
SSH, ldaps, and dns on the clients are purple.

On the hobbit server itself, bbd is purple.  Everything else is green.
Network connectivity from all clients to the server is functional.

I don't get it...

Henrik Stoerner wrote:

On Mon, Oct 31, 2005 at 05:32:44PM -0500, Rob Munsch wrote:


Consider the below. Approx. 25 minutes ago, across all monitored systems, all network-monitored services (ssh, ldaps and dns) went to purple. They are still up, running, and just fine in every respect. The status message is even the same as when it was showing green. But now every ssh, ldaps and dns light is purple.


Purple is an indication that some part of your monitoring system
has stopped.

All of the purple ones are network services? Then it sounds as if
your network tests have stopped running. Check the
~hobbit/server/logs/bb-network.log file for any errors.
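
One quick way to see whether the network tester has gone quiet is to look for the newest timestamp in that log. The Python sketch below assumes each log line starts with a "YYYY-MM-DD HH:MM:SS" stamp, as the --debug output quoted in this thread does; whether bb-network.log always uses that layout is an assumption, and last_log_timestamp is a made-up helper name.

```python
from datetime import datetime

def last_log_timestamp(lines):
    """Return the newest 'YYYY-MM-DD HH:MM:SS' stamp found at the start of a line, or None."""
    newest = None
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # line does not begin with a timestamp; skip it
        if newest is None or stamp > newest:
            newest = stamp
    return newest

# Sample lines taken from the debug output quoted in this thread.
sample = [
    "2005-11-01 14:14:20 Got DNS result for host doisneau : 10.x.x.x",
    "System unreachable for 3 poll periods (56 seconds)",
    "2005-11-01 14:14:20 Got DNS result for host brassai : 10.x.x.x",
]
print(last_log_timestamp(sample))
```

To run it against the real file, pass an open file object instead of the sample list. If the newest stamp is days old, the network tests have most likely stopped running.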


Regards, Henrik


--
Rob Munsch
Systems Analyst, Solutions for Progress
http://www.solutionsforprogress.com