
Re: [hobbit] Network test dying



On Mon, Mar 13, 2006 at 12:45:27PM -0500, James B Horwath wrote:
> I have been running Hobbit for several months now without incident. I am
> running Hobbit 4.1.2p1 on Red Hat Enterprise Linux 3 on IBM pSeries
> hardware. I hadn't had any issues until this morning. Now, after about
> one hour of running, the system flat out dies: I am sent a notification
> for every connected system, and then the network process appears to die.
> I was running tcpdump to see what was wrong, and I see a network test to
> a machine on the same subnet completing about 30 minutes ago. I am not
> running iptables/ipchains. I am not experienced at hard-core Hobbit
> debugging. I looked in /var/log/hobbit and don't see anything strange.
> There are no core files in the hobbit directory.
>
> Any advice on where to start? All my network tests are now purple.

Is there a "bbtest-net" and/or "fping" process which hangs? If there
is, it would be interesting to attach to it with "gdb" and see what
it is doing. Alternatively, kill it with "kill -6", which will trigger
a core dump in ~hobbit/data/tmp/ (provided the core file size limit,
"ulimit -c", is not zero). You can then run the core dump through gdb,
and the result might give me an idea what it is doing.
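For the first check, here is a small Python sketch that lists any "bbtest-net" or "fping" processes and how long each has been running. It is not part of Hobbit: it assumes a Linux /proc filesystem, and the function name find_suspect_processes is hypothetical. A process that has been alive much longer than one network test cycle is the one worth attaching to with "gdb -p PID" or sending "kill -6".

```python
import os

# Process names to look for (from the advice above).
WATCHED = ("bbtest-net", "fping")

def find_suspect_processes(names=WATCHED, proc="/proc"):
    """Return (pid, name, seconds_running) for processes named in `names`.

    Linux-only sketch: parses /proc/<pid>/stat and /proc/uptime.
    """
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    with open(os.path.join(proc, "uptime")) as f:
        uptime = float(f.read().split()[0])  # seconds since boot
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # process exited while we were scanning
        if "(" not in stat or ")" not in stat:
            continue
        # Field 2 (comm) is wrapped in parentheses and may contain spaces,
        # so split around the last ')' rather than on whitespace alone.
        lparen, rparen = stat.index("("), stat.rindex(")")
        name = stat[lparen + 1:rparen]
        if name not in names:
            continue
        # rest[0] is field 3 (state); starttime is field 22, in clock ticks.
        rest = stat[rparen + 2:].split()
        starttime_ticks = int(rest[22 - 3])
        found.append((int(entry), name, uptime - starttime_ticks / hz))
    return found
```

Run as root or as the hobbit user, each tuple returned by find_suspect_processes() gives the PID to hand to gdb or kill.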


You can also try su'ing to the hobbit user and running the command

   bbcmd bbtest-net --debug host1 host2

(replace "host1" and "host2" with a couple of the hosts in your
 bb-hosts file).
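If the manual run itself hangs, a wrapper with a deadline keeps whatever debug output was produced before the hang. This is a minimal Python sketch under that assumption; run_with_deadline is a hypothetical helper, not a Hobbit tool.

```python
import subprocess

def run_with_deadline(cmd, seconds):
    """Run `cmd` and return (timed_out, output).

    stdout and stderr are merged so debug messages are kept whichever
    stream they are written to. If the deadline passes, the direct child
    is killed and the output captured up to that point is returned.
    Note: only the direct child is killed; any processes it started
    may survive and need to be killed by hand.
    """
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, timeout=seconds)
        return False, proc.stdout.decode(errors="replace")
    except subprocess.TimeoutExpired as exc:
        partial = exc.output or b""  # bytes read before the timeout, if any
        return True, partial.decode(errors="replace")
```

For example, as the hobbit user: run_with_deadline(["bbcmd", "bbtest-net", "--debug", "host1", "host2"], 300). The 300-second limit is an arbitrary example, not a Hobbit setting; if the first value is True, the last lines of output show how far the test got.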


Are DNS lookups working on this box? Slow or failing DNS is one of the
few things that can make the network tests slow down dramatically,
although the lookups ought to time out automatically. The same goes for
the other commands that run as part of the network tests (the rpc and
ntp queries).
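To check the DNS question directly, the sketch below times name resolution for a few hosts. It is a hypothetical helper, not part of Hobbit, and it assumes the network tests resolve names through the system resolver, which is what socket.getaddrinfo uses. The 2-second "slow" threshold is an arbitrary assumption.

```python
import socket
import time

def time_lookups(hostnames, slow_after=2.0):
    """Resolve each name and return (name, seconds, resolved, slow) tuples.

    Uses the C library resolver via socket.getaddrinfo, so /etc/hosts,
    /etc/nsswitch.conf and /etc/resolv.conf all apply. There is no
    timeout here on purpose: a slow lookup shows up as a large 'seconds'.
    """
    results = []
    for name in hostnames:
        start = time.monotonic()
        try:
            socket.getaddrinfo(name, None)
            resolved = True
        except socket.gaierror:
            resolved = False  # name did not resolve (or resolver error)
        elapsed = time.monotonic() - start
        results.append((name, elapsed, resolved, elapsed > slow_after))
    return results
```

Calling time_lookups(["host1", "host2"]) with a few names from bb-hosts, on the Hobbit server itself, shows whether lookups are failing or taking several seconds each.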


Regards,
Henrik