[hobbit] Hobbit server crashing

Henrik Stoerner henrik at hswn.dk
Thu Oct 9 15:17:48 CEST 2008


In <A3D12FAD74FC8B46991703F40C182BAB01078343 at permls102.wde.woodside.com.au> "Everett, Vernon" <Vernon.Everett at woodside.com.au> writes:

>My Hobbit server crashed and died.

>This happened before, a few months ago, and I shrugged it off - sometimes
>sh1t happens.
>Then it happened last week again. This time I was concerned.
>Now it has just happened again, about 40 minutes ago.

>I tried to restart hobbit, without much luck, then I walked away, put my son
>into bed, and then tried again.
>This time it worked.

>The logs never showed anything conclusive, but maybe I just don't know what
> I am looking for.

>The symptoms were the same all three times.
>All "passive" server based tests go purple.
>By passive server based, I mean conn, http, content, ssh, ftp, ftps, etc.
>The tests that do not rely on a client.
>bbd and bbtest also went purple.

>All client based tests were unaffected. Graphing worked as normal. And 
>alerts were being sent out.


Your description sounds very much as if the only thing that stopped was
the network tests (bbtest-net). Since the client-side tests are updating,
network tests go purple and alerts go out, I think that is where the
problem is. "bbtest" going purple also points in this direction.

Next time it happens, see if there's a "bbtest-net" process running (and possibly
a "hobbitping" or "fping" process as well); if there is, kill it with a "kill -6"
(SIGABRT) to make it dump core. Then do the usual stuff of getting a stack trace
from the core file ( http://www.hswn.dk/hobbit/help/known-issues.html#bugreport )
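
As a rough sketch of the steps above (not an official Hobbit tool), something
like the following could be used. It assumes "pgrep" is installed and that core
dumps are permitted for the running process; the function names and the "--run"
guard are invented for this example.

```shell
#!/bin/sh
# Sketch only: make hung Hobbit network-test processes dump core.
# Assumes pgrep is installed and that the core size limit of the target
# process is non-zero (that limit was fixed when the process was started).

# Send SIGABRT (signal 6) to one PID; the default action is abort + core dump.
abort_pid() {
    echo "sending SIGABRT to pid $1"
    kill -6 "$1"
}

# Abort every process whose name is exactly $1; do nothing if none is found.
abort_procs() {
    for pid in $(pgrep -x "$1" || true); do
        abort_pid "$pid"
    done
}

# "--run" is a guard invented for this sketch, so that merely sourcing the
# file never kills anything.
if [ "${1:-}" = "--run" ]; then
    for name in bbtest-net hobbitping fping; do
        abort_procs "$name"
    done
fi
```

Where the core file ends up depends on your system's settings. With gdb,
"gdb -batch -ex bt <path to the bbtest-net binary> <core file>" should print a
backtrace; both paths depend on your installation.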

Are you running bbtest-net with the "--no-ares" option? If so, a hung/slow DNS server
can make your network tests run very slowly.
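
One way to get a feel for whether DNS is the bottleneck is to time a lookup
through the system resolver. This is a minimal sketch, assuming "getent" is
available and that whole-second resolution is good enough; "time_lookup" is a
name made up for this example.

```shell
#!/bin/sh
# Sketch only: time one hostname lookup through the system resolver.
# Assumes getent is available; date +%s gives one-second resolution.

# Print the number of seconds it took to resolve hostname $1.
time_lookup() {
    start=$(date +%s)
    # A failed lookup is reported on stderr but still timed.
    getent hosts "$1" >/dev/null 2>&1 || echo "lookup of $1 failed" >&2
    end=$(date +%s)
    echo $((end - start))
}
```

Running "time_lookup" against a few of your monitored hostnames while the
problem is occurring should show whether lookups take seconds rather than
returning at once.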


Henrik



