
Re: [hobbit] RC client release bug?





David Gore wrote:


Henrik Stoerner wrote:
On Thu, Jul 13, 2006 at 07:09:11PM +0000, David Gore wrote:
We have seen this with recent snapshots and the latest release candidate client: logfetch hangs, which causes the client to hang and all of its tests to go purple. Killing logfetch and deleting all the entries in ~/client/tmp resolves it; we could try to be more surgical about which files we delete. This has happened on two independent hosts running Solaris 8, one a SunFire 880 and the other an E4500/E5500.

Suggestions? It can run for many days before hanging.

That's obviously interesting.

When it hangs, is it just dead? Or is it hogging the CPU (as it would
if it were in a tight loop somewhere in the code)?

CPU hogging, yes.

For the hosts you monitor where this happens, what kind of entries do you have for them in client-local.cfg? Any "dir" entries, for instance?
Those run an external program (du), which is always
harder to control.


No "dir" entries, just "file" and "log".

When it happens again, could you please try to kill it with "kill -ABRT <logfetchPID>"? That should cause it to dump core,
and it will be much easier to see where it hangs with a core
dump. Once you have the core dump, running it through gdb as described
in Help->Known Problems->How to report bugs will give me much
more to work on.
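
For reference, the "kill -ABRT" step above can also be scripted. The following is a minimal Python sketch, not part of Hobbit: the helper name abort_process and the sleeping child used as a stand-in for a hung logfetch are assumptions for illustration. Whether a core file is actually written depends on the system's core-dump settings (for example the core size limit), which this sketch does not change.

```python
import os
import signal
import subprocess
import sys


def abort_process(pid: int) -> None:
    """Send SIGABRT to pid, the same signal as `kill -ABRT <pid>`."""
    os.kill(pid, signal.SIGABRT)


def demo() -> int:
    # Hypothetical stand-in for a hung process: a child that only sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    abort_process(child.pid)
    # On POSIX, a negative return code means the child was terminated
    # by the signal with that number.
    return child.wait()


if __name__ == "__main__":
    print("child exit status:", demo())
```

The resulting core file, if one is produced, is what would then be loaded into gdb to obtain a backtrace.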



It might take a few days, but we will certainly do that and see what it shows. As always, thank you for the hard work!

Sooner than I expected, here is the backtrace:

GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.8"...
Core was generated by `/export/home/nmsbb/client/bin/logfetch /export/home/nmsbb/client/tmp/logfetch.o'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libc.so.1...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libdl.so.1...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1...done.
Loaded symbols for /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
#0 0xff3906e8 in memcpy () from /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
(gdb) bt
#0 0xff3906e8 in memcpy () from /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
#1 0x00012e10 in logdata (filename=0xffbef5a0 "", logdef=0x38738, truncated=0xffbef6c4)
at logfetch.c:192
#2 0x000142f4 in main (argc=215040, argv=0x34c00) at logfetch.c:844


I took a look at one of my co-worker's entries in client-local.cfg:

ignore DEBUG|WARN|^at.*)$

I put a backslash in front of the right paren:

ignore DEBUG|WARN|^at.*\)$

Perhaps that may have been why it was hanging?
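
As a quick way to see why the unescaped pattern looks suspicious, the sketch below compiles both versions with Python's re module. This is only an illustration: logfetch is written in C, and its regex engine may treat an unmatched ")" differently from Python. The helper name check_pattern and the sample log line are hypothetical.

```python
import re

ORIGINAL = r"DEBUG|WARN|^at.*)$"   # pattern as first written in client-local.cfg
ESCAPED = r"DEBUG|WARN|^at.*\)$"   # pattern with the ")" escaped


def check_pattern(pattern: str):
    """Return None if the pattern compiles, otherwise the error text."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    for name, pattern in (("original", ORIGINAL), ("escaped", ESCAPED)):
        error = check_pattern(pattern)
        print(name, "->", "compiles" if error is None else "rejected: " + error)

    # Hypothetical stack-trace line of the kind the pattern seems aimed at.
    sample = "at com.example.Foo.bar(Foo.java:12)"
    print("escaped pattern matches sample:", bool(re.search(ESCAPED, sample)))
```

Python rejects the original pattern because of the unbalanced parenthesis and accepts the escaped one. That says nothing definite about how logfetch handled it, only that the escaped form is the unambiguous one.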



OK, so that did not work. Here are some horrible stats, by the way:

 1889 nmsbb      1   0    0   33M   33M cpu/0  159:58 24.89% logfetch
24868 nmsbb      1   0    0 7144K 6976K cpu/3   34:16 24.88% logfetch

Not good.
Sorry, should have included this:

(gdb) bt
#0 0xff3906e8 in memcpy () from /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
#1 0x00012e10 in logdata (filename=0xffbef5a8 "", logdef=0x38738, truncated=0xffbef6cc)
at logfetch.c:192
#2 0x000142f4 in main (argc=215040, argv=0x34c00) at logfetch.c:844



Regards,
Henrik


To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe (at) hswn.dk



