[hobbit] RC client release bug?

David Gore David.Gore at verizonbusiness.com
Fri Jul 14 01:46:40 CEST 2006



David Gore wrote:
>
>
> Henrik Stoerner wrote:
>> On Thu, Jul 13, 2006 at 07:09:11PM +0000, David Gore wrote:
>>  
>>> We have seen this with recent snapshots and the latest release 
>>> candidate client.  logfetch hangs, which causes the client to hang 
>>> and go purple for all the tests.  It can be resolved by killing 
>>> logfetch and deleting all the entries in ~/client/tmp.  We could try 
>>> to be more selective about which files we delete.  This has happened 
>>> on two completely independent hosts running Solaris 8, one a SunFire 
>>> 880 and the other an E4500/E5500.
>>>
>>> Suggestions?  It can run for many days before hanging.
>>>     
>>
>> That's obviously interesting.
>>
>> When it hangs, is it just dead ? Or is it hogging the cpu (as it would
>> do if it were in a tight loop somewhere in the code) ?
>>
>>   
> CPU hogging, yes.
>> The hosts you monitor where this happens ... what kind of entries in 
>> client-local.cfg do you have for them ? Any "dir" entries, for instance?
>> Those do run an external program (du), which is always something that
>> is harder to control.
>>
>>   
> No "dir" entries, just "file" and "log".
>> When it happens again, could you please try and kill it with
>> "kill -ABRT <logfetchPID>"? That should cause it to dump core,
>> and it will be much easier to see where it hangs with a core
>> dump. Once you have the core dump, running it through gdb as described
>> in Help->Known Problems->How to report bugs will give me much
>> more to work on.
>>
>>
>>   
> Might take a few days, but we will certainly do that and see what it 
> shows.  As always, thank you for the hard work!

Sooner than I expected, here is the backtrace:

GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.8"...
Core was generated by `/export/home/nmsbb/client/bin/logfetch /export/home/nmsbb/client/tmp/logfetch.o'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libc.so.1...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libdl.so.1...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1...done.
Loaded symbols for /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
#0  0xff3906e8 in memcpy () from /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
(gdb) bt
#0  0xff3906e8 in memcpy () from /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
#1  0x00012e10 in logdata (filename=0xffbef5a0 "", logdef=0x38738, truncated=0xffbef6c4)
    at logfetch.c:192
#2  0x000142f4 in main (argc=215040, argv=0x34c00) at logfetch.c:844

I took a look at one of my co-worker's entries in client-local.cfg:

ignore DEBUG|WARN|^at.*)$

I put a backslash in front of the right parenthesis:

ignore DEBUG|WARN|^at.*\)$

Without the backslash, the ")" has no matching "(", leaving the pattern's
parentheses unbalanced.  Perhaps that was why it was hanging?
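A quick way to check a pattern like this outside of logfetch is sketched here in Python. It assumes that logfetch treats the "ignore" value as a Perl-compatible regular expression and drops any log line the pattern matches anywhere in the line; Python's re module only stands in for the regex library logfetch actually uses, and the helper names and sample log lines are hypothetical. The sketch shows that the original pattern is rejected while the escaped one compiles and filters as intended. It does not show that the bad pattern caused the hang (the backtrace points at memcpy() inside logdata()).

```python
import re
from typing import List

# Pattern as originally written in client-local.cfg: the trailing ")"
# has no matching "(", so a Perl-style regex engine refuses to compile it.
ORIGINAL = r"DEBUG|WARN|^at.*)$"

# Corrected pattern: "\)" is a literal right parenthesis.
# Note that "^" and "$" only anchor the third alternative ("^at.*\)$").
FIXED = r"DEBUG|WARN|^at.*\)$"


def compiles(pattern: str) -> bool:
    """Return True if Python's re module (a Perl-style engine) accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def kept_lines(pattern: str, lines: List[str]) -> List[str]:
    """Hypothetical 'ignore' filter: drop every line the pattern matches anywhere."""
    regex = re.compile(pattern)
    return [line for line in lines if not regex.search(line)]


if __name__ == "__main__":
    print(compiles(ORIGINAL))  # False
    print(compiles(FIXED))     # True
    # Made-up sample log lines, for illustration only.
    sample = [
        "2006-07-13 DEBUG starting worker",
        "at com.example.Foo.bar(Foo.java:42)",
        "ERROR connection refused",
    ]
    print(kept_lines(FIXED, sample))  # ['ERROR connection refused']
```

The DEBUG line and the stack-trace line ending in ")" are both dropped by the corrected pattern, leaving only the ERROR line.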




>> Regards,
>> Henrik
>>
>>
>> To unsubscribe from the hobbit list, send an e-mail to
>> hobbit-unsubscribe at hswn.dk
>>
>>
>>   




More information about the Xymon mailing list