[hobbit] RC client release bug?
David Gore
David.Gore at verizonbusiness.com
Fri Jul 14 04:09:40 CEST 2006
David Gore wrote:
>
>
> David Gore wrote:
>>
>>
>> Henrik Stoerner wrote:
>>> On Thu, Jul 13, 2006 at 07:09:11PM +0000, David Gore wrote:
>>>
>>>> We have seen this with recent snapshots and the latest release
>>>> candidate client. logfetch hangs which causes the client to hang
>>>> and go purple for all the tests. It can be resolved by killing
>>>> logfetch and deleting all the entries in ~/client/tmp. We could
>>>> try to be more surgical on the deleting of files. This has
>>>> happened on two very independent hosts running Solaris 8, one being
>>>> a SunFire 880 and another being an E4500/E5500.
>>>>
>>>> Suggestions? It can run for many days before hanging.
>>>>
>>>
>>> That's obviously interesting.
>>>
>>> When it hangs, is it just dead? Or is it hogging the CPU (as it would
>>> do if it were in a tight loop somewhere in the code)?
>>>
>>>
>> CPU hogging, yes.
>>> The hosts you monitor where this happens ... what kind of entries do
>>> you have for them in client-local.cfg? Any "dir" entries, for
>>> instance?
>>> Those do run an external program (du), which is always something that
>>> is harder to control.
>>>
>>>
>> No "dir" entries, just "file" and "log".
>>> When it happens again, could you please try and kill it with a "kill
>>> -ABRT <logfetchPID>" ? That should cause it to dump core,
>>> and it will be much easier to see where it hangs with a core
>>> dump. Once you have the core dump, running it through gdb as described
>>> in the Help->Known Problems->How to report bugs will give me much
>>> more to work on.
>>>
>>>
>>>
>> Might take a few days, but we will certainly do that and see what it
>> shows. As always thank you for the hard work!
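
The "kill -ABRT" step suggested above can also be scripted. The following is a minimal Python sketch, assuming a POSIX host. The helper name `abort_for_core` is hypothetical and not part of Hobbit, and whether a core file is actually written depends on the system's core-dump limits and settings.

```python
import os
import signal


def abort_for_core(pid: int) -> None:
    """Send SIGABRT to `pid`, the same signal that `kill -ABRT <pid>` sends.

    Hypothetical helper for illustration; not part of the Hobbit client.
    The default action for SIGABRT is to terminate the process and, if the
    system's core-dump settings allow it, write a core file.
    """
    os.kill(pid, signal.SIGABRT)
```

It would be called with the PID of the hung logfetch process, after which the resulting core file can be loaded into gdb as described in the message above.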
>
> Sooner than I expected, here is the backtrace:
>
> GNU gdb 6.0
> Copyright 2003 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "sparc-sun-solaris2.8"...
> Core was generated by `/export/home/nmsbb/client/bin/logfetch /export/home/nmsbb/client/tmp/logfetch.o'.
> Program terminated with signal 6, Aborted.
> Reading symbols from /usr/lib/libc.so.1...done.
> Loaded symbols for /usr/lib/libc.so.1
> Reading symbols from /usr/lib/libdl.so.1...done.
> Loaded symbols for /usr/lib/libdl.so.1
> Reading symbols from /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1...done.
> Loaded symbols for /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
> #0 0xff3906e8 in memcpy () from /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
> (gdb) bt
> #0 0xff3906e8 in memcpy () from /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1
> #1 0x00012e10 in logdata (filename=0xffbef5a0 "", logdef=0x38738, truncated=0xffbef6c4) at logfetch.c:192
> #2 0x000142f4 in main (argc=215040, argv=0x34c00) at logfetch.c:844
>
> I took a look at one of my co-worker's entries in client-local.cfg:
>
> ignore DEBUG|WARN|^at.*)$
>
> I put a backslash in front of the right paren:
>
> ignore DEBUG|WARN|^at.*\)$
>
> Perhaps that may have been why it was hanging?
>
>
>
OK, so that did not work. Here are some horrible stats, by the way:
1889 nmsbb 1 0 0 33M 33M cpu/0 159:58 24.89% logfetch
24868 nmsbb 1 0 0 7144K 6976K cpu/3 34:16 24.88% logfetch
Not good.
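
Even though escaping the parenthesis did not cure the hang, the two "ignore" patterns quoted above can be compared by trying to compile each one. The sketch below uses Python's `re` module purely as an illustration. It is an assumption that this says anything about logfetch itself, whose regex library may treat an unmatched ")" differently (some dialects accept it as a literal character). The helper name `compiles` is hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's `re` module accepts `pattern`.

    Hypothetical helper; Python's regex dialect is not necessarily the one
    logfetch uses, so this only flags syntax that looks suspicious.
    """
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


original = r"DEBUG|WARN|^at.*)$"   # closing parenthesis left unescaped
escaped = r"DEBUG|WARN|^at.*\)$"   # closing parenthesis escaped

for pat in (original, escaped):
    print(pat, "->", "ok" if compiles(pat) else "rejected")
```

A check like this only catches syntax problems. A pattern that compiles cleanly can still be unrelated to a hang inside memcpy, as the backtrace above suggests.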
>>> Regards,
>>> Henrik
>>>
>>>
>>> To unsubscribe from the hobbit list, send an e-mail to
>>> hobbit-unsubscribe at hswn.dk