Devmon causing core dumps
Buchan Milne
bgmilne at staff.telkomsa.net
Fri Oct 31 11:15:27 CET 2008
On Friday 31 October 2008 05:51:42 Everett, Vernon wrote:
> Hi all
>
> Devmon was causing the hobbitd_rrd module to crash and burn.
> Now this could be a bug, but it could also be a PEBKAC. I am hoping
> somebody can assist either way.
>
> I added a Cisco 2851 to Hobbit, using devmon.
> Now here is the possible PEBKAC
> Since Devmon doesn't have templates for the 2851, I used the template for
> the Cisco 2811. (Network guru told me they are pretty much the same, except
> for a few extra bells and whistles on the 2851.)
>
> The data for the device started appearing in Hobbit, and all looked good.
> Devmon even created the rrd files for the new Cisco device.
>
> However, the hobbitd_rrd module started core dumping, and the Hobbit server
> page started displaying red for hobbitd_rrd with the crash detected
> message. See core data below.
> Took the new Cisco device out of Hobbit, and cores stopped, and life was
> good again.
>
> Is there a significant enough difference between the 2851 and the 2811 to
> cause this, or are we looking at a genuine bug?
Real bug. I see it on the temperature tests on a new IOS.
> I am leaning towards a bug,
> because even if the collected data was complete rubbish, should it cause
> the module to core?
>
> Regards
> Vernon
>
> My Linux guy reckons this is the important stuff from the core.
> uname -a
> Linux las006 2.6.18-92.1.1.el5 #1 SMP Thu May 22 09:01:47 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
> cat /etc/redhat-release
> Red Hat Enterprise Linux Client release 5.2 (Tikanga)
>
> gdb -c core.8550 /usr/lib/hobbit/server/bin/hobbitd_rrd
> GNU gdb Red Hat Linux (6.5-37.el5_2.1rh)
> Copyright (C) 2006 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".
>
> Reading symbols from /usr/lib64/librrd.so.2...done.
> Loaded symbols for /usr/lib64/librrd.so.2
> Reading symbols from /usr/lib64/libpng12.so.0...done.
> Loaded symbols for /usr/lib64/libpng12.so.0
> Reading symbols from /lib64/libpcre.so.0...done.
> Loaded symbols for /lib64/libpcre.so.0
> Reading symbols from /lib64/libc.so.6...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /usr/lib64/libfreetype.so.6...done.
> Loaded symbols for /usr/lib64/libfreetype.so.6
> Reading symbols from /usr/lib64/libz.so.1...done.
> Loaded symbols for /usr/lib64/libz.so.1
> Reading symbols from /usr/lib64/libart_lgpl_2.so.2...done.
> Loaded symbols for /usr/lib64/libart_lgpl_2.so.2
> Reading symbols from /lib64/libm.so.6...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Core was generated by `hobbitd_rrd --rrddir=/var/lib/hobbit/rrd --debug'.
> Program terminated with signal 6, Aborted.
> #0 0x0000003db7a30155 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0 0x0000003db7a30155 in raise () from /lib64/libc.so.6
> #1 0x0000003db7a31bf0 in abort () from /lib64/libc.so.6
> #2 0x00000000004119f3 in sigsegv_handler (signum=<value optimized out>) at sig.c:57
> #3 <signal handler called>
> #4 0x0000003db7a77ac0 in strcat () from /lib64/libc.so.6
> #5 0x000000000040462a in do_devmon_rrd (hostname=0x2ada311e2806 "PERIR205", testname=0x2ada311e280f "if_load", msg=<value optimized out>, tstamp=<value optimized out>) at rrd/do_devmon.c:87
> #6 0x000000000040b656 in update_rrd (hostname=0x2ada311e2806 "PERIR205", testname=0x2ada311e280f "if_load", msg=0x2ada311e2842 "status PERIR205.if_load green Fri Oct 31 10:31:39 2008", tstamp=1225416699, sender=<value optimized out>, ldef=0xfeffffffffffff00) at do_rrd.c:372
> #7 0x000000000040261d in main (argc=<value optimized out>, argv=0x7fff7a088318) at hobbitd_rrd.c:153
> (gdb)
Could you show the Devmon RRD section of the message for the if_load test on
the PERIR205 host? I can confirm the cause, and maybe offer a workaround.
I am actually (constantly) reproducing the issue on my workstation against the
new IOS that can trigger this. I have a workaround in place in production, and
was hoping to get around to fixing this next week.
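
For anyone reading the backtrace: frame #4 shows the fault inside strcat(),
called from do_devmon_rrd() at rrd/do_devmon.c:87. The thread does not show
that source line or confirm the cause, so the following is only a sketch of
one general defensive pattern for string appends in C, assuming the
destination is a fixed-size buffer. append_bounded is a hypothetical helper
invented for illustration; it is not part of Hobbit or Devmon.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT taken from the Hobbit source.
 * Appends src to the NUL-terminated string in dst, where dstsize is the
 * total capacity of dst in bytes. Returns 0 on success. Returns -1 and
 * leaves dst unchanged if a pointer is NULL, if dst has no terminator
 * within dstsize bytes, or if the result would not fit.
 */
static int append_bounded(char *dst, size_t dstsize, const char *src)
{
    const char *end;
    size_t used, need;

    if (dst == NULL || src == NULL || dstsize == 0)
        return -1;

    /* Locate the terminator without reading past the buffer. */
    end = memchr(dst, '\0', dstsize);
    if (end == NULL)
        return -1;

    used = (size_t)(end - dst);
    need = strlen(src);

    /* Room is needed for src plus its terminating NUL. */
    if (need >= dstsize - used)
        return -1;

    memcpy(dst + used, src, need + 1);
    return 0;
}
```

A caller would check the return value and skip or log the update on failure
instead of writing past the buffer. Whether this applies to the actual bug
depends on what line 87 of do_devmon.c does with the message.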
Regards,
Buchan