Devmon causing core dumps

Buchan Milne bgmilne at staff.telkomsa.net
Fri Oct 31 11:15:27 CET 2008


On Friday 31 October 2008 05:51:42 Everett, Vernon wrote:
> Hi all
>
> Devmon was causing the hobbitd_rrd module to crash and burn.
> Now this could be a bug, but it could also be a PEBKAC. I am hoping
> somebody can assist either way.
>
> I added a Cisco 2851 to Hobbit, using devmon.
> Now here is the possible PEBKAC
> Since Devmon doesn't have templates for the 2851, I used the template for
> the Cisco 2811. (Network guru told me they are pretty much the same, except
> for a few extra bells and whistles on the 2851.)
>
> The data for the device started appearing in Hobbit, and all looked good.
> Devmon even created the rrd files for the new Cisco device.
>
> However, the hobbitd_rrd module started core dumping, and the Hobbit server
> page started displaying red for hobbitd_rrd with the crash detected
> message. See core data below.
> Took the new Cisco device out of Hobbit, and cores stopped, and life was
> good again.
>
> Is there a significant enough difference between the 2851 and the 2811 to
> cause this, or are we looking at a genuine bug?

It's a real bug. I see it on the temperature tests with a new IOS release.

> I am leaning towards a bug,
> because even if the collected data was complete rubbish, should it cause
> the module to core?
>
> Regards
>      Vernon
>
> My Linux guy reckons this is the important stuff from the core.
> uname -a
> Linux las006 2.6.18-92.1.1.el5 #1 SMP Thu May 22 09:01:47 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
> cat /etc/redhat-release
> Red Hat Enterprise Linux Client release 5.2 (Tikanga)
>
> gdb -c core.8550 /usr/lib/hobbit/server/bin/hobbitd_rrd
> GNU gdb Red Hat Linux (6.5-37.el5_2.1rh)
> Copyright (C) 2006 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".
>
> Reading symbols from /usr/lib64/librrd.so.2...done.
> Loaded symbols for /usr/lib64/librrd.so.2
> Reading symbols from /usr/lib64/libpng12.so.0...done.
> Loaded symbols for /usr/lib64/libpng12.so.0
> Reading symbols from /lib64/libpcre.so.0...done.
> Loaded symbols for /lib64/libpcre.so.0
> Reading symbols from /lib64/libc.so.6...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /usr/lib64/libfreetype.so.6...done.
> Loaded symbols for /usr/lib64/libfreetype.so.6
> Reading symbols from /usr/lib64/libz.so.1...done.
> Loaded symbols for /usr/lib64/libz.so.1
> Reading symbols from /usr/lib64/libart_lgpl_2.so.2...done.
> Loaded symbols for /usr/lib64/libart_lgpl_2.so.2
> Reading symbols from /lib64/libm.so.6...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Core was generated by `hobbitd_rrd --rrddir=/var/lib/hobbit/rrd --debug'.
> Program terminated with signal 6, Aborted.
> #0  0x0000003db7a30155 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x0000003db7a30155 in raise () from /lib64/libc.so.6
> #1  0x0000003db7a31bf0 in abort () from /lib64/libc.so.6
> #2  0x00000000004119f3 in sigsegv_handler (signum=<value optimized out>) at sig.c:57
> #3  <signal handler called>
> #4  0x0000003db7a77ac0 in strcat () from /lib64/libc.so.6
> #5  0x000000000040462a in do_devmon_rrd (hostname=0x2ada311e2806 "PERIR205",
>     testname=0x2ada311e280f "if_load", msg=<value optimized out>,
>     tstamp=<value optimized out>) at rrd/do_devmon.c:87
> #6  0x000000000040b656 in update_rrd (hostname=0x2ada311e2806 "PERIR205",
>     testname=0x2ada311e280f "if_load",
>     msg=0x2ada311e2842 "status PERIR205.if_load green Fri Oct 31 10:31:39 2008",
>     tstamp=1225416699, sender=<value optimized out>, ldef=0xfeffffffffffff00)
>     at do_rrd.c:372
> #7  0x000000000040261d in main (argc=<value optimized out>,
>     argv=0x7fff7a088318) at hobbitd_rrd.c:153
> (gdb)


Could you post the Devmon RRD section of the message for the if_load test on
the PERIR205 host? With that I can confirm the cause, and maybe offer a workaround.

I am actually reproducing the issue constantly on my workstation against the
new IOS release that triggers it. I have a workaround in place in production,
and was hoping to get around to fixing this next week.

Regards,
Buchan
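The backtrace shows hobbitd_rrd dying with a segfault inside strcat(), called from do_devmon_rrd() at rrd/do_devmon.c:87. That source line is not quoted in this thread, so the cause is not confirmed here; one plausible failure mode is an unbounded strcat() into a fixed-size buffer overrun by longer-than-expected Devmon data. As a hypothetical illustration only (the helper name safe_append() and its return convention are assumptions, not Hobbit/Xymon code), a bounded append that reports truncation instead of writing past the buffer might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from Hobbit/Xymon): append src to dst, where dst
 * is a buffer of dstsize bytes. Never writes past dst[dstsize - 1].
 * Returns 0 if all of src fit, -1 if src was truncated (or dst was unusable).
 * dst is always NUL-terminated on return when dstsize > 0.
 */
static int safe_append(char *dst, size_t dstsize, const char *src)
{
    const char *nul;
    size_t used, avail, srclen;

    assert(dst != NULL && src != NULL);
    if (dstsize == 0)
        return -1;

    /* Find the current end of dst without reading beyond the buffer. */
    nul = memchr(dst, '\0', dstsize);
    if (nul == NULL) {
        /* dst was not terminated within dstsize: terminate it, report failure. */
        dst[dstsize - 1] = '\0';
        return -1;
    }

    used = (size_t)(nul - dst);
    avail = dstsize - used - 1;        /* free bytes, excluding the terminator */
    srclen = strlen(src);

    if (srclen <= avail) {
        memcpy(dst + used, src, srclen + 1);   /* copy src plus its NUL */
        return 0;
    }

    memcpy(dst + used, src, avail);            /* copy only what fits */
    dst[used + avail] = '\0';
    return -1;
}
```

With an 8-byte buffer holding "abcd", appending "efghij" leaves "abcdefg" and returns -1, so the caller can log and skip the oversized value rather than crash the whole module.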



