[hobbit] Re: Devmon causing core dumps
Robert Holden
robertholden at gmail.com
Fri Oct 31 16:35:44 CET 2008
I have seen this as well. I finally determined it was caused by ATM
interfaces. Devmon does not give different components of an ATM circuit
(the physical interface, the -atm layer, .0 sub interface, -aal5 layer)
unique names. So rrd was receiving data for 5 interfaces all with the same
name. As a temporary workaround, I stopped monitoring the ATM interfaces,
but this is a bug.
Interface names:
ATM5/0/0
ATM5/0/0-atm layer
ATM5/0/0.0-atm subif
ATM5/0/0-aal5 layer
ATM5/0/0.0-aal5 layer
Devmon sees these all as ATM5/0/0 because the devmon templates (at least for
6509s) use ifName as the main identifier, which is not always
unique. I am not sure of a solution yet. MRTG uses ifIndex as its unique key.
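
To make the collision concrete, here is a minimal sketch (not devmon code, and not how the templates are actually implemented). It takes the five names listed above, assumes each row reports ifName as ATM5/0/0 as described, and uses made-up ifIndex values purely for illustration. Grouping on ifName yields one key shared by five rows, while grouping on ifIndex yields no shared keys.

```python
from collections import defaultdict

# Rows are (ifIndex, ifName, listed_name).
# - listed_name: the five interface names quoted in this post.
# - ifName: "ATM5/0/0" for every row, as the post describes.
# - ifIndex: hypothetical values, invented for this illustration only.
INTERFACES = [
    (101, "ATM5/0/0", "ATM5/0/0"),
    (102, "ATM5/0/0", "ATM5/0/0-atm layer"),
    (103, "ATM5/0/0", "ATM5/0/0.0-atm subif"),
    (104, "ATM5/0/0", "ATM5/0/0-aal5 layer"),
    (105, "ATM5/0/0", "ATM5/0/0.0-aal5 layer"),
]


def group_by(rows, key_column):
    """Group rows by the value found in one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_column]].append(row)
    return dict(groups)


def collisions(rows, key_column):
    """Return only the keys that are shared by more than one row."""
    return {k: v for k, v in group_by(rows, key_column).items() if len(v) > 1}


by_name = collisions(INTERFACES, 1)   # keyed on ifName
by_index = collisions(INTERFACES, 0)  # keyed on ifIndex

print("ifName collisions:", {k: len(v) for k, v in by_name.items()})
print("ifIndex collisions:", {k: len(v) for k, v in by_index.items()})
```

With these sample rows, the ifName grouping reports a single key with five entries and the ifIndex grouping reports none, which is the difference between the devmon and MRTG approaches mentioned above.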
Robert
On Fri, Oct 31, 2008 at 3:15 AM, Buchan Milne <bgmilne at staff.telkomsa.net> wrote:
> On Friday 31 October 2008 05:51:42 Everett, Vernon wrote:
> > Hi all
> >
> > Devmon was causing the hobbitd_rrd module to crash and burn.
> > Now this could be a bug, but it could also be a PEBKAC. I am hoping
> > somebody can assist either way.
> >
> > I added a Cisco 2851 to Hobbit, using devmon.
> > Now here is the possible PEBKAC
> > Since Devmon doesn't have templates for the 2851, I used the template for
> > the Cisco 2811. (Network guru told me they are pretty much the same, except
> > for a few extra bells and whistles on the 2851.)
> >
> > The data for the device started appearing in Hobbit, and all looked good.
> > Devmon even created the rrd files for the new Cisco device.
> >
> > However, the hobbitd_rrd module started core dumping, and the Hobbit server
> > page started displaying red for hobbitd_rrd with the crash detected
> > message. See core data below.
> > Took the new Cisco device out of Hobbit, and cores stopped, and life was
> > good again.
> >
> > Is there a significant enough difference between the 2851 and the 2811 to
> > cause this, or are we looking at a genuine bug?
>
> Real bug. I see it on the temperature tests on a new IOS.
>
> > I am leaning towards a bug,
> > because even if the collected data was complete rubbish, should it cause
> > the module to core?
> >
> > Regards
> > Vernon
> >
> > My Linux guy reckons this is the important stuff from the core.
> > uname -a
> > Linux las006 2.6.18-92.1.1.el5 #1 SMP Thu May 22 09:01:47 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
> > cat /etc/redhat-release
> > Red Hat Enterprise Linux Client release 5.2 (Tikanga)
> >
> > gdb -c core.8550 /usr/lib/hobbit/server/bin/hobbitd_rrd
> > GNU gdb Red Hat Linux (6.5-37.el5_2.1rh)
> > Copyright (C) 2006 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > This GDB was configured as "x86_64-redhat-linux-gnu"...Using host
> > libthread_db library "/lib64/libthread_db.so.1".
> >
> > Reading symbols from /usr/lib64/librrd.so.2...done.
> > Loaded symbols for /usr/lib64/librrd.so.2
> > Reading symbols from /usr/lib64/libpng12.so.0...done.
> > Loaded symbols for /usr/lib64/libpng12.so.0
> > Reading symbols from /lib64/libpcre.so.0...done.
> > Loaded symbols for /lib64/libpcre.so.0
> > Reading symbols from /lib64/libc.so.6...done.
> > Loaded symbols for /lib64/libc.so.6
> > Reading symbols from /usr/lib64/libfreetype.so.6...done.
> > Loaded symbols for /usr/lib64/libfreetype.so.6
> > Reading symbols from /usr/lib64/libz.so.1...done.
> > Loaded symbols for /usr/lib64/libz.so.1
> > Reading symbols from /usr/lib64/libart_lgpl_2.so.2...done.
> > Loaded symbols for /usr/lib64/libart_lgpl_2.so.2
> > Reading symbols from /lib64/libm.so.6...done.
> > Loaded symbols for /lib64/libm.so.6
> > Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> > Loaded symbols for /lib64/ld-linux-x86-64.so.2
> > Core was generated by `hobbitd_rrd --rrddir=/var/lib/hobbit/rrd --debug'.
> > Program terminated with signal 6, Aborted.
> > #0 0x0000003db7a30155 in raise () from /lib64/libc.so.6
> > (gdb) bt
> > #0 0x0000003db7a30155 in raise () from /lib64/libc.so.6
> > #1 0x0000003db7a31bf0 in abort () from /lib64/libc.so.6
> > #2 0x00000000004119f3 in sigsegv_handler (signum=<value optimized out>) at sig.c:57
> > #3 <signal handler called>
> > #4 0x0000003db7a77ac0 in strcat () from /lib64/libc.so.6
> > #5 0x000000000040462a in do_devmon_rrd (hostname=0x2ada311e2806 "PERIR205", testname=0x2ada311e280f "if_load", msg=<value optimized out>, tstamp=<value optimized out>) at rrd/do_devmon.c:87
> > #6 0x000000000040b656 in update_rrd (hostname=0x2ada311e2806 "PERIR205", testname=0x2ada311e280f "if_load", msg=0x2ada311e2842 "status PERIR205.if_load green Fri Oct 31 10:31:39 2008", tstamp=1225416699, sender=<value optimized out>, ldef=0xfeffffffffffff00) at do_rrd.c:372
> > #7 0x000000000040261d in main (argc=<value optimized out>, argv=0x7fff7a088318) at hobbitd_rrd.c:153
> > (gdb)
>
>
> Could you show the Devmon RRD section of the message for the if_load test on
> the PERIR205 host? I can confirm the cause, and maybe offer a workaround.
>
> I am actually (constantly) reproducing the issue on my workstation against the
> new IOS that can trigger this. I have a workaround in place in production, and
> was hoping to get around to fixing this next week.
>
> Regards,
> Buchan