
Re: [hobbit] hobbitd_rrd error



I've finally gotten back to looking at this problem and have some more
info that may be relevant. It hasn't been high on the list, as hobbit is
still working fine for alerts.

Firstly, I've tried removing the existing rrd files and letting hobbit
create new ones. No change: the core files are still produced.

I've also tried building hobbit 4.1.2p1 against rrdtool 1.2.11 and
1.2.12. No change, both with the existing rrd files and when letting
hobbit create new ones as required.

Looking at the current core files with gdb, it seems that they all
show the crash in the sendmail handler (do_sendmail_rrd):

(gdb) bt
#0  0x00abe7a2 in ?? () from /lib/ld-linux.so.2
#1  0x00afe7d5 in raise () from /lib/tls/libc.so.6
#2  0x00b00149 in abort () from /lib/tls/libc.so.6
#3  0x08054af2 in sigsegv_handler (signum=11) at sig.c:57
#4  0x00afe8c8 in killpg () from /lib/tls/libc.so.6
#5  0x0804e011 in do_sendmail_rrd (
    hostname=0xb7f6f037 "outrelay1.firstwave.com.au", 
    testname=0xb7f6f052 "sendmail", 
    msg=0xbffc5dd0  tstamp=1143771322)
    at rrd/do_sendmail.c:127
#6  0x08050120 in update_rrd (
    hostname=0xb7f6f037 "outrelay1.firstwave.com.au", 
    testname=0xb7f6f052 "sendmail", 
    msg=0xb7f6f05b "data outrelay1,firstwave,com,au.sendmail Fri Mar 31
13:15:22 EST 2006\nStatistics from Tue Jun 21 10:47:07 2005\n M   msgsfr
bytes_from   msgsto    bytes_to  msgsrej msgsdis msgsqur  Mailer\n 3
25299848"..., 
    tstamp=1143771322, sender=0x0, ldef=0x0) at do_rrd.c:271
#7  0x08049e3a in main (argc=0, argv=0xbffca4e4) at hobbitd_rrd.c:199
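
One rough way to see whether the sendmail report itself is involved would be to check the table in the failing message outside hobbit. The sketch below is only an illustration under assumptions: it assumes the header row starts with "M" and ends with "Mailer", as in the message above, and that each numbered row should have one field per header column. The helper name check_mailstats is hypothetical and is not part of hobbit, and a mismatch it reports is not proof of the cause of the crash.

```python
def check_mailstats(msg):
    """Return a list of rows whose field count differs from the header's.

    Assumption: the header row begins with "M" and ends with "Mailer",
    and every row that starts with a mailer number has one field per column.
    """
    lines = msg.splitlines()

    # Find the header row of the statistics table.
    header_idx = None
    for i, line in enumerate(lines):
        tokens = line.split()
        if tokens and tokens[0] == "M" and tokens[-1] == "Mailer":
            header_idx = i
            break
    if header_idx is None:
        return ["no header row found"]

    ncols = len(lines[header_idx].split())
    problems = []
    for line in lines[header_idx + 1:]:
        tokens = line.split()
        # Skip blank lines, separators, and totals rows (no leading number).
        if not tokens or not tokens[0].isdigit():
            continue
        if len(tokens) != ncols:
            problems.append(
                "row %s: %d fields, expected %d" % (tokens[0], len(tokens), ncols)
            )
    return problems
```

Feeding it the body of a "data ... sendmail" message saved from the failing host returns an empty list when every numbered row matches the header, and one entry per mismatched row otherwise.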

I'm ready to rebuild the server entirely, but I'm not convinced that this
will resolve the issue. As I said previously, this setup has been
working fine for months; the problem started for no obvious reason in
early March.

Regards
geoff



On Mon, 2006-03-06 at 10:58 +0100, Henrik Stoerner wrote:
> On Mon, Mar 06, 2006 at 03:55:13PM +1100, Geoff Steer wrote:
> > 
> > My hobbit server has been error free since I installed 4.1.2, but in the
> > last day or so it has had an error for hobbitd_rrd.
> > 
> > The rrd-data.log shows:
> > *** glibc detected *** double linked list
> > Worker process died with exit code 134
> > *** glibc detected *** double free or corruption (fasttop)
> 
> This usually indicates some sort of corruption of the memory
> allocation inside hobbitd_rrd. Since hobbitd_rrd depends on the
> rrdtool library, it could also be a problem with that.
> 
> Since it's glibc you're probably on a Linux/Intel platform.
> Would it be possible for you to run the hobbitd_rrd command
> through the "Valgrind" memory checker ? I don't know if
> Valgrind is included with your distribution - it is part
> of the standard Debian release, but your distro might be
> different. If you can get it installed, then just change
> the command in the "[rrddata]" section from
> 
> CMD hobbitd_channel --channel=data   --log=$BBSERVERLOGS/rrd-data.log \
>     hobbitd_rrd --rrddir=$BBVAR/rrd
> 
> to
> 
> CMD hobbitd_channel --channel=data   --log=$BBSERVERLOGS/rrd-data.log \
>     valgrind --log-file=$BBSERVERLOGS/valgrind.log \
>     hobbitd_rrd --rrddir=$BBVAR/rrd
> 
> Let it run until the errors show up, then send me the valgrind.log.*
> files.
> 
> 
> Regards,
> Henrik
> 
> 


