[hobbit] hobbitd_channel crashing on me
Sean R. Clark
sclark at nyroc.rr.com
Wed Aug 29 01:27:17 CEST 2007
I replaced the binaries.
It ran for about 3 hours, and then I just get:
2007-08-28 12:17:12 Setup complete
2007-08-28 12:44:53 BOARDBUSY locked at 2, GETNCNT is 0, GETPID is 9917, 2 clients
2007-08-28 15:08:58 BOARDBUSY locked at 2, GETNCNT is 0, GETPID is 9917, 2 clients
2007-08-28 15:10:20 BOARDBUSY locked at 1, GETNCNT is 0, GETPID is 10501, 2 clients
2007-08-28 15:11:23 BOARDBUSY locked at 1, GETNCNT is 0, GETPID is 9917, 1 clients
2007-08-28 15:14:53 BOARDBUSY locked at 2, GETNCNT is 0, GETPID is 9917, 2 clients
2007-08-28 15:16:53 BOARDBUSY locked at 1, GETNCNT is 0, GETPID is 9917, 1 clients
And a red hobbitd_channel status is sent to the daemon.
Core file says:
Reading hobbitd_channel
core file header read successfully
Reading ld.so.1
Reading libresolv.so.2
Reading libsocket.so.1
Reading libnsl.so.1
Reading libc.so.1
program terminated by signal ABRT (Abort)
0xfee60717: __lwp_kill+0x0007: jae __lwp_kill+0x15 [ 0xfee60725, .+0xe ]
Current function is sigsegv_handler
57 abort();
(dbx) where
[1] __lwp_kill(0x1, 0x6), at 0xfee60717
[2] _thr_kill(0x1, 0x6), at 0xfee5ded4
[3] raise(0x6), at 0xfee0ced3
  [4] abort(0x80599c0, 0x0, 0x8046758, 0xfee4dd4f, 0x8046758, 0xfee4dd4f), at 0xfedf0969
=>[5] sigsegv_handler(signum = 11), line 57 in "sig.c"
[6] __sighndlr(0xb, 0x0, 0x80467f0, 0x804ebe8), at 0xfee5fadf
[7] call_user_handler(0xb, 0x0, 0x80467f0), at 0xfee560d3
[8] sigacthandler(0xb, 0x0, 0x80467f0, 0xf, 0x0, 0x0), at 0xfee56253
---- called from signal handler with signal 11 (SIGSEGV) ------
[9] main(argc = 4, argv = 0x8046b28), line 676 in "hobbitd_channel.c"
Meaning it tried to spawn a thread and dumped core.
Is this a "nicer" crash? That is, will it keep running, since it just
core dumped on a fork and not the whole channel? Or is this something more?
-Sean
-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk]
Sent: Tuesday, August 28, 2007 10:57 AM
To: hobbit at hswn.dk
Subject: Re: [hobbit] hobbitd_channel crashing on me
On Tue, Aug 28, 2007 at 09:26:51AM -0400, Sean R. Clark wrote:
> I have 18,102 RRDs, 17,671 of which are controlled by the
> hobbitd_channel (the others are written/populated from other sources)
>
> The slice I have the data on has a busy% between 16-88% depending on
> what's going on (so yes, high I/O as well)
OK, then I'd suggest that you pick up the current snapshot of Hobbit from
http://www.hswn.dk/beta/ and build that. The only parts you need to replace
in your current setup are these binaries:
* hobbitd/hobbitd_channel
* hobbitd/hobbitd_rrd
* web/hobbitgraph.cgi
After running "make", shut down Hobbit and copy these files to your
~hobbit/server/bin/ directory (it's probably wise to save the original ones
first). Then start Hobbit again, and everything should be working fine -
with a lot less I/O load and no memory leak in hobbitd_channel.
What's changed internally is that updates of the RRD files are now cached
for up to 30 minutes before being written to disk; the RRDtool library can
handle "batch" updates of the data, so instead of updating the RRD file with
1 dataset every 5 minutes, it now gets 6 datasets in one operation every 30
minutes.
This also means that when you shut down Hobbit, you'll see that the
hobbitd_rrd process takes quite a long time to finish - it is busy writing
all of the cached updates to disk. On my work server, this takes about 5
minutes.
Regards,
Henrik
To unsubscribe from the hobbit list, send an e-mail to
hobbit-unsubscribe at hswn.dk