[hobbit] hobbitd_channel crashing on me
Sean R. Clark
sclark at nyroc.rr.com
Wed Aug 29 01:27:17 CEST 2007
I replaced the binaries.
It ran for about 3 hours, and then I just get:
2007-08-28 12:17:12 Setup complete
2007-08-28 12:44:53 BOARDBUSY locked at 2, GETNCNT is 0, GETPID is 9917, 2 clients
2007-08-28 15:08:58 BOARDBUSY locked at 2, GETNCNT is 0, GETPID is 9917, 2 clients
2007-08-28 15:10:20 BOARDBUSY locked at 1, GETNCNT is 0, GETPID is 10501, 2 clients
2007-08-28 15:11:23 BOARDBUSY locked at 1, GETNCNT is 0, GETPID is 9917, 1 clients
2007-08-28 15:14:53 BOARDBUSY locked at 2, GETNCNT is 0, GETPID is 9917, 2 clients
2007-08-28 15:16:53 BOARDBUSY locked at 1, GETNCNT is 0, GETPID is 9917, 1 clients
And a red hobbitd_channel status is sent to the daemon.
Core file says:
Reading hobbitd_channel
core file header read successfully
Reading ld.so.1
Reading libresolv.so.2
Reading libsocket.so.1
Reading libnsl.so.1
Reading libc.so.1
program terminated by signal ABRT (Abort)
0xfee60717: __lwp_kill+0x0007: jae __lwp_kill+0x15 [ 0xfee60725, .+0xe ]
Current function is sigsegv_handler
57 abort();
(dbx) where
[1] __lwp_kill(0x1, 0x6), at 0xfee60717
[2] _thr_kill(0x1, 0x6), at 0xfee5ded4
[3] raise(0x6), at 0xfee0ced3
  [4] abort(0x80599c0, 0x0, 0x8046758, 0xfee4dd4f, 0x8046758, 0xfee4dd4f), at 0xfedf0969
=>[5] sigsegv_handler(signum = 11), line 57 in "sig.c"
[6] __sighndlr(0xb, 0x0, 0x80467f0, 0x804ebe8), at 0xfee5fadf
[7] call_user_handler(0xb, 0x0, 0x80467f0), at 0xfee560d3
[8] sigacthandler(0xb, 0x0, 0x80467f0, 0xf, 0x0, 0x0), at 0xfee56253
---- called from signal handler with signal 11 (SIGSEGV) ------
[9] main(argc = 4, argv = 0x8046b28), line 676 in "hobbitd_channel.c"
Meaning it tried to spawn a thread and dumped core.
Is this a "nicer" crash? That is, will it keep running, since it just
core dumped on a fork and not the whole channel? Or is this something more?
-Sean
-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk]
Sent: Tuesday, August 28, 2007 10:57 AM
To: hobbit at hswn.dk
Subject: Re: [hobbit] hobbitd_channel crashing on me
On Tue, Aug 28, 2007 at 09:26:51AM -0400, Sean R. Clark wrote:
> I have 18,102 RRDs, 17,671 of which are controlled by the
> hobbitd_channel (the others are written/populated from other sources)
>
> The slice I have the data on has a busy% between 16-88% depending on
> what's going on (so yes, high I/O as well)
OK, then I'd suggest that you pick up the current snapshot of Hobbit from
http://www.hswn.dk/beta/ and build that. The only parts you need to replace
in your current setup are these binaries:
* hobbitd/hobbitd_channel
* hobbitd/hobbitd_rrd
* web/hobbitgraph.cgi
After running "make", shut down Hobbit and copy these files to your
~hobbit/server/bin/ directory (it's probably wise to save the original ones
first). Then start Hobbit again, and everything should be working fine -
with a lot less I/O load and no memory leak in hobbitd_channel.
What's changed internally is that updates of the RRD files are now cached
for up to 30 minutes before being written to disk; the RRDtool library can
handle "batch" updates of the data, so instead of updating the RRD file with
1 dataset every 5 minutes, it now gets 6 datasets in one operation every 30
minutes.
This also means that when you shut down Hobbit, you'll see that the
hobbitd_rrd process takes quite a long time to finish - it is busy writing
all of the cached updates to disk. On my work server, this takes about 5
minutes.
Regards,
Henrik
To unsubscribe from the hobbit list, send an e-mail to
hobbit-unsubscribe at hswn.dk