[hobbit] Status Unavailable

Vernon Everett v.everett at afgonline.com.au
Fri Jul 1 10:56:38 CEST 2005


Hi Henrik

It should be idle. All the system does is run hobbit. :-)

Hobbitd is currently dead in the water.
	[root@pengo log]# strace -p 3025
	Process 3025 attached - interrupt to quit
	futex(0x40141b20, FUTEX_WAIT, 2, NULL

And it's been like this a while.
When I did the kill -6 I got this.
	[root@pengo log]# strace -p 3025
	Process 3025 attached - interrupt to quit
	futex(0x40141b20, FUTEX_WAIT, 2, NULL)  = -1 EINTR (Interrupted system call)
	--- SIGABRT (Aborted) @ 0 (0) ---
	Process 3025 detached
Which I suppose was expected :-)

I restarted it, and got this.
	[root@pengo etc]# strace -p 9223
	Process 9223 attached - interrupt to quit
	semop(32769, 0xbfffe3a0, 1
Nope, there is nothing I forgot to cut and paste.
That really was it.

And this shit just gets stranger and stranger.
It isn't dumping core.
I hit it with a kill -6 and nothing happens.
I then thought maybe we were both mistaken: either we had the command wrong,
or my Linux was set up by default not to dump core. So I started vi in a
session and did a kill -6 on that. That dumped?!
Hobbit isn't dumping.

I rebooted and tried again.
I managed to get a nice strace output - see attached - but still no damn
core.
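Core-file size limits are per process and are inherited from the parent, so a vi started from an interactive shell can dump core while hobbitd, which inherits its limits from hobbitlaunch and whatever started that, may not. A minimal check, assuming a shell whose `ulimit` supports `-S`/`-H` (bash and dash both do), is to look at the limits in the shell that starts hobbitlaunch. How hobbitlaunch is started on this box is not stated in the thread, so treat this as a sketch, not a diagnosis:

```shell
#!/bin/sh
# Show the soft and hard core-file size limits (in blocks) for this shell.
# A soft limit of 0 means a SIGABRT kills the process without writing a core.
soft=$(ulimit -S -c)
hard=$(ulimit -H -c)
echo "core limit: soft=$soft hard=$hard"

# Raising the soft limit (at most to the hard limit) only affects this shell
# and processes started from it afterwards, e.g. a hobbitlaunch started here.
# Processes that are already running keep the limit they inherited.
if [ "$soft" = "0" ] && [ "$hard" != "0" ]; then
    ulimit -S -c "$hard"
    echo "soft core limit raised to $(ulimit -S -c)"
fi
```

If the soft limit is 0 in the environment hobbitlaunch starts from, a kill -6 on hobbitd would end the process without leaving a core file, which matches what happened above.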

OK, I added --debug and restarted.
When I went to check the logs, I found this in hobbitlaunch.log.
---snip---
2005-07-01 16:37:21 Loading tasklist configuration from /usr/lib/hobbit/server/etc/hobbitlaunch.cfg
2005-07-01 16:37:21 Loading hostnames
2005-07-01 16:37:21 Loading saved state
2005-07-01 16:37:21 Setting up network listener on 0.0.0.0:1984
2005-07-01 16:37:21 Cannot bind to listen socket (Address already in use)
2005-07-01 16:37:21 Task hobbitd started with PID 4761
2005-07-01 16:37:26 Task hobbitd terminated, status 1
2005-07-01 16:37:26 Loading hostnames
2005-07-01 16:37:26 Loading saved state
2005-07-01 16:37:26 Task hobbitd started with PID 4765
2005-07-01 16:37:26 Setting up network listener on 0.0.0.0:1984
2005-07-01 16:37:26 Cannot bind to listen socket (Address already in use)
2005-07-01 16:37:26 Task hobbitd terminated, status 1
2005-07-01 16:37:31 Loading hostnames
2005-07-01 16:37:31 Loading saved state
2005-07-01 16:37:31 Task hobbitd started with PID 4770
2005-07-01 16:37:31 Setting up network listener on 0.0.0.0:1984
2005-07-01 16:37:31 Cannot bind to listen socket (Address already in use)
2005-07-01 16:37:31 Task hobbitd terminated, status 1
2005-07-01 16:37:36 Task hobbitd started with PID 4774
2005-07-01 16:37:36 Loading hostnames
2005-07-01 16:37:36 Loading saved state
2005-07-01 16:37:36 Setting up network listener on 0.0.0.0:1984
2005-07-01 16:37:36 Cannot bind to listen socket (Address already in use)
2005-07-01 16:37:36 Task hobbitd terminated, status 1
2005-07-01 16:37:41 Task hobbitd started with PID 4778
2005-07-01 16:37:41 Loading hostnames
2005-07-01 16:37:41 Loading saved state
2005-07-01 16:37:41 Setting up network listener on 0.0.0.0:1984
2005-07-01 16:37:41 Cannot bind to listen socket (Address already in use)
2005-07-01 16:37:41 Task hobbitd terminated, status 1
2005-07-01 16:37:46 Task hobbitd started with PID 4783
2005-07-01 16:37:46 Loading hostnames
2005-07-01 16:37:46 Loading saved state
2005-07-01 16:37:46 Setting up network listener on 0.0.0.0:1984
2005-07-01 16:37:46 Cannot bind to listen socket (Address already in use)
2005-07-01 16:37:46 Task hobbitd terminated, status 1
---snip---

Looks like a clue.
I will add the output of netstat -a.
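"Address already in use" is the standard text for EADDRINUSE, which bind() returns when the address and port are already taken, typically by another socket listening on that port. Here that means something already holds 0.0.0.0:1984; an older hobbitd that never exited is one possibility, but that is an assumption the thread does not confirm. Alongside netstat -a, a Linux-only check can read /proc/net/tcp directly. The function name `port_listening` and its optional file argument are hypothetical, added so the check can also be run against a saved copy of that file:

```shell
#!/bin/sh
# Hypothetical helper: succeed if some IPv4 socket is in LISTEN state on TCP
# port $1. Reads /proc/net/tcp (Linux only) unless a file is given as $2.
# In that file, field 2 is local_address as HEXADDR:HEXPORT and field 4 is
# the socket state, where 0A means LISTEN. IPv6 sockets are in /proc/net/tcp6.
port_listening() {
    hexport=$(printf '%04X' "$1")
    awk -v p=":$hexport" '
        NR > 1 && substr($2, length($2) - 4) == p && $4 == "0A" { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/net/tcp}"
}

# Port 1984 is the one the hobbitd network listener uses, per the log above.
if [ -r /proc/net/tcp ]; then
    if port_listening 1984; then
        echo "something is already listening on TCP port 1984"
    else
        echo "nothing is listening on TCP port 1984"
    fi
fi
```

This only says whether the port is taken, not by whom. On Linux with net-tools, `netstat -tlnp` run as root also prints the PID and program name for each listening socket, which would show whether the holder is a leftover hobbitd.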

Got the hobbitd.log file for you too.

Let me know if there is anything else I can get you.

Regards
    Vernon

P.S. Your cold one is quickly becoming many cold ones if you ever get to Perth.





-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk] 
Sent: Friday, 1 July 2005 3:38 PM
To: hobbit at hswn.dk
Subject: Re: [hobbit] Status Unavailable

On Fri, Jul 01, 2005 at 03:25:30PM +0800, Vernon Everett wrote:
> Thanks for helping on this.
> I rebooted this morning. Could the memory leak still affect me in that
> short time?

Probably not. Just wanted to rule out this possibility.

> No "failed allocation" in dmesg output.
> Do you want the full output?

No, I don't think that is necessary.

> [root@pengo log]# vmstat 4 20

And your system is mostly idle with no swap or disk activity.

> [hobbit@pengo hobbit]$ server/bin/bb 127.0.0.1 "hobbitdboard"
> 2005-07-01 15:21:45 Whoops ! bb failed to send message - timeout

Could you try running "strace -p <process-ID of the hobbitd process>"
for a minute or two and send me the output, then do a "kill -6
<process-id>" and mail me the core file from ~hobbit/server/tmp/
together with the ~hobbit/server/bin/hobbitd file?

Also, after this try adding a "--debug" to the hobbitd commandline in
hobbitlaunch.cfg. Let it run for a while and then mail me the
hobbitd.log file.

This bug sounds a bit nasty, I think ....


Regards,
Henrik








More information about the Xymon mailing list