[hobbit] bbgen frequent yellow alerts - hobbitd problem?

Henrik Stoerner henrik at hswn.dk
Mon Nov 6 17:29:59 CET 2006


On Mon, Nov 06, 2006 at 07:35:27AM -0800, Mr-Pope wrote:

> We are running a new installation of Hobbit 4.2 on Solaris 10 running
> in a non-global zone.  Server is a v240 but I don't think that matters
> here.
> 
> The problem here is that our bbgen status turns yellow with fairly
> high frequency, sometimes multiple times an hour, at (what seem like)
> random intervals.  In the yellow alert bbgen reports:
> "hobbitd status-board not available"

The reports I've had of this have only one thing in common: They all
happen on Solaris 10. So I'm beginning to suspect that maybe Solaris
doesn't work quite the way other systems do.

Or perhaps there is a bug, and something special in Solaris triggers it.

> Below are the output from some commands/logs.  These logs don't really
> seem to help, so let me know if there is anything else that I can send
> along to debug this issue.

> $BB --debug $BBDISP "hobbitdboard"
> (with no --debug on a 'failure' I get no output.  I'm assuming this is
> the same cause of the bbgen yellow alert)

Yes.

> bbgen --debug --report (this one turned bbgen yellow/unavailable.
> Note the quick disconnect.)
> 2006-11-03 09:51:03 load_state()
> 2006-11-03 09:51:03 Transport setup is:
> 2006-11-03 09:51:03 bbdportnumber = 1984
> 2006-11-03 09:51:03 bbdispproxyhost = NONE
> 2006-11-03 09:51:03 bbdispproxyport = 0
> 2006-11-03 09:51:03 Recipient listed as '10.xxx.xxx.xxx'
> 2006-11-03 09:51:03 Standard BB protocol on port 1984
> 2006-11-03 09:51:03 Will connect to address 10.xxx.xxx.xxx port 1984
> 2006-11-03 09:51:03 Connect status is 0
> 2006-11-03 09:51:03 Sent 126 bytes
> 2006-11-03 09:51:03 Closing connection

Interesting.

Since it seems that this bites you more than most others, I'd like you
to do a couple of things for me to figure out what is going on. I need
you to add a couple of debugging lines to Hobbit.

First, in the bbdisplay/loaddata.c file, around line 436 you'll find the
code that prints out the "hobbitd status-board not available" message.
It looks like this:
        errprintf("hobbitd status-board not available\n");
I want you to change that to
        errprintf("hobbitd status-board not available, code %d\n", hobbitdresult);


Next, in the lib/sendmsg.c file, around line 340, is the code that
receives data from Hobbit. You'll find these lines:

	n = recv(sockfd, recvbuf, sizeof(recvbuf)-1, 0);
	if (n > 0) {

I'd like you to add 8 lines between these two, so that it ends up as:

	n = recv(sockfd, recvbuf, sizeof(recvbuf)-1, 0);
	if (n < 0) {
		dbgprintf("recv() returned error: %s\n", strerror(errno));
		if (errno == EAGAIN) continue; 
	}
	if (n == 0) {
		dbgprintf("recv() gave us 0 bytes\n");
		continue;
	}
	if (n > 0) {

(it isn't the prettiest of programming, but it does the job for now).
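
For reference, here is a standalone sketch of the same idea: telling
apart the three things recv() can report (data, 0 bytes, error). It is
not Hobbit code; the names recv_outcome, classify_recv and recv_once
are made up for illustration, and it logs with fprintf() instead of
dbgprintf(). It also copies errno right after recv(), on the
assumption that a logging call in between could overwrite it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical names for illustration only; none of this is in Hobbit. */
enum recv_outcome { RECV_DATA, RECV_EOF, RECV_RETRY, RECV_ERROR };

/* Classify a recv() return value. saved_errno must be the value of
 * errno captured immediately after the recv() call. */
static enum recv_outcome classify_recv(ssize_t n, int saved_errno)
{
    if (n > 0) return RECV_DATA;
    if (n == 0) return RECV_EOF;   /* stream socket: peer closed the connection */
    if (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK ||
        saved_errno == EINTR) return RECV_RETRY;   /* transient, try again */
    return RECV_ERROR;
}

/* Do a single recv(), log what happened to stderr, and return the
 * outcome. On RECV_DATA the buffer is NUL-terminated; otherwise it is
 * set to the empty string. */
static enum recv_outcome recv_once(int sockfd, char *buf, size_t bufsz, int flags)
{
    ssize_t n;
    int saved_errno;
    enum recv_outcome outcome;

    assert(buf != NULL && bufsz > 1);

    n = recv(sockfd, buf, bufsz - 1, flags);
    saved_errno = errno;   /* save before fprintf() can change errno */
    outcome = classify_recv(n, saved_errno);

    switch (outcome) {
    case RECV_DATA:
        buf[n] = '\0';
        fprintf(stderr, "recv() gave us %ld bytes\n", (long)n);
        break;
    case RECV_EOF:
        buf[0] = '\0';
        fprintf(stderr, "recv() gave us 0 bytes\n");
        break;
    case RECV_RETRY:
    case RECV_ERROR:
        buf[0] = '\0';
        fprintf(stderr, "recv() returned error: %s\n", strerror(saved_errno));
        break;
    }
    return outcome;
}
```

A caller would typically loop again on RECV_RETRY, stop reading on
RECV_EOF, and give up on RECV_ERROR. Which of these fits the loop in
lib/sendmsg.c is exactly what the debugging output should tell us.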


After making these two changes, run "make clean; make" and copy the
bbdisplay/bbgen binary into your ~hobbit/server/bin/ directory. Let
Hobbit run as normal (with --debug on the bbgen command) and when it
fails I am very interested to see what's in the logfile.


Regards,
Henrik



