[hobbit] bbgen frequent yellow alerts - hobbitd problem?
Mike Rowell
Mike.Rowell at Rightmove.co.uk
Mon Nov 6 19:33:41 CET 2006
Henrik,
It might be worth checking whether these problems occur only on
Solaris 10 x86, as that is the only architecture I've seen this problem
on. SPARC seems fine, so that might help you narrow down the problem.
Regards,
Mike Rowell
-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk]
Sent: 06 November 2006 16:30
To: hobbit at hswn.dk
Subject: Re: [hobbit] bbgen frequent yellow alerts - hobbitd problem?
On Mon, Nov 06, 2006 at 07:35:27AM -0800, Mr-Pope wrote:
> We are running a new installation of Hobbit 4.2 on Solaris 10 running
> in a non-global zone. Server is a v240 but I don't think that matters
> here.
>
> The problem here is that our bbgen status turns yellow with fairly
> high frequency, sometimes multiple times an hour, at (what seem like)
> random intervals. In the yellow alert bbgen reports:
> "hobbitd status-board not available"
The reports I've had of this only have one thing in common: They all
happen on Solaris 10. So I'm beginning to suspect that maybe Solaris
doesn't work quite the way other systems do.
Or perhaps there is a bug, and something special in Solaris triggers it.
> Below are the output from some commands/logs. These logs don't really
> seem to help, so let me know if there is anything else that I can send
> along to debug this issue.
> $BB --debug $BBDISP "hobbitdboard"
> (with no --debug on a 'failure' I get no output. I'm assuming this is
> the same cause of the bbgen yellow alert)
Yes.
> bbgen --debug --report (this one turned bbgen yellow/unavailable.
> Note the quick disconnect.)
> 2006-11-03 09:51:03 load_state()
> 2006-11-03 09:51:03 Transport setup is:
> 2006-11-03 09:51:03 bbdportnumber = 1984
> 2006-11-03 09:51:03 bbdispproxyhost = NONE
> 2006-11-03 09:51:03 bbdispproxyport = 0
> 2006-11-03 09:51:03 Recipient listed as '10.xxx.xxx.xxx'
> 2006-11-03 09:51:03 Standard BB protocol on port 1984
> 2006-11-03 09:51:03 Will connect to address 10.xxx.xxx.xxx port 1984
> 2006-11-03 09:51:03 Connect status is 0
> 2006-11-03 09:51:03 Sent 126 bytes
> 2006-11-03 09:51:03 Closing connection
Interesting.
Since it seems that this bites you more than most others, I'd like you
to do a couple of things for me to figure out what is going on. I need
you to add a couple of debugging lines to Hobbit.
First, in the bbdisplay/loaddata.c file, around line 436, you'll find the
code that prints out the "hobbitd status-board not available" message.
It looks like this:
    errprintf("hobbitd status-board not available\n");
I want you to change that to
    errprintf("hobbitd status-board not available, code %d\n",
              hobbitdresult);
Next, the code that receives data from Hobbit is in the lib/sendmsg.c
file, around line 340. You'll find these lines:
    n = recv(sockfd, recvbuf, sizeof(recvbuf)-1, 0);
    if (n > 0) {
I'd like you to add 8 lines between these two:
    n = recv(sockfd, recvbuf, sizeof(recvbuf)-1, 0);
    if (n < 0) {
        dbgprintf("recv() returned error: %s\n", strerror(errno));
        if (errno == EAGAIN) continue;
    }
    if (n == 0) {
        dbgprintf("recv() gave us 0 bytes\n");
        continue;
    }
    if (n > 0) {
(it isn't the prettiest of programming, but it does the job for now).
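For reference, the three cases that the debugging lines above distinguish
can be sketched as a small standalone C helper. This is only an
illustration, not code from Hobbit: the names recv_outcome, classify_recv
and recv_and_log are made up here, and treating EINTR as retryable is an
added assumption (the lines above only check EAGAIN). On a stream socket,
recv() returning 0 normally means the peer has closed the connection.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical names for illustration; none of these exist in Hobbit. */
enum recv_outcome { RECV_DATA, RECV_PEER_CLOSED, RECV_RETRY, RECV_ERROR };

/* Classify a recv() return value. 'err' is errno captured right after
 * the call; it is only consulted when n is negative. */
enum recv_outcome classify_recv(ssize_t n, int err)
{
    if (n > 0) return RECV_DATA;
    if (n == 0) return RECV_PEER_CLOSED;   /* orderly shutdown by the peer */
    if (err == EAGAIN || err == EINTR) return RECV_RETRY;  /* EINTR: assumption */
    return RECV_ERROR;
}

/* Read once from sockfd into buf (bufsize must be at least 1), log what
 * happened to stderr, and report the outcome. buf is always left
 * NUL-terminated. */
enum recv_outcome recv_and_log(int sockfd, char *buf, size_t bufsize)
{
    ssize_t n = recv(sockfd, buf, bufsize - 1, 0);
    int err = errno;                       /* save before calling anything else */
    enum recv_outcome outcome = classify_recv(n, err);

    buf[n > 0 ? n : 0] = '\0';

    switch (outcome) {
    case RECV_DATA:
        fprintf(stderr, "recv() gave us %ld bytes\n", (long)n);
        break;
    case RECV_PEER_CLOSED:
        fprintf(stderr, "recv() gave us 0 bytes\n");
        break;
    case RECV_RETRY:
    case RECV_ERROR:
        fprintf(stderr, "recv() returned error: %s\n", strerror(err));
        break;
    }
    return outcome;
}
```

A caller would typically loop again on RECV_RETRY and stop reading on
RECV_PEER_CLOSED or RECV_ERROR.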
After making these two changes, run "make clean; make" and copy the
bbdisplay/bbgen binary into your ~hobbit/server/bin/ directory. Let
Hobbit run as normal (with --debug on the bbgen command) and when it
fails I am very interested to see what's in the logfile.
Regards,
Henrik
To unsubscribe from the hobbit list, send an e-mail to
hobbit-unsubscribe at hswn.dk
________________________________________________________________________
This email has been scanned for all viruses by the MessageLabs service.
________________________________________________________________________