[hobbit] Debugging help: bbtest-net gets http test timing wrong

Alan Sparks asparks at doublesparks.net
Fri Jun 20 04:15:01 CEST 2008


I see where the problem seems to be occurring.  But for the life of me I 
can't understand why.

Packet traces from the Hobbit server and from the Web servers showing the 
3-second delays show that Hobbit connects and gets an immediate answer 
from the server (within milliseconds).  But the servers show that Hobbit 
does not close the connection (no FIN is sent or ACKed) for 3 seconds.

Looking at the bb-network debugging logging, I see that the select() 
call sleeps for 3 seconds before returning in these cases.  So the only 
conclusion I can arrive at is that select() doesn't return with the 
active file descriptors on schedule for some bizarre reason.
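
To check whether the delay is in the TCP handshake or in select() itself, 
the connect step can be timed in isolation.  The following is only a 
minimal sketch, not Hobbit's code: timed_connect_ms() and 
loopback_listener() are hypothetical helpers, and it assumes a POSIX 
system with clock_gettime().  It starts a non-blocking connect(), waits in 
select() for the socket to become writable, and reports the elapsed 
milliseconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds between two CLOCK_MONOTONIC readings. */
static double ms_between(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000.0 + (b->tv_nsec - a->tv_nsec) / 1e6;
}

/*
 * Hypothetical helper (not part of Hobbit): start a non-blocking connect(),
 * wait in select() until the socket is writable, and return how long that
 * took in milliseconds.  Returns -1.0 on error or timeout.
 */
static double timed_connect_ms(const struct sockaddr_in *addr, int timeout_sec)
{
    assert(addr != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1.0;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1.0;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    int rc = connect(fd, (const struct sockaddr *)addr, sizeof(*addr));
    if (rc < 0 && errno != EINPROGRESS) {
        close(fd);
        return -1.0;
    }

    if (rc < 0) {
        /* Handshake still in progress: wait for writability. */
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        struct timeval tv = { timeout_sec, 0 };
        if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Writable does not mean connected: SO_ERROR holds the real outcome. */
    int err = 0;
    socklen_t len = sizeof(err);
    int grc = getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
    close(fd);
    if (grc < 0 || err != 0)
        return -1.0;

    return ms_between(&t0, &t1);
}

/* Hypothetical helper: listen on an ephemeral loopback port for local tests. */
static int loopback_listener(struct sockaddr_in *out)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    struct sockaddr_in a = {0};
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0;                       /* let the kernel pick a port */

    socklen_t len = sizeof(a);
    if (bind(lfd, (struct sockaddr *)&a, sizeof(a)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&a, &len) < 0) {
        close(lfd);
        return -1;
    }
    *out = a;
    return lfd;
}
```

Pointing timed_connect_ms() at one of the affected Web servers instead of 
the loopback listener should show whether the roughly 3000 ms is already 
spent before the socket even becomes writable.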

For a desperation test, I forced the receive buffer on the sockets to a 
small number (1024 bytes):
                        if (sockok) {
                                int size = 1024;
                                res = setsockopt(nextinqueue->fd,
                                        SOL_SOCKET, SO_RCVBUF, &size,
                                        sizeof(size));

This sort of works.  The select() no longer hangs, and the HTTP tests 
start returning "normal"-ish results, i.e. numbers that match curl and 
wget statistics.

But it skews the numbers for other Web servers, the ones that return 
a page significantly larger than 1024 bytes.
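
One thing worth checking with this kind of test is what buffer size the 
kernel actually applied, since it may adjust the requested value (Linux, 
for example, doubles it and enforces a minimum).  A small sketch, assuming 
a hypothetical helper set_rcvbuf() that is not part of Hobbit, sets 
SO_RCVBUF and reads it back with getsockopt():

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of Hobbit): request a receive buffer size
 * and return the size the kernel reports afterwards, or -1 on error.
 */
static int set_rcvbuf(int fd, int requested)
{
    assert(requested > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) < 0)
        return -1;

    int effective = 0;
    socklen_t len = sizeof(effective);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) < 0)
        return -1;

    return effective;   /* may differ from the requested value */
}
```

As far as I know, on Linux the call has to happen after socket() and 
before connect() for the size to influence the window advertised during 
the handshake.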

Like I said, I just can't get it.  Hobbit or CentOS?  There's nothing 
odd about this build: a generic CentOS 4.6 x86_64 install, Hobbit 4.2 with 
the allinone patch, built for x86_64.

Any suggestions at all?  If this isn't the right place to ask, where 
would be?  I can't get my head around why the only thing that I can't 
get to work here is Hobbit...

Thanks for your indulgence.  I really wish I could fix this.
-Alan


Alan Sparks wrote:
> After some Googling, I have added "AcceptFilter http none" directives 
> to the Apache 2.2 servers, which hasn't really helped anything...
>
> Perhaps I should ask:  Can anyone verify Hobbit works correctly on a 
> 64-bit system?  Not should, but does, on a CentOS 4 or RHEL 4 x86_64 
> install?
>
> I see a lot of debugging trace stuff (dbgprint calls) in the contest 
> and httptest code.  Can anyone tell me how to enable it to trace what 
> Hobbit is doing?
>
> Am really at a loss.  This can't be rocket science to get it to probe 
> HTTP correctly.  But a week later, I still cannot get it to match any 
> other monitoring tool's results.
> -Alan
>
> Alan Sparks wrote:
>> tcpdumps show a couple of interesting points.
>>
>> 1) There are definitely no DNS lookups occurring as a consequence of 
>> the Hobbit probes.  No port 53 traffic out.
>>
>> 2) The packets from the Hobbit server, and the incoming packets to 
>> the Apache server, sometimes look like:
>>
>> 15:20:01.160095 IP (tos 0x0, ttl  62, id 31129, offset 0, flags [DF], 
>> proto 6, length: 60) hobbit.45116 > target.http: S [tcp sum ok] 
>> 265769416:265769416(0) win 17520 <mss 8760,sackOK,timestamp 143665233 
>> 0,nop,wscale 2>
>>
>> 15:20:04.159715 IP (tos 0x0, ttl  62, id 31131, offset 0, flags [DF], 
>> proto 6, length: 60) hobbit.45116 > target.http: S [tcp sum ok] 
>> 265769416:265769416(0) win 17520 <mss 8760,sackOK,timestamp 143668233 
>> 0,nop,wscale 2>
>>
>> 15:20:04.160223 IP (tos 0x0, ttl  62, id 31133, offset 0, flags [DF], 
>> proto 6, length: 40) hobbit.45116 > target.http: . [tcp sum ok] 
>> 265769417:265769417(0) ack 1051782089 win 17520
>>
>> So that accounts for three seconds... it appears there are 2 SYN 
>> packets, but the first isn't getting processed and there's a 3-second 
>> delay to the next SYN (which gets ACKed).  I don't know why this 
>> happens only with the Hobbit connections... and I don't know why the 
>> first SYN seems to be getting ignored.  Server is not at all busy.
>>
>> -Alan
>>
>> Tim McCloskey wrote:
>>> I get that wget/curl always work.  Not sure what resolver settings 
>>> may be implemented differently for hobbit.
>>>
>>> Still thinking this may be unrelated to hobbit (even though 
>>> wget/curl work fine for you).  We have many apache boxes spanning 
>>> multiple networks running httpd versions 1.3, 2.0 and 2.2 that 
>>> hobbit(4.2 with allinone patch) likes just fine and reports accurate 
>>> times (Seconds: 0.nn).  We also have fairly proper forward and 
>>> reverse DNS records for the systems involved.
>>>
>>> I can't imagine hobbit parsing the wrong response times, but if that 
>>> is the case I wonder what external libraries are used (not hobbit 
>>> provided libs, as ours parse fine and are likely the same as yours).
>>>
>>> Anyway, good luck with the tcpdump.
>>>
>>> Regards,
>>>
>>> Tim
>>>
>>>
>>>
>>>
>>>
>>> Alan Sparks wrote:
>>>> UseCanonicalName is off, and HostnameLookups is off, on every 
>>>> server, regardless of version.
>>>> -Alan
>>>>
>>>> Tim McCloskey wrote:
>>>>> What do you have for
>>>>> UseCanonicalName
>>>>> in the apache 2.0 boxes?
>>>>>
>>>
>>>
>>>
>>>
>>> To unsubscribe from the hobbit list, send an e-mail to
>>> hobbit-unsubscribe at hswn.dk
>>>
>>>
>>>




