[hobbit] Debugging help: bbtest-net gets http test timing wrong
Alan Sparks
asparks at doublesparks.net
Fri Jun 20 04:15:01 CEST 2008
I see where the problem seems to be occurring. But for the life of me,
I can't understand why.
Packet traces from the Hobbit server and from the Web servers that show
the 3-second delays indicate that Hobbit connects and gets an immediate
answer from the server (within milliseconds). But the servers show that
Hobbit does not close the connection (no FIN packet is sent or ACKed)
for 3 seconds.
Looking at the bb-network debugging logging, I see that the select()
call sleeps for 3 seconds before returning in these cases. So the only
conclusion I can arrive at is that select() doesn't return with the
active file descriptors on schedule for some bizarre reason.
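One way to check whether select() itself returns late is to time the call outside of Hobbit on a descriptor that is known to be readable. The following is only a minimal sketch under that assumption: the helper names elapsed_ms and timed_select_read are hypothetical and are not part of the Hobbit source, and only standard POSIX calls are used.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: milliseconds between two gettimeofday() samples. */
static double elapsed_ms(const struct timeval *start, const struct timeval *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0 +
           (end->tv_usec - start->tv_usec) / 1000.0;
}

/*
 * Hypothetical helper: wait until fd is readable or timeout_ms expires.
 * Returns select()'s result (1 = readable, 0 = timeout, -1 = error) and
 * stores the wall-clock time spent inside select() in *waited_ms.
 */
static int timed_select_read(int fd, int timeout_ms, double *waited_ms)
{
    fd_set readfds;
    struct timeval tmo, before, after;
    int n;

    assert(waited_ms != NULL);

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tmo.tv_sec = timeout_ms / 1000;
    tmo.tv_usec = (timeout_ms % 1000) * 1000;

    gettimeofday(&before, NULL);
    n = select(fd + 1, &readfds, NULL, NULL, &tmo);
    gettimeofday(&after, NULL);

    *waited_ms = elapsed_ms(&before, &after);
    return n;
}
```

If the reported wait is a few milliseconds for a descriptor with data already queued, the 3 seconds are more likely being spent before the data arrives (for example during connection setup) than inside select().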
For a desperation test, I forced the receive buffer on the sockets to a
small number (1024 bytes):
    if (sockok) {
        /* Desperation test: shrink the socket receive buffer. */
        int size = 1024;
        res = setsockopt(nextinqueue->fd,
                         SOL_SOCKET, SO_RCVBUF, &size,
                         sizeof(size));
This sort of works. The select() no longer hangs, and the HTTP tests
start returning "normal"-ish results, i.e. numbers that match curl and
wget statistics.
But, it messes with numbers for other Web servers, the ones that return
a page significantly larger than 1024 bytes.
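When experimenting with SO_RCVBUF it helps to know what size the kernel actually applied, since it need not equal the requested value. On Linux, socket(7) documents that the value passed to setsockopt() is doubled and that a minimum is enforced, and that getsockopt() returns the resulting value. A minimal sketch of reading the setting back, where set_and_read_rcvbuf is a hypothetical helper and not Hobbit code:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: request a receive buffer of `requested` bytes on fd,
 * then read back the size the kernel actually applied into *applied.
 * Returns 0 on success, -1 if either socket option call fails.
 */
static int set_and_read_rcvbuf(int fd, int requested, int *applied)
{
    socklen_t len = sizeof(*applied);

    assert(applied != NULL);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) != 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, applied, &len) != 0)
        return -1;
    return 0;
}
```

Logging the applied value next to the measured response time would show which buffer size each HTTP test really ran with.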
Like I said, I just can't get it. Hobbit or CentOS? There's nothing
odd about this build: a generic CentOS 4.6 x86_64 install, Hobbit 4.2
with the allinone patch, built for x86_64.
Any suggestions at all? If this isn't the right place to ask, where
would be? I can't get my head around why the only thing that I can't
get to work here is Hobbit...
Thanks for your indulgence. I really wish I could fix this.
-Alan
Alan Sparks wrote:
> After some Googling, I have added "AcceptFilter http none" directives
> to the Apache 2.2 servers, which hasn't really helped anything...
>
> Perhaps I should ask: Can anyone verify Hobbit works correctly on a
> 64-bit system? Not should, but does, on a CentOS 4 or RHEL 4 x86_64
> install?
>
> I see a lot of debugging trace stuff (dbgprint calls) in the contest
> and httptest code. Can anyone tell me how to enable it to trace what
> Hobbit is doing?
>
> Am really at a loss. This can't be rocket science to get it to probe
> HTTP correctly. But a week later, I still cannot get it to match any
> other monitoring tool's results.
> -Alan
>
> Alan Sparks wrote:
>> tcpdumps show a couple of interesting points.
>>
>> 1) There are definitely no DNS lookups occurring as a consequence of
>> the Hobbit probes. No port 53 traffic out.
>>
>> 2) The packets from the Hobbit server, and the incoming packets to
>> the Apache server, sometimes look like:
>>
>> 15:20:01.160095 IP (tos 0x0, ttl 62, id 31129, offset 0, flags [DF],
>> proto 6, length: 60) hobbit.45116 > target.http: S [tcp sum ok]
>> 265769416:265769416(0) win 17520 <mss 8760,sackOK,timestamp 143665233
>> 0,nop,wscale 2>
>>
>> 15:20:04.159715 IP (tos 0x0, ttl 62, id 31131, offset 0, flags [DF],
>> proto 6, length: 60) hobbit.45116 > target.http: S [tcp sum ok]
>> 265769416:265769416(0) win 17520 <mss 8760,sackOK,timestamp 143668233
>> 0,nop,wscale 2>
>>
>> 15:20:04.160223 IP (tos 0x0, ttl 62, id 31133, offset 0, flags [DF],
>> proto 6, length: 40) hobbit.45116 > target.http: . [tcp sum ok]
>> 265769417:265769417(0) ack 1051782089 win 17520
>>
>> So that accounts for three seconds... it appears there are 2 SYN
>> packets, but the first isn't getting processed and there's a 3-second
>> delay to the next SYN (which gets ACKed). I don't know why this
>> happens only with the Hobbit connections... and I don't know why the
>> first SYN seems to be getting ignored. Server is not at all busy.
>>
>> -Alan
>> Tim McCloskey wrote:
>>> I get that wget/curl always work. Not sure what resolver settings
>>> may be implemented differently for hobbit.
>>>
>>> Still thinking this may be unrelated to hobbit (even though
>>> wget/curl work fine for you). We have many apache boxes spanning
>>> multiple networks running httpd versions 1.3, 2.0 and 2.2 that
>>> hobbit(4.2 with allinone patch) likes just fine and reports accurate
>>> times (Seconds: 0.nn). We also have fairly proper forward and
>>> reverse DNS records for the systems involved.
>>>
>>> I can't imagine hobbit parsing the wrong response times, but if that
>>> is the case I wonder what external libraries are used (not hobbit
>>> provided libs, as ours parse fine and are likely the same as yours).
>>>
>>> Anyway, good luck with the tcpdump.
>>>
>>> Regards,
>>>
>>> Tim
>>>
>>> Alan Sparks wrote:
>>>> UseCanonicalName is off, and HostNameLookup is off, on every
>>>> server, regardless of version.
>>>> -Alan
>>>>
>>>> Tim McCloskey wrote:
>>>>> What do you have for
>>>>> UseCanonicalName
>>>>> in the apache 2.0 boxes?
>>>>>
>>>
>>>
>>>
>>>
>>> To unsubscribe from the hobbit list, send an e-mail to
>>> hobbit-unsubscribe at hswn.dk
>>>
>>>
>>>