[hobbit] Debugging help: bbtest-net gets http test timing wrong
Alan Sparks
asparks at doublesparks.net
Sat Jun 21 00:37:35 CEST 2008
It does exactly the same thing on a fresh install of CentOS 5, x86_64.
Everything was built by hand.
-Alan
Alan Sparks wrote:
> I see where the problem seems to be occurring. But for the life of me
> I can't understand why.
>
> Packet traces from the Hobbit server and from the Web servers showing
> the 3-second delays show that Hobbit connects and gets an immediate
> answer from the server (within milliseconds). But the servers show that
> Hobbit does not close the connection (the FIN/ACK exchange) for 3 seconds.
>
> Looking at the bb-network debugging logging, I see that the select()
> call sleeps for 3 seconds before returning in these cases. So the
> only conclusion I can arrive at is that select() doesn't return with
> the active file descriptors on schedule for some bizarre reason.
>
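One way to check whether the delay really sits inside select(), rather than in the code around it, is to time the call in isolation. The following is a minimal sketch, not Hobbit's own code: `timed_select_ms` is a hypothetical helper name, and it uses gettimeofday(), so it measures wall-clock time rather than a monotonic clock.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable (or timeout_sec passes)
 * and return how long the select() call itself took, in milliseconds.
 * The select() return value is stored in *nready. */
static double timed_select_ms(int fd, int timeout_sec, int *nready)
{
    fd_set readfds;
    struct timeval tmo, before, after;

    assert(nready != NULL);

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tmo.tv_sec = timeout_sec;
    tmo.tv_usec = 0;

    gettimeofday(&before, NULL);
    *nready = select(fd + 1, &readfds, NULL, NULL, &tmo);
    gettimeofday(&after, NULL);

    return (after.tv_sec - before.tv_sec) * 1000.0 +
           (after.tv_usec - before.tv_usec) / 1000.0;
}
```

Called on a descriptor that already has data queued, this would be expected to return within a few milliseconds; a result near 3000 ms would point at select() or the kernel rather than at the calling code.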
> For a desperation test, I forced the receive buffer on the sockets to
> a small number (1024 bytes):
>     if (sockok) {
>         /* desperation test: shrink the socket receive buffer */
>         int size = 1024;
>         res = setsockopt(nextinqueue->fd, SOL_SOCKET, SO_RCVBUF,
>                          &size, sizeof(size));
>
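When experimenting with SO_RCVBUF it can help to read the value back, because the kernel may not apply exactly what was requested. On Linux, socket(7) documents that the kernel doubles the requested value and enforces a minimum. The sketch below is illustrative only; `set_and_read_rcvbuf` is a hypothetical helper name and is not part of Hobbit.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer size, then ask the kernel
 * what it actually applied. Returns the effective size, or -1 on error. */
static int set_and_read_rcvbuf(int fd, int requested)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    assert(requested > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) != 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) != 0)
        return -1;
    return effective;
}
```

Logging the returned value next to each test result would show which buffer size was actually in effect when a given timing was taken.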
> This sort of works. The select() call no longer hangs, and the HTTP
> tests start returning "normal"-ish results, i.e. numbers that match
> curl and wget statistics.
>
> But it skews the numbers for other Web servers, the ones that
> return a page significantly larger than 1024 bytes.
>
> Like I said, I just can't get it. Hobbit or CentOS? There's nothing
> odd about this build: a generic CentOS 4.6 x86_64 install, Hobbit 4.2
> with allinone patch, built for x86_64.
>
> Any suggestions at all? If this isn't the right place to ask, where
> would be? I can't get my head around why the only thing that I can't
> get to work here is Hobbit...
>
> Thanks for your indulgence. I really wish I could fix this.
> -Alan
>
>
> Alan Sparks wrote:
>> After some Googling, I have added "AcceptFilter http none" directives
>> to the Apache 2.2 servers, which hasn't really helped anything...
>>
>> Perhaps I should ask: Can anyone verify Hobbit works correctly on a
>> 64-bit system? Not should, but does, on a CentOS 4 or RHEL 4 x86_64
>> install?
>>
>> I see a lot of debugging trace stuff (dbgprint calls) in the contest
>> and httptest code. Can anyone tell me how to enable it to trace what
>> Hobbit is doing?
>>
>> Am really at a loss. This can't be rocket science to get it to probe
>> HTTP correctly. But a week later, I still cannot get it to match any
>> other monitoring tool's results.
>> -Alan
>>
>> Alan Sparks wrote:
>>> tcpdumps show a couple of interesting points.
>>>
>>> 1) There are definitely no DNS lookups occurring as a consequence of
>>> the Hobbit probes. No port 53 traffic out.
>>>
>>> 2) The packets from the Hobbit server, and the incoming packets to
>>> the Apache server, sometimes look like:
>>>
>>> 15:20:01.160095 IP (tos 0x0, ttl 62, id 31129, offset 0, flags
>>> [DF], proto 6, length: 60) hobbit.45116 > target.http: S [tcp sum
>>> ok] 265769416:265769416(0) win 17520 <mss 8760,sackOK,timestamp
>>> 143665233 0,nop,wscale 2>
>>>
>>> 15:20:04.159715 IP (tos 0x0, ttl 62, id 31131, offset 0, flags
>>> [DF], proto 6, length: 60) hobbit.45116 > target.http: S [tcp sum
>>> ok] 265769416:265769416(0) win 17520 <mss 8760,sackOK,timestamp
>>> 143668233 0,nop,wscale 2>
>>>
>>> 15:20:04.160223 IP (tos 0x0, ttl 62, id 31133, offset 0, flags
>>> [DF], proto 6, length: 40) hobbit.45116 > target.http: . [tcp sum
>>> ok] 265769417:265769417(0) ack 1051782089 win 17520
>>>
>>> So that accounts for three seconds... it appears there are 2 SYN
>>> packets, but the first isn't getting processed and there's a
>>> 3-second delay to the next SYN (which gets ACKed). I don't know why
>>> this happens only with the Hobbit connections... and I don't know
>>> why the first SYN seems to be getting ignored. Server is not at all
>>> busy.
>>>
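A repeated SYN with the same sequence number is a retransmission, and 3 seconds is the classic initial TCP retransmission timeout, so a connect that takes close to 3 seconds suggests the first SYN was lost or dropped somewhere. To separate connect time from the rest of the HTTP exchange, the connect() call can be timed on its own. This is a minimal sketch under that assumption; `connect_time_ms` is a hypothetical helper name and does not reflect how Hobbit measures anything.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: time a blocking TCP connect() to an IPv4
 * address and port. Returns elapsed milliseconds, or -1.0 on failure. */
static double connect_time_ms(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    struct timeval before, after;
    int fd, rc;

    assert(ip != NULL);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1.0;            /* not a valid dotted-quad address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1.0;

    gettimeofday(&before, NULL);
    rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    gettimeofday(&after, NULL);
    close(fd);

    if (rc != 0)
        return -1.0;
    return (after.tv_sec - before.tv_sec) * 1000.0 +
           (after.tv_usec - before.tv_usec) / 1000.0;
}
```

Running this repeatedly from the monitoring host against a target would show whether the 3-second outliers appear in the handshake alone, independent of any HTTP parsing.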
>>> -Alan
>>> Tim McCloskey wrote:
>>>> I get that wget/curl always work. Not sure what resolver settings
>>>> may be implemented differently for hobbit.
>>>>
>>>> Still thinking this may be unrelated to hobbit (even though
>>>> wget/curl work fine for you). We have many apache boxes spanning
>>>> multiple networks running httpd versions 1.3, 2.0 and 2.2 that
>>>> hobbit (4.2 with allinone patch) likes just fine and reports
>>>> accurate times for (Seconds: 0.nn). We also have fairly proper
>>>> forward and reverse DNS records for the systems involved.
>>>>
>>>> I can't imagine hobbit parsing the wrong response times, but if
>>>> that is the case I wonder what external libraries are used (not
>>>> hobbit provided libs, as ours parse fine and are likely the same as
>>>> yours).
>>>>
>>>> Anyway, good luck with the tcpdump.
>>>>
>>>> Regards,
>>>>
>>>> Tim
>>>>
>>>> Alan Sparks wrote:
>>>>> UseCanonicalName is off, and HostnameLookups is off, on every
>>>>> server, regardless of version.
>>>>> -Alan
>>>>>
>>>>> Tim McCloskey wrote:
>>>>>> What do you have for
>>>>>> UseCanonicalName
>>>>>> in the apache 2.0 boxes?
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> To unsubscribe from the hobbit list, send an e-mail to
>>>> hobbit-unsubscribe at hswn.dk