[hobbit] Debugging help: bbtest-net gets http test timing wrong

Alan Sparks asparks at doublesparks.net
Sat Jun 21 00:37:35 CEST 2008


It does exactly the same thing on a fresh install of CentOS 5, x86_64,
with everything built by hand.
-Alan

Alan Sparks wrote:
> I see where the problem seems to be occurring, but for the life of me I
> can't understand why.
>
> Packet traces from the Hobbit server and from the Web servers that show
> the 3-second delays indicate that Hobbit connects and gets an immediate
> answer from the server (within milliseconds).  But the server-side traces
> show that Hobbit does not close the connection (FIN sent and ACKed) for
> 3 seconds.
>
> Looking at the bb-network debugging logging, I see that the select() 
> call sleeps for 3 seconds before returning in these cases.  So the 
> only conclusion I can arrive at is that select() doesn't return with 
> the active file descriptors on schedule for some bizarre reason.
>
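One way to check whether select() itself is slow, independent of Hobbit,
is to time it around a bare non-blocking connect.  The following is only
a minimal sketch: time_connect_select() is a hypothetical helper, not
anything from the bbtest-net source, and it assumes an IPv4 literal
address and a plain TCP connect with no HTTP request sent.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, not part of Hobbit: start a non-blocking connect
 * to ip:port and return the seconds that pass until select() reports the
 * socket writable with no pending error.  Returns -1.0 on error or
 * timeout. */
double time_connect_select(const char *ip, int port, int timeout_sec)
{
    struct sockaddr_in addr;
    struct timeval start, end, tmo;
    fd_set wfds;
    socklen_t len;
    int fd, flags, n, err = 0;
    double elapsed = -1.0;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1.0;                      /* not an IPv4 literal */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1.0;
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1.0;
    }

    gettimeofday(&start, NULL);
    /* A non-blocking connect normally fails with EINPROGRESS. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 &&
        errno != EINPROGRESS) {
        close(fd);
        return -1.0;
    }

    /* Wait until the socket becomes writable (handshake finished). */
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tmo.tv_sec = timeout_sec;
    tmo.tv_usec = 0;
    n = select(fd + 1, NULL, &wfds, NULL, &tmo);
    gettimeofday(&end, NULL);

    /* Writable plus SO_ERROR == 0 means the connect succeeded. */
    len = sizeof(err);
    if (n == 1 &&
        getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
        elapsed = (end.tv_sec - start.tv_sec) +
                  (end.tv_usec - start.tv_usec) / 1e6;

    close(fd);
    return elapsed;
}
```

Calling this against one of the slow Web servers and against a fast one
should show whether the 3-second wait appears with the connect alone.
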
> For a desperation test, I forced the receive buffer on the sockets to 
> a small number (1024 bytes):
>                        if (sockok) {
>                                int size = 1024;
>                                res = setsockopt(nextinqueue->fd,
>                                        SOL_SOCKET, SO_RCVBUF,
>                                        &size, sizeof(size));
>
> This sort of works.  The select() no longer hangs, and the HTTP tests
> start returning "normal"-ish results, i.e. numbers that match curl and
> wget statistics.
>
> But it skews the numbers for the other Web servers, the ones that
> return a page significantly larger than 1024 bytes.
>
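When experimenting with SO_RCVBUF like this, it may help to read the
value back, since the kernel does not have to honour the request
exactly (on Linux, socket(7) documents that the requested size is
doubled and that a minimum applies).  A minimal sketch, using a
hypothetical helper effective_rcvbuf() on a fresh, unconnected socket
rather than Hobbit's own nextinqueue->fd:

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: request a receive buffer of `bytes` on a new TCP
 * socket and return the size the kernel reports afterwards, or -1 on
 * error. */
int effective_rcvbuf(int bytes)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int actual = -1;
    socklen_t len = sizeof(actual);

    if (fd < 0)
        return -1;

    /* Ask for the buffer size, then read back what was actually set. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) != 0)
        actual = -1;

    close(fd);
    return actual;
}
```

Printing effective_rcvbuf(1024) on the Hobbit server would show how
small the buffer really ends up on that kernel.
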
> Like I said, I just can't get it.  Hobbit or CentOS?  There's nothing
> odd about this build: a generic CentOS 4.6 x86_64 install, with Hobbit
> 4.2 plus the allinone patch, built for x86_64.
>
> Any suggestions at all?  If this isn't the right place to ask, where
> would be?  I can't get my head around why the only thing that I can't
> get to work here is Hobbit...
>
> Thanks for your indulgence.  I really wish I could fix this.
> -Alan
>
>
> Alan Sparks wrote:
>> After some Googling, I have added "AcceptFilter http none" directives 
>> to the Apache 2.2 servers, which hasn't really helped anything...
>>
>> Perhaps I should ask:  Can anyone verify Hobbit works correctly on a
>> 64-bit system?  Not should, but does, on a CentOS 4 or RHEL 4 x86_64
>> install?
>>
>> I see a lot of debugging trace stuff (dbgprint calls) in the contest 
>> and httptest code.  Can anyone tell me how to enable it to trace what 
>> Hobbit is doing?
>>
>> Am really at a loss.  This can't be rocket science to get it to probe 
>> HTTP correctly.  But a week later, I still cannot get it to match any 
>> other monitoring tool's results.
>> -Alan
>>
>> Alan Sparks wrote:
>>> tcpdumps show a couple of interesting points.
>>>
>>> 1) There are definitely no DNS lookups occurring as a consequence of 
>>> the Hobbit probes.  No port 53 traffic out.
>>>
>>> 2) The packets from the Hobbit server, and the incoming packets to 
>>> the Apache server, sometimes look like:
>>>
>>> 15:20:01.160095 IP (tos 0x0, ttl  62, id 31129, offset 0, flags 
>>> [DF], proto 6, length: 60) hobbit.45116 > target.http: S [tcp sum 
>>> ok] 265769416:265769416(0) win 17520 <mss 8760,sackOK,timestamp 
>>> 143665233 0,nop,wscale 2>
>>>
>>> 15:20:04.159715 IP (tos 0x0, ttl  62, id 31131, offset 0, flags 
>>> [DF], proto 6, length: 60) hobbit.45116 > target.http: S [tcp sum 
>>> ok] 265769416:265769416(0) win 17520 <mss 8760,sackOK,timestamp 
>>> 143668233 0,nop,wscale 2>
>>>
>>> 15:20:04.160223 IP (tos 0x0, ttl  62, id 31133, offset 0, flags 
>>> [DF], proto 6, length: 40) hobbit.45116 > target.http: . [tcp sum 
>>> ok] 265769417:265769417(0) ack 1051782089 win 17520
>>>
>>> So that accounts for three seconds... it appears there are 2 SYN 
>>> packets, but the first isn't getting processed and there's a 
>>> 3-second delay to the next SYN (which gets ACKed).  I don't know why 
>>> this happens only with the Hobbit connections... and I don't know 
>>> why the first SYN seems to be getting ignored.  Server is not at all 
>>> busy.
>>>
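The arithmetic behind the "three seconds" can be checked directly from
the trace.  Both SYN packets carry the same sequence number
(265769416), so the second looks like a retransmission of the first,
and a gap of about 3 seconds is consistent with the classic 3-second
initial TCP retransmission timeout (RFC 1122, RFC 2988).  A minimal
sketch with a hypothetical helper ts_to_usec(), assuming tcpdump's
usual HH:MM:SS.uuuuuu format with six fractional digits:

```c
#include <stdio.h>

/* Hypothetical helper: convert a tcpdump-style "HH:MM:SS.uuuuuu"
 * timestamp into microseconds since midnight.  Assumes exactly six
 * fractional digits; returns -1 on malformed input. */
long long ts_to_usec(const char *ts)
{
    int h, m, s;
    long us;

    if (sscanf(ts, "%d:%d:%d.%6ld", &h, &m, &s, &us) != 4)
        return -1;
    return ((h * 60LL + m) * 60LL + s) * 1000000LL + us;
}

/* Gap in seconds between two timestamps taken on the same day. */
double gap_seconds(const char *first, const char *second)
{
    return (double)(ts_to_usec(second) - ts_to_usec(first)) / 1e6;
}
```

For the two SYNs above, gap_seconds("15:20:01.160095",
"15:20:04.159715") comes to 2.999620 seconds, and the TCP timestamp
values in the same packets (143665233 and 143668233) differ by exactly
3000, which would match a millisecond tick.
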
>>> -Alan
>>> Tim McCloskey wrote:
>>>> I get that wget/curl always work.  Not sure what resolver settings 
>>>> may be implemented differently for hobbit.
>>>>
>>>> Still thinking this may be unrelated to hobbit (even though 
>>>> wget/curl work fine for you).  We have many apache boxes spanning 
>>>> multiple networks running httpd versions 1.3, 2.0 and 2.2 that 
>>>> hobbit (4.2 with allinone patch) likes just fine and reports
>>>> accurate times (Seconds: 0.nn).  We also have fairly proper forward 
>>>> and reverse DNS records for the systems involved.
>>>>
>>>> I can't imagine hobbit parsing the wrong response times, but if 
>>>> that is the case I wonder what external libraries are used (not 
>>>> hobbit provided libs, as ours parse fine and are likely the same as 
>>>> yours).
>>>>
>>>> Anyway, good luck with the tcpdump.
>>>>
>>>> Regards,
>>>>
>>>> Tim
>>>>
>>>> Alan Sparks wrote:
>>>>> UseCanonicalName is off, and HostnameLookups is off, on every
>>>>> server, regardless of version.
>>>>> -Alan
>>>>>
>>>>> Tim McCloskey wrote:
>>>>>> What do you have for
>>>>>> UseCanonicalName
>>>>>> in the apache 2.0 boxes?
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> To unsubscribe from the hobbit list, send an e-mail to
>>>> hobbit-unsubscribe at hswn.dk
