[hobbit] Debugging help: bbtest-net gets http test timing wrong

Alan Sparks asparks at doublesparks.net
Mon Jun 16 06:43:45 CEST 2008


UseCanonicalName is Off and HostnameLookups is Off on every server, 
regardless of version.
-Alan

Tim McCloskey wrote:
> What do you have for
> UseCanonicalName
> in the apache 2.0 boxes?
>
>
>
> Alan Sparks wrote:
>> Still trying to debug this problem. I have tried about everything 
>> I can think of to resolve the issues with the http probes, including:
>> * Complete rebuild of the server with CentOS 4.6 and recompile of 
>> Hobbit.  Same issues.
>> * Removing everything from /etc/sysctl.conf, rebooting.  Same issues.
>> * Manipulating the httpd.conf configs on remote servers, forcing 
>> HTTP/1.0, removing ETags, creating a very simple index page to test 
>> against.  Same issues.
>> * Upgrading Apache on sample remote server to Apache 2.2.8 (most are 
>> 2.2.4).  Same issue.
>> * Recompiling Hobbit with debugging flags, to make sure the optimizer 
>> is not applied.  Same issue.
>>
>> The only two servers I have that seem to work consistently well are a 
>> pair of Apache 2.0.52 servers.  The 2.2.4+ servers all seem to give 
>> Hobbit issues.  Although, again, repeated curl or wget probe cycles 
>> against the servers from the Hobbit server never show more than a 
>> 0.2s response time.
>>
>> But, Hobbit continues to report things like:
>>
>> http://10.1.17.251/ - OK
>>
>> HTTP/1.1 200 OK
>> Date: Mon, 16 Jun 2008 00:54:03 GMT
>> Server: Apache/2.2.8 (EL)
>> Last-Modified: Sun, 15 Jun 2008 00:37:00 GMT
>> ETag: "7c809f-9b-44fa9b5806300"
>> Accept-Ranges: bytes
>> Content-Length: 155
>> Connection: close
>> Content-Type: text/html; charset=UTF-8
>>
>> Seconds:     3.00
>>
>> I can't come up with anything other than Hobbit as a cause. But is 
>> there anything I can do to trace what is happening internally to get 
>> past this problem?  Any ideas at all would really be appreciated.  
>> Thanks in advance.
>> -Alan
>>
>>
>> Alan Sparks wrote:
>>> Have a new install of Hobbit (4.2, tried 4.3 snap as well) on a 
>>> fresh install of CentOS 4.6 x86_64, up to date on patches.  I have a 
>>> problem with HTTP tests on "random" web servers that I just can't 
>>> figure out.
>>>
>>> I have about 64 of my hosts in the bb-hosts on this server, and have 
>>> http tests defined for these servers.  On most of these servers, 
>>> Hobbit is reporting the "Seconds:" for the response at 3 seconds.  
>>> It seems that it is inconsistent -- one cycle to the next, the 
>>> 3-second response may move to a different set of servers.
>>>
>>> The http: tests are defined using the IP address of the server - no 
>>> server name (so no DNS lookup).
>>>
>>> I've run a loop of tests on the same URL using wget and with curl, 
>>> and used my browser and Telnet to connect to the same URL.  I 
>>> consistently get a response time of about 0.2 seconds maximum from 
>>> the servers.
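
A loop of this kind can be sketched in Python as below. This is only a minimal illustration of a sequential timing loop, not what wget, curl or bbtest-net do internally; the names `fetch` and `probe` are made up for the example. Because the requests run one after another, it corresponds to the concurrency-of-1 case and would not be expected to reproduce a problem that only appears with parallel tests.

```python
import time
import urllib.request


def fetch(url, timeout=10.0):
    """Fetch url once and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # read the whole body so the timing covers the full response
    return time.monotonic() - start


def probe(url, count=20, timer=fetch):
    """Run `count` sequential requests and summarise the timings.

    `timer` is any callable taking a URL and returning seconds; it is
    a parameter only so the loop can be exercised without a network.
    """
    samples = [timer(url) for _ in range(count)]
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": sum(samples) / len(samples),
    }


# Example call (needs network access to the host under test):
#   print(probe("http://10.1.17.251/"))
```

If the "max" value stays around 0.2 while Hobbit shows 3.00 for the same URL, the delay is more likely in how the parallel tests are driven than in the web server.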
>>>
>>> The bbnet entry in hobbitlaunch.cfg looks like:
>>> CMD bbtest-net --report --ping --checkresponse --debug
>>>
>>> With the debugging turned on, I see the following entries 
>>> periodically in the network test log:
>>> Address=10.1.5.17:80, open=1, res=0, err=0, connecttime=0.002965, 
>>> totaltime=3.006810,
>>> Address=10.1.5.18:80, open=1, res=0, err=0, connecttime=0.002956, 
>>> totaltime=3.007413,
>>> Address=10.1.24.67:80, open=1, res=0, err=0, connecttime=0.002860, 
>>> totaltime=3.007120,
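
To see how many hosts are hit in each cycle, entries like these can be pulled out of the log with a short script. The sketch below assumes the debug lines have exactly the fields shown above (Address, connecttime, totaltime) and that it is fed the raw log text rather than the quoted mail; the function name `slow_entries` and the 1-second threshold are arbitrary choices for the example.

```python
import re

# Matches one debug entry; \s* allows the entry to be wrapped
# onto a second line between connecttime and totaltime.
ENTRY = re.compile(
    r"Address=(?P<addr>[\d.]+:\d+),.*?"
    r"connecttime=(?P<connect>[\d.]+),\s*"
    r"totaltime=(?P<total>[\d.]+)",
    re.DOTALL,
)


def slow_entries(log_text, threshold=1.0):
    """Return (address, connecttime, totaltime) for every entry where
    the time spent after the connect is at least `threshold` seconds."""
    slow = []
    for match in ENTRY.finditer(log_text):
        connect = float(match.group("connect"))
        total = float(match.group("total"))
        if total - connect >= threshold:
            slow.append((match.group("addr"), connect, total))
    return slow
```

Running this over the log from several cycles shows whether the slow set really moves between hosts, and whether the extra time is always close to 3 seconds after a fast connect.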
>>>
>>> The problem does not affect the same hosts each time.  The number of 
>>> affected hosts usually changes from cycle to cycle, sometimes hitting 
>>> the same servers, but often different ones.
>>>
>>> I've tried the following to see if anything will help:
>>> * Reducing the number of hosts.  If I only have a couple or three in 
>>> the bb-hosts, the problem doesn't manifest.
>>> * Recompiling.  Doesn't help.
>>> * Changing the test URL.  Doesn't help.
>>> * Adding a --concurrency= option to the launch.  If I use a 
>>> concurrency of 1, the problem does not manifest.
>>>
>>> Setting the concurrency to 1 to fix the problem isn't an option, but 
>>> it makes me think something is getting really mixed up in the 
>>> select() processing in bbtest-net.
>>>
>>> Does anyone have any ideas how to diagnose where Hobbit is coming up 
>>> with a 3-second latency, when none of my test tools running off the 
>>> same server can duplicate the same timing?
>>>
>>> Thanks for any ideas, this is really baffling me.
>>> -Alan
>>>
>>>
>>>
>>> To unsubscribe from the hobbit list, send an e-mail to
>>> hobbit-unsubscribe at hswn.dk




