I'm still trying to debug this problem and have tried about everything
I can think of to resolve the issues with the HTTP probes, including:
* A complete rebuild of the server with CentOS 4.6 and a recompile of
Hobbit. Same issues.
* Removing everything from /etc/sysctl.conf and rebooting. Same issues.
* Modifying httpd.conf on the remote servers: forcing HTTP/1.0,
removing ETags, and creating a very simple index page to test
against. Same issues.
* Upgrading Apache on a sample remote server to Apache 2.2.8 (most are
2.2.4). Same issue.
* Recompiling Hobbit with debugging flags, to make sure the optimizer
is not applied. Same issue.
The only two servers that seem to work consistently well are a pair
of Apache 2.0.52 servers. The 2.2.4+ servers all seem to give Hobbit
trouble, although, again, repeated curl or wget probe cycles against
these servers from the Hobbit server never show more than a 0.2s
response time.
But Hobbit continues to report things like:
http://10.1.17.251/ - OK
HTTP/1.1 200 OK
Date: Mon, 16 Jun 2008 00:54:03 GMT
Server: Apache/2.2.8 (EL)
Last-Modified: Sun, 15 Jun 2008 00:37:00 GMT
ETag: "7c809f-9b-44fa9b5806300"
Accept-Ranges: bytes
Content-Length: 155
Connection: close
Content-Type: text/html; charset=UTF-8
Seconds: 3.00
I can't come up with any cause other than Hobbit itself. Is there
anything I can do to trace what is happening internally so I can get
past this problem? Any ideas at all would be much appreciated.
Thanks in advance.
-Alan
Alan Sparks wrote:
I have a new install of Hobbit (4.2; I tried the 4.3 snapshot as
well) on a fresh install of CentOS 4.6 x86_64, up to date on patches.
I have a problem with HTTP tests on "random" web servers that I just
can't figure out.
I have about 64 of my hosts in the bb-hosts file on this server, with
http tests defined for them. On most of these servers, Hobbit reports
the "Seconds:" value for the response as 3 seconds. It is also
inconsistent: from one cycle to the next, the 3-second response may
move to a different set of servers.
The http tests are defined using the server's IP address, not a host
name, so no DNS lookup is involved.
I've run a loop of tests on the same URL with wget and curl, and have
used my browser and telnet to connect to the same URL. I consistently
get a maximum response time of about 0.2 seconds from the servers.
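A wget/curl loop like this can also be scripted so that it reports the connect time and the total time separately, in roughly the same shape as the bbtest-net debug lines, which makes a side-by-side comparison easier. This is a minimal sketch assuming a plain HTTP/1.0 GET over a raw socket; `probe` and `probe_loop` are hypothetical helper names, and this is not how Hobbit itself takes its measurements.

```python
import socket
import time

def probe(host, port=80, path="/", timeout=10.0):
    """Time one HTTP/1.0 GET.

    Returns (connect_seconds, total_seconds, status_line); total_seconds runs
    from before the connect until the server closes the connection.
    """
    start = time.monotonic()
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        connected = time.monotonic()
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        # With HTTP/1.0 the server closes the socket once the response is sent.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
        done = time.monotonic()
    finally:
        sock.close()
    status_line = b"".join(chunks).split(b"\r\n", 1)[0].decode("latin-1")
    return connected - start, done - start, status_line

def probe_loop(host, port=80, count=20, interval=1.0):
    """Probe one host repeatedly and print a line per attempt, like a curl/wget loop."""
    for _ in range(count):
        connect_s, total_s, status = probe(host, port)
        print(f"Address={host}:{port}, connecttime={connect_s:.6f}, "
              f"totaltime={total_s:.6f}, {status}")
        time.sleep(interval)
```

For example, `probe_loop("10.1.5.17")` would run 20 sequential probes at one-second intervals against one of the hosts that shows up in the log.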
The bbnet entry in hobbitlaunch.cfg looks like:
CMD bbtest-net --report --ping --checkresponse --debug
With the debugging turned on, I see the following entries
periodically in the network test log:
Address=10.1.5.17:80, open=1, res=0, err=0, connecttime=0.002965, totaltime=3.006810,
Address=10.1.5.18:80, open=1, res=0, err=0, connecttime=0.002956, totaltime=3.007413,
Address=10.1.24.67:80, open=1, res=0, err=0, connecttime=0.002860, totaltime=3.007120,
The problem does not affect the same hosts each time. The number of
affected hosts usually differs from cycle to cycle; sometimes the same
servers are hit, but often different ones are.
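To see which hosts hit the 3-second mark in each cycle, the debug entries quoted above can be filtered by totaltime. A minimal sketch, assuming the entries always carry the fields in exactly that order (possibly wrapped onto a second line); `slow_entries` and the 2.5-second threshold are hypothetical choices. In the entries shown, connecttime is around 3 ms, so nearly all of the 3 seconds is spent after the connect completes.

```python
import re

# Matches bbtest-net debug entries like the ones quoted above; the \s* runs
# tolerate an entry being wrapped onto a second line between fields.
ENTRY = re.compile(
    r"Address=(?P<addr>[^,\s]+),\s*open=(?P<open>\d+),\s*res=(?P<res>-?\d+),\s*"
    r"err=(?P<err>\d+),\s*connecttime=(?P<connect>[\d.]+),\s*totaltime=(?P<total>[\d.]+)"
)

def slow_entries(log_text, threshold=2.5):
    """Return (address, connecttime, totaltime) for every entry slower than threshold seconds."""
    slow = []
    for m in ENTRY.finditer(log_text):
        total = float(m.group("total"))
        if total > threshold:
            slow.append((m.group("addr"), float(m.group("connect")), total))
    return slow
```

Running this over the log for several cycles and comparing the address lists would show how the set of slow hosts shifts between cycles.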
I've tried the following to see if anything helps:
* Reducing the number of hosts. If I have only two or three in the
bb-hosts file, the problem doesn't manifest.
* Recompiling. Doesn't help.
* Changing the test URL. Doesn't help.
* Adding a --concurrency= option to the launch command. With a
concurrency of 1, the problem does not manifest.
Setting the concurrency to 1 isn't an option as a fix, but it makes
me think something is getting badly mixed up in the select()
processing in bbnet.
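Since the delay only appears with a concurrency above 1, one way to separate the servers and network from bbtest-net is to fire many probes in parallel from the same box with an independent event loop. Below is a minimal sketch using Python's `selectors` module with non-blocking sockets; it is not Hobbit's code, `probe_all` and its parameters are hypothetical, and it opens every connection at once rather than honoring a concurrency limit. If it also shows ~3-second totals for some hosts, the cause is probably outside Hobbit; if it does not, that would suggest looking more closely at bbtest-net.

```python
import errno
import selectors
import socket
import time

def probe_all(targets, path="/", timeout=10.0):
    """Probe every (host, port) in targets simultaneously over plain HTTP/1.0.

    Returns {(host, port): (connect_seconds, total_seconds)}; failures and
    timeouts map to None. Times are measured from the start of the batch.
    """
    sel = selectors.DefaultSelector()
    results = {}
    start = time.monotonic()
    for host, port in targets:
        results[(host, port)] = None
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)
        rc = sock.connect_ex((host, port))  # non-blocking connect, usually EINPROGRESS
        if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            sock.close()  # failed immediately (e.g. connection refused)
            continue
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
        state = {"target": (host, port), "connected": None, "request": request}
        sel.register(sock, selectors.EVENT_WRITE, state)

    while sel.get_map():
        remaining = timeout - (time.monotonic() - start)
        if remaining <= 0:
            break  # anything still registered has timed out
        for key, events in sel.select(remaining):
            sock, state = key.fileobj, key.data
            now = time.monotonic()
            try:
                if events & selectors.EVENT_WRITE:
                    # Writable means the connect finished; SO_ERROR says whether it worked.
                    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
                    if err:
                        raise OSError(err, "connect failed")
                    state["connected"] = now - start
                    # The request is tiny, so assume it fits in the send buffer.
                    sock.sendall(state["request"])
                    sel.modify(sock, selectors.EVENT_READ, state)
                elif not sock.recv(4096):
                    # Empty read: the server closed the connection, response complete.
                    results[state["target"]] = (state["connected"], now - start)
                    sel.unregister(sock)
                    sock.close()
            except OSError:
                sel.unregister(sock)
                sock.close()

    # Clean up sockets that timed out.
    for key in list(sel.get_map().values()):
        sel.unregister(key.fileobj)
        key.fileobj.close()
    sel.close()
    return results
```

Passing the same 64 addresses from bb-hosts (as `(ip, 80)` pairs) would approximate a full cycle in a single burst.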
Does anyone have any ideas on how to diagnose where Hobbit is getting
a 3-second latency from, when none of my test tools running on the
same server can reproduce that timing?
Thanks for any ideas; this is really baffling me.
-Alan
To unsubscribe from the hobbit list, send an e-mail to
hobbit-unsubscribe (at) hswn.dk