[Xymon] bbtest-net false conn alerts
Hamilton, Ross
Ross.Hamilton at brevanhoward.com
Fri Feb 28 12:46:38 CET 2014
Hello
I'm investigating why bbtest-net runs take what seems to me a long time on a Xymon server. The bbtest-net status shown on the web page looks like this:
Statistics:
Hosts total : 2832
Hosts with no tests : 966
Total test count : 2730
Status messages : 2954
Alert status msgs : 0
Transmissions : 30
DNS statistics:
# hostnames resolved : 1108
# succesful : 2004
# failed : 38
# calls to dnsresolve : 1116
TCP test statistics:
# TCP tests total : 773
# HTTP tests : 327
# Simple TCP tests : 446
# Connection attempts : 772
# bytes written : 48486
# bytes read : 3283786
Error output: [Edited for brevity]
27 lines of this nature ...
Host foo appears twice in bb-hosts! This may cause strange results
dnsresolve - internal error, name 'ldnpgec01v' not in cache
17 lines of this nature ...
bbtest-net: Cannot resolve IP for host bar
TIME SPENT
Event                                        Starttime    Duration
bbtest-net startup                      9601848.473961           -
Service definitions loaded              9601848.481606    0.007644
Tests loaded, hostname lookups done     9601860.506724   12.025118
Test engine setup completed             9601860.519996    0.013271
TCP tests completed                     9601875.790148   15.270152
PING test completed (1896 hosts)        9601945.698899   69.908750
PING test results sent                  9601950.407327    4.708427
Test result collection completed        9601950.407684    0.000357
LDAP test engine setup completed        9601950.407685    0.000001
LDAP tests executed                     9601950.407686    0.000001
LDAP tests result collection completed  9601950.407687    0.000000
DNS tests executed                      9601951.952093    1.544405
NTP tests executed                      9601963.812322   11.860229
Test results transmitted                9601963.881021    0.068698
bbtest-net completed                    9601964.053873    0.172851
TIME TOTAL                                              115.579911
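To see at a glance where the run time goes, the report above can be ranked by duration. The following is only a minimal sketch: the function name `phase_durations` and the regular expression are my own assumptions about the row layout (event text, start time, duration), not anything shipped with Xymon. On the figures above, the PING phase accounts for roughly 60% of the total.

```python
import re

# Assumed row layout: "<event text> <start time> <duration>", for example
# "PING test completed (1896 hosts) 9601945.698899 69.908750".
ROW = re.compile(r"^(?P<event>.+?)\s+(?P<start>\d+\.\d+)\s+(?P<duration>\d+\.\d+)$")


def phase_durations(report: str) -> list[tuple[str, float]]:
    """Return (event, duration) pairs sorted by duration, longest first.

    Rows without two numeric columns (the startup row with "-" and the
    TIME TOTAL row) are skipped.
    """
    rows = []
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append((match.group("event"), float(match.group("duration"))))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Three rows copied from the report above.
SAMPLE = """\
TCP tests completed 9601875.790148 15.270152
PING test completed (1896 hosts) 9601945.698899 69.908750
NTP tests executed 9601963.812322 11.860229
"""

if __name__ == "__main__":
    total = 115.579911  # TIME TOTAL from the report above
    for event, duration in phase_durations(SAMPLE):
        print(f"{duration:10.3f}s  {duration / total:6.1%}  {event}")
```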
The lines in error output are the result of configuration put in by another team. I can correct it by changing some scripts, but I don't think that is causing the problem I'm looking into.
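For the "appears twice in bb-hosts" messages, a quick way to list the offending entries might look like the sketch below. It assumes host lines have the shape "IPv4 address, hostname, optional # tags", and it does not follow include directives; `duplicate_hosts` and the sample data are hypothetical names and values used only for illustration.

```python
import re
from collections import Counter

# Assumed host-line shape: "<IPv4 address> <hostname> [# tags ...]".
# Comment lines and layout directives (page, group, ...) do not match.
HOST_LINE = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+(\S+)")


def duplicate_hosts(text: str) -> dict[str, int]:
    """Map each hostname defined more than once to its number of definitions."""
    counts = Counter()
    for line in text.splitlines():
        match = HOST_LINE.match(line)
        if match:
            counts[match.group(2)] += 1
    return {host: n for host, n in counts.items() if n > 1}


# Made-up sample data, not taken from the real configuration.
SAMPLE = """\
# comment
page servers Servers
10.0.0.1 foo # conn
10.0.0.2 bar # conn http://bar/
10.0.0.3 foo # ssh
"""

if __name__ == "__main__":
    for host, n in duplicate_hosts(SAMPLE).items():
        print(f"{host} is defined {n} times")
```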
The bit I'm interested in is "PING test completed (1896 hosts)" and why it takes 70 seconds.
bbtest-net is set to run in hobbitlaunch.cfg like so.
CMD bbtest-net --report --ping --checkresponse --concurrency=512 --no-ares
If I run it manually from the command line like so, the ping tests complete in less than a millisecond.
/bbtest-net --no-ares --report --ping --checkresponse --concurrency=512 --no-update --debug
PING test completed (1896 hosts) 9600209.001178 0.000537
I'm investigating because we often get a flurry of false connectivity alerts from Xymon whenever the time taken to run bbtest-net spikes, for reasons I haven't identified.
There is another (contingency) host I have that I am doing some tests on. When I run bbtest-net manually on this host, the ping tests take about 40s.
On the contingency host, the --debug output offers no explanation for the time taken: the timestamps go straight from "TCP tests completed" to a line about 40s later.
2014-02-28 11:05:51 TCP tests completed normally
2014-02-28 11:06:33 More than one ping result for 192.168.180.94
I have many of these "More than one ping result for <IP>" lines. Could they be contributing? I was assuming not.
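One way to quantify both points from a saved debug log is to find the largest pause between consecutive timestamped lines and to count the duplicate-result messages per IP. This is a rough sketch under the assumption that every debug line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, as in the two lines quoted above; `largest_gap` and `duplicate_ping_ips` are names made up for the example. For the two quoted lines the gap works out to 42 seconds.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed debug-line shape: "YYYY-MM-DD HH:MM:SS <message>".
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
DUP = re.compile(r"More than one ping result for (\S+)")


def largest_gap(log: str):
    """Return (seconds, message_before, message_after) for the longest pause,
    or None if fewer than two timestamped lines are present."""
    entries = []
    for line in log.splitlines():
        match = STAMP.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((when, match.group(2)))
    best = None
    for (t0, msg0), (t1, msg1) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, msg0, msg1)
    return best


def duplicate_ping_ips(log: str) -> Counter:
    """Count 'More than one ping result' messages per IP address."""
    return Counter(DUP.findall(log))


# The two debug lines quoted above.
SAMPLE = """\
2014-02-28 11:05:51 TCP tests completed normally
2014-02-28 11:06:33 More than one ping result for 192.168.180.94
"""

if __name__ == "__main__":
    print(largest_gap(SAMPLE))
    print(duplicate_ping_ips(SAMPLE).most_common(10))
```

If the per-IP counts are high, comparing those addresses against the host definitions would show whether several hosts share an IP.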
Does anyone have any pointers for things I could look at or test?
The bb-network.log doesn't have anything other than the error output previously mentioned.
On an unrelated note, a big thanks to everyone for participating on these lists (particularly Henrik for supporting and providing a great product). It has been a great help to me over the years.
Regards,
Ross