[Xymon] bbtest-net false conn alerts

Hamilton, Ross Ross.Hamilton at brevanhoward.com
Fri Feb 28 12:46:38 CET 2014


Hello

I'm investigating why bbtest-net runs take what I think is a long time on a Xymon server. The bbtest-net status displayed on the web page looks like this:
Statistics:
Hosts total           :     2832
Hosts with no tests   :      966
Total test count      :     2730
Status messages       :     2954
Alert status msgs     :        0
Transmissions         :       30

DNS statistics:
# hostnames resolved  :     1108
# succesful           :     2004
# failed              :       38
# calls to dnsresolve :     1116

TCP test statistics:
# TCP tests total     :      773
# HTTP tests          :      327
# Simple TCP tests    :      446
# Connection attempts :      772
# bytes written       :    48486
# bytes read          :  3283786


Error output: [Edited for brevity]
27 lines of this nature ...
Host foo appears twice in bb-hosts! This may cause strange results

dnsresolve - internal error, name 'ldnpgec01v' not in cache

17 lines of this nature ...
bbtest-net: Cannot resolve IP for host bar


TIME SPENT
Event                                            Starttime          Duration
bbtest-net startup                          9601848.473961                 -
Service definitions loaded                  9601848.481606          0.007644
Tests loaded, hostname lookups done         9601860.506724         12.025118
Test engine setup completed                 9601860.519996          0.013271
TCP tests completed                         9601875.790148         15.270152
PING test completed (1896 hosts)            9601945.698899         69.908750
PING test results sent                      9601950.407327          4.708427
Test result collection completed            9601950.407684          0.000357
LDAP test engine setup completed            9601950.407685          0.000001
LDAP tests executed                         9601950.407686          0.000001
LDAP tests result collection completed      9601950.407687          0.000000
DNS tests executed                          9601951.952093          1.544405
NTP tests executed                          9601963.812322         11.860229
Test results transmitted                    9601963.881021          0.068698
bbtest-net completed                        9601964.053873          0.172851
TIME TOTAL                                                        115.579911

The lines in the error output are the result of configuration put in by another team. I can correct them by changing some scripts, but I don't think they are causing the problem I'm looking into.

The bit I'm interested in is "PING test completed (1896 hosts)", and why it takes 70 seconds.
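
To put that figure in proportion, here is a minimal Python sketch (not part of Xymon) that reads a few rows of the TIME SPENT table above and prints each phase's share of the total. The regular expression and the helper name parse_durations are my own assumptions about the column layout; the numbers are copied from the report. By this arithmetic the ping phase accounts for about 60% of the run.

```python
import re

# Excerpt of the "TIME SPENT" table from the report above.
REPORT = """\
Tests loaded, hostname lookups done         9601860.506724         12.025118
TCP tests completed                         9601875.790148         15.270152
PING test completed (1896 hosts)            9601945.698899         69.908750
NTP tests executed                          9601963.812322         11.860229
TIME TOTAL                                                        115.579911
"""


def parse_durations(text):
    """Return {event: seconds}, taking the last numeric column of each line.

    Hypothetical helper: assumes the event name is separated from the
    numeric columns by at least two spaces, as in the report above.
    """
    durations = {}
    for line in text.splitlines():
        m = re.match(r"^(.*?)\s{2,}(?:\d+\.\d+\s+)?(\d+\.\d+)\s*$", line)
        if m:
            durations[m.group(1).strip()] = float(m.group(2))
    return durations


durations = parse_durations(REPORT)
total = durations.pop("TIME TOTAL")

# Print the phases, slowest first, with their share of the total run time.
for event, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
    print(f"{event:45s} {secs:8.2f}s {100 * secs / total:5.1f}%")
```

Pasting the full table into REPORT works the same way; the startup row, whose duration is "-", is simply skipped.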

bbtest-net is set to run in hobbitlaunch.cfg like so.
CMD bbtest-net --report --ping --checkresponse --concurrency=512 --no-ares

If I run it manually from the command line like so, the ping tests complete in less than a millisecond.
/bbtest-net --no-ares --report --ping --checkresponse --concurrency=512 --no-update -debug
PING test completed (1896 hosts)            9600209.001178          0.000537

The reason I'm investigating is that we often get a flurry of false connectivity alerts from Xymon when the time taken to run bbtest-net spikes for some reason.

I also have another (contingency) host that I am running some tests on. When I run bbtest-net manually on this host, the ping tests take about 40s.
On the contingency host, the --debug output doesn't explain the time taken; the timestamps jump straight from "TCP tests completed" to a line about 40s later.
2014-02-28 11:05:51 TCP tests completed normally
2014-02-28 11:06:33 More than one ping result for 192.168.180.94
I have many of these "More than one ping result for <IP>" messages. Could they be contributing? I was assuming not.
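
One way to check how widespread the duplicate entries are is a small Python sketch like the one below (not part of Xymon), which lists IP addresses and hostnames that occur more than once in bb-hosts style lines. It assumes each host entry starts with a dotted-quad IP followed by the hostname, and it ignores 0.0.0.0 entries because those are resolved through DNS. The sample lines and the helper name find_duplicates are hypothetical. Whether duplicate entries are what triggers the "More than one ping result" message is also an assumption, not something confirmed here.

```python
from collections import Counter


def find_duplicates(lines):
    """Return (duplicate_ips, duplicate_hostnames) for bb-hosts style lines."""
    ips, names = Counter(), Counter()
    for line in lines:
        # Drop the tag/comment part after '#', then split into fields.
        fields = line.split("#", 1)[0].split()
        # Assumption: a host entry is "<dotted-quad IP> <hostname> ...".
        # Directives such as "page" or "group" do not match and are skipped.
        if len(fields) < 2 or fields[0].count(".") != 3:
            continue
        ip, name = fields[0], fields[1]
        names[name] += 1
        # 0.0.0.0 means "look the address up in DNS", so repeats are expected.
        if ip != "0.0.0.0":
            ips[ip] += 1
    dup_ips = sorted(ip for ip, n in ips.items() if n > 1)
    dup_names = sorted(name for name, n in names.items() if n > 1)
    return dup_ips, dup_names


# Hypothetical sample data; real input would be the lines of bb-hosts.
SAMPLE = """\
page servers Servers
192.168.180.94  foo        # conn
192.168.180.94  foo-alias  # conn
192.168.180.95  foo        # ssh
0.0.0.0         bar        # conn
""".splitlines()

dup_ips, dup_names = find_duplicates(SAMPLE)
print("IPs listed more than once:", dup_ips)
print("Hostnames listed more than once:", dup_names)
```

To run it against a real configuration, pass the lines of the bb-hosts file instead of SAMPLE. Files pulled in through include directives are not followed by this sketch.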

Does anyone have any pointers for things I could look at or test?
The bb-network.log doesn't have anything other than the error output previously mentioned.

On an unrelated note, a big thanks to everyone for participating on these lists (particularly Henrik for supporting and providing a great product).  It has been a great help to me over the years.

Regards,
Ross