[Xymon] DNS failures causing "runtime longer than time limit"

Jeremy Laidman jlaidman at rebel-it.com.au
Wed Jun 3 05:08:29 CEST 2015


Hi

I'm running Xymon v4.3.10 on Linux, and I'm quite sure it's compiled with
c-ares support.

I have 12 new DNS servers that were added to Xymon about one month ago.
All of my server entries in hosts.cfg have "testip".  The tasks.cfg entry runs
xymonnet with "--dns-timeout=3".  The host entries look like this:

10.10.10.1 dnshost1.example.com    # testip dns=NS:example.com,SOA:example.com
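This minimal Python sketch shows the two lookups per host by splitting the `dns=` spec from the entry above into individual queries. It assumes each comma-separated item has the form `TYPE:name`, as it does in this entry. The helper `parse_dns_spec` and the `QTYPE_CODES` table are illustrative and are not part of Xymon. The numeric codes are the standard DNS record types (NS = 2, SOA = 6), and they match the `type=` values in the debug output further down.

```python
# Illustrative only: not Xymon code. Assumes every item is "TYPE:name".
# Numeric codes are the standard DNS record types from RFC 1035.
QTYPE_CODES = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "PTR": 12, "MX": 15, "TXT": 16}


def parse_dns_spec(spec: str) -> list[tuple[str, str, int]]:
    """Turn 'NS:example.com,SOA:example.com' into [(rtype, name, code), ...]."""
    lookups = []
    for item in spec.split(","):
        rtype, _, name = item.partition(":")
        rtype = rtype.upper()
        lookups.append((rtype, name, QTYPE_CODES[rtype]))
    return lookups


if __name__ == "__main__":
    line = "10.10.10.1 dnshost1.example.com    # testip dns=NS:example.com,SOA:example.com"
    # Pick out the "dns=..." token and strip the prefix.
    spec = next(tok for tok in line.split() if tok.startswith("dns="))[len("dns="):]
    for rtype, name, code in parse_dns_spec(spec):
        print(f"{rtype:<4} {name} type={code}")
```

For this entry the sketch prints one line per lookup: NS with type=2 and SOA with type=6. That makes two DNS queries per host.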

About a week ago, connectivity to all of these servers failed, and at the
same time, the xymonnet run time jumped from less than 15 seconds to about
330 seconds, so about 315 seconds extra.  The xymonnet page says 295
seconds is taken up by DNS tests.

If the increase of about 315 seconds is entirely due to the 12 servers
failing, then each failed server is adding about 26 seconds to the total
run time.
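This short Python calculation checks the arithmetic using only the figures quoted above. The 15-second "before" figure is an upper bound, so the extra time per server is approximate. The DNS-only share (295 seconds) gives a slightly lower per-server figure.

```python
# Figures quoted in the message, in seconds.
BEFORE_S = 15          # "less than 15 seconds" before the outage (upper bound)
AFTER_S = 330          # "about 330 seconds" after the outage
DNS_S = 295            # share of the run attributed to DNS tests
FAILED_SERVERS = 12

extra = AFTER_S - BEFORE_S                # additional runtime: 315 s
per_server = extra / FAILED_SERVERS       # 26.25 s per failed server
dns_per_server = DNS_S / FAILED_SERVERS   # about 24.6 s of DNS time per server

print(f"extra runtime:          {extra} s")
print(f"extra per failed host:  {per_server:.2f} s")
print(f"DNS time per host:      {dns_per_server:.1f} s")
```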

I don't think this should be happening.  With two DNS checks per server
and a 3-second timeout, each server's DNS checks should take at most 6
seconds to time out, not 26.  If I run xymonnet with "--timing --no-update"
and specify only one hostname, I can view the results and the timing.  This
shows that the ping check gets reported after about 3 seconds, and then the
DNS tests are executed and take 26 seconds in total.
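Here are the expected bounds for a single host with "--dns-timeout=3", written out in Python. The serial bound assumes the two lookups run one after the other. The concurrent bound assumes, as an illustration, that c-ares keeps both queries in flight at the same time, so the host waits roughly one timeout. Either way, the measured 26 seconds is well above what the timeout setting should allow.

```python
TIMEOUT_S = 3          # --dns-timeout=3 from tasks.cfg
CHECKS_PER_HOST = 2    # the NS and SOA lookups
OBSERVED_S = 26        # measured with --timing --no-update on one host

serial_bound = TIMEOUT_S * CHECKS_PER_HOST   # assumption: lookups one after another
concurrent_bound = TIMEOUT_S                  # assumption: both lookups in flight at once

print(f"serial bound:     {serial_bound} s")
print(f"concurrent bound: {concurrent_bound} s")
print(f"observed:         {OBSERVED_S} s ({OBSERVED_S / serial_bound:.1f}x the serial bound)")
```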

My naive assumption was that when a server failed a ping (and didn't have
"noclear" defined in hosts.cfg), the network checks would be skipped.  On
re-reading the man page for hosts.cfg, it dawned on me that a failed ping
simply suppresses failed test /results/, but doesn't stop the tests from
being run.
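The behaviour described above can be captured in a simplified, hypothetical Python model. It is not Xymon's actual code. The `Host` class, `report_color` and `run_network_tests` are illustrative names. The model assumes that a failed test on a host that fails ping is reported as "clear" unless "noclear" is set. The key point is that every test is still executed, so a lookup against a dead server still costs its full timeout.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Host:
    name: str
    ping_ok: bool
    noclear: bool = False


def report_color(host: Host, test_ok: bool) -> str:
    """Hypothetical model: only the reported colour depends on ping, not whether the test runs."""
    if test_ok:
        return "green"
    if not host.ping_ok and not host.noclear:
        return "clear"   # failure suppressed because the host looks down
    return "red"


def run_network_tests(host: Host, tests: dict[str, Callable[[], bool]]) -> dict[str, str]:
    # Every test is called (and pays its full timeout cost) regardless of ping.
    return {name: report_color(host, check()) for name, check in tests.items()}


if __name__ == "__main__":
    down = Host("dnshost1.example.com", ping_ok=False)
    calls = []

    def failing_dns() -> bool:
        calls.append("dns")   # stands in for a lookup that times out
        return False

    print(run_network_tests(down, {"dns": failing_dns}))
    print(f"tests executed: {len(calls)}")
```

In this model, skipping the test itself would require checking `ping_ok` before calling `check()`, and the model deliberately does not do that.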

So the real problem is that the "--dns-timeout=3" is not being taken into
consideration by xymonnet.  If I run xymonnet with "--debug" it tells me:

1900 2015-06-03 12:02:20 ares_search: tlookup='example.com', class=1, type=2
1900 2015-06-03 12:02:20 ares_search: tlookup='example.com', class=1, type=6
1900 2015-06-03 12:02:20 Processing 0 DNS lookups with ARES
1900 2015-06-03 12:02:46 Finished ARES queue after loop 423
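This Python sketch measures the elapsed time directly from the debug lines above. It assumes the lines have the form "<pid> <date> <time> <message>", as shown. Two `ares_search` calls are queued, and the time between "Processing ... with ARES" and "Finished ARES queue" accounts for the whole 26 seconds.

```python
from datetime import datetime

DEBUG_LOG = """\
1900 2015-06-03 12:02:20 ares_search: tlookup='example.com', class=1, type=2
1900 2015-06-03 12:02:20 ares_search: tlookup='example.com', class=1, type=6
1900 2015-06-03 12:02:20 Processing 0 DNS lookups with ARES
1900 2015-06-03 12:02:46 Finished ARES queue after loop 423
"""


def timestamp(line: str) -> datetime:
    # Assumed layout: "<pid> <date> <time> <message>"
    _pid, date, time, _msg = line.split(" ", 3)
    return datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")


def ares_searches(log: str) -> list[str]:
    """Lines where a lookup was queued with ares_search."""
    return [ln for ln in log.splitlines() if "ares_search:" in ln]


def ares_queue_seconds(log: str) -> float:
    """Seconds between starting and finishing the ARES queue."""
    lines = log.splitlines()
    start = next(ln for ln in lines if "Processing" in ln and "with ARES" in ln)
    end = next(ln for ln in lines if "Finished ARES queue" in ln)
    return (timestamp(end) - timestamp(start)).total_seconds()


if __name__ == "__main__":
    print(f"queued ares_search calls: {len(ares_searches(DEBUG_LOG))}")
    print(f"time spent in ARES queue: {ares_queue_seconds(DEBUG_LOG):.0f} s")
```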

This is peculiar.  Why would it say "Processing 0 DNS lookups" when there
are two lookups to test?  Could this be because xymonnet hasn't actually
been built with ARES support and I didn't know it?  Is there a good way to
tell?  If I add "--no-ares" I get the same results, perhaps suggesting a
lack of ARES support.  On the other hand, if I add "timeout:3" and
"attempts:1" to resolv.conf, I also get the same results.  If I run "nm
/path/to/xymonnet | grep gethostby", it returns "ares_gethostbyname".
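This Python sketch runs `nm` over the binary and lists every `ares_*` symbol together with its nm type letter. That gives a slightly more thorough version of the `nm | grep` check. The binary path is a placeholder, and the helper names are hypothetical. A "T" symbol means the c-ares code is defined inside the binary, which suggests static linking. A "U" symbol means it is resolved from a shared library. In both cases this only shows that c-ares is linked in. It does not prove that xymonnet actually takes the ARES code path at run time.

```python
import subprocess
import sys


def ares_symbols(nm_output: str) -> dict[str, str]:
    """Map each 'ares_*' symbol in nm output to its type letter (e.g. T = defined, U = undefined)."""
    found = {}
    for line in nm_output.splitlines():
        parts = line.split()   # "addr T name" or just "U name"
        if len(parts) >= 2 and parts[-1].startswith("ares_"):
            found[parts[-1]] = parts[-2]
    return found


def nm_text(binary: str) -> str:
    """Run nm on the binary; return '' if nm is missing or prints nothing."""
    try:
        result = subprocess.run(["nm", binary], capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    # Placeholder path; pass the real xymonnet binary as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/path/to/xymonnet"
    found = ares_symbols(nm_text(path))
    if not found:
        print("no ares_* symbols found (a stripped binary may need 'nm -D' or ldd)")
    for name, kind in sorted(found.items()):
        where = "resolved from a shared library" if kind == "U" else "defined in the binary"
        print(f"{name}: {kind} ({where})")
```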

Just for fun, I compiled Xymon v4.3.21 and ran the xymonnet binary from
there, with no change in behaviour.  I also tried removing the
"--dns-timeout" option so that it defaults to 30 seconds, but still no
change: 26 seconds for two DNS tests.

So, I'm not really sure what the problem is, but xymonnet certainly isn't
behaving as I would expect.

Cheers
Jeremy