<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi,</p>
<p>I remember looking into this a long time ago, and the
--dns-timeout setting does not work as expected, because
C-ARES does not handle timeouts the way you would expect.</p>
<p>C-ARES has its own timeout settings for queries, but it performs an
exponential back-off between retries of a query, so it is impossible
to hit the exact timeout you specify with --dns-timeout.</p>
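<p>To make the back-off effect concrete, here is a minimal C sketch that rests on an idealised assumption: each retry waits twice as long as the one before. The doubling rule and the helper name backoff_total_ms are illustrative only. They are not taken from c-ares or xymonnet. The real schedule also depends on the c-ares version and on how many nameservers are configured, so the sketch shows the shape of the problem and is not meant to reproduce the exact figures quoted in this thread.</p>

```c
#include <assert.h>

/*
 * Idealised model (assumption, not c-ares source): each attempt waits
 * twice as long as the previous one, starting at initial_ms.
 * Returns the worst-case total wait in milliseconds for one query.
 */
static long backoff_total_ms(long initial_ms, int tries)
{
    long total = 0;
    long this_try = initial_ms;

    assert(initial_ms > 0 && tries > 0);
    for (int i = 0; i < tries; i++) {
        total += this_try;   /* time spent waiting on this attempt */
        this_try *= 2;       /* back off exponentially for the next attempt */
    }
    return total;
}
```

<p>For example, backoff_total_ms(1000, 3) is 1000 + 2000 + 4000 = 7000 ms. Under this model, a query configured with a 1-second per-try timeout can take 7 seconds to fail.</p>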
<p>In fact, current 4.3.x versions have a hard-coded setting for the
C-ARES timeouts: it starts with a 2-second timeout and makes 4
attempts, which ends up as a timeout of approximately 23 seconds for
all DNS queries. This is in xymonnet/dns.c (look for "ARES
timeout"). If you need really short timeouts, that is
probably what you should change.<br>
</p>
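<p>If you do change the hard-coded values, one way to choose them is to work backwards from the total time you can afford per query. The sketch below uses the same idealised doubling assumption as before. It is not c-ares source, and initial_timeout_for_budget is a hypothetical name. In c-ares itself, these knobs are the timeout and tries fields of struct ares_options, which are passed to ares_init_options() with the ARES_OPT_TIMEOUTMS and ARES_OPT_TRIES flags. Check xymonnet/dns.c to see exactly how your version sets them.</p>

```c
#include <assert.h>

/*
 * Hypothetical helper, assuming the idealised doubling model
 * (initial + 2*initial + 4*initial + ...): pick the initial per-try
 * timeout so that `tries` attempts fit inside budget_ms.
 * The modelled total is initial * (2^tries - 1); integer division
 * rounds the initial value down, so the total never exceeds the budget.
 */
static long initial_timeout_for_budget(long budget_ms, int tries)
{
    long divisor;

    assert(budget_ms > 0 && tries > 0 && tries < 31);
    divisor = (1L << tries) - 1;   /* 2^tries - 1 */
    return budget_ms / divisor;
}
```

<p>For example, a 6-second budget with 2 tries gives a 2000 ms first try (2000 + 4000 = 6000 ms).</p>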
<p><br>
</p>
<p>Regards,<br>
Henrik<br>
</p>
<br>
<div class="moz-cite-prefix">On 11-09-2017 05:52, Jeremy Laidman
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CACO=ejzNx8h95yL6PozO4vDW508t=QOzRFF87J5AkcoW6tdC3w@mail.gmail.com">
<div dir="ltr">Hi
<div><br>
</div>
<div>I'm reviving an old thread, because this is biting me
again, so I wanted to know if anyone had any fresh ideas on
this problem.</div>
<div><br>
</div>
<div>Many of the servers I monitor are DNS servers, so the
C-ARES library has a lot of queries to perform every 5
minutes. In some cases, I want to ensure that a DNS service is
down (and alert when not) so most of the time I can expect a
timeout, leading to a long poll cycle. I'd really like to be
able to drop the timeout to significantly less than the 23
seconds it's taking now per server.</div>
<div><br>
</div>
<div>Cheers</div>
<div>Jeremy</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 3 June 2015 at 13:49, Jeremy Laidman
<span dir="ltr">&lt;<a href="mailto:jlaidman@rebel-it.com.au"
target="_blank" moz-do-not-send="true">jlaidman@rebel-it.com.au</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>OK, I'm a bit puzzled by this, and definitely pushing
the envelope of my debugging and C coding skills. The
relevant code from xymonnet/dns.c is:<br>
<br>
168     tv.tv_sec = dnstimeout; tv.tv_usec = 0;<br>
169     tvp = ares_timeout(channel, &amp;tv, &amp;tv);<br>
<br>
</div>
I ran this through gdb, with "--dns-timeout=3" specified,
setting a breakpoint at line 168. I confirmed that
dnstimeout is set to 3. When I step one line, I should
see tv.tv_sec set to 3 also, but it's set to 0.<br>
<div><br>
</div>
<div>If I don't specify --dns-timeout at all, printing
dnstimeout shows "30". Again, after stepping to the
next line, tv.tv_sec is still zero.<br>
<br>
Breakpoint 1, dns_ares_queue_run (channel=0x58b1c0) at
dns.c:168<br>
168     tv.tv_sec = dnstimeout; tv.tv_usec = 0;<br>
(gdb) p dnstimeout<br>
$14 = 30<br>
(gdb) n<br>
169     tvp = ares_timeout(channel, &amp;tv, &amp;tv);<br>
(gdb) p tv<br>
$15 = {tv_sec = 0, tv_usec = 0}<br>
(gdb)<br>
<br>
</div>
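<p>The c-ares ares_timeout(3) documentation describes maxtv as only an upper bound on how long the caller should wait before calling ares_process(). If a pending query will time out sooner, that shorter value is stored in tv and tv is returned. The man page also allows maxtv and tv to point to the same struct, as they do on line 169. Assume the usual c-ares loop: wait for at most the returned time, then call ares_process(). Under that assumption, a dnstimeout passed as maxtv caps each individual wait, not the lifetime of a query. The sketch below models that documented contract without c-ares. model_ares_timeout and its next_expiry parameter are hypothetical, and this is not the library's implementation.</p>

```c
#include <stddef.h>
#include <sys/time.h>

/* Returns nonzero if a is strictly earlier than b. */
static int tv_less(const struct timeval *a, const struct timeval *b)
{
    return a->tv_sec < b->tv_sec ||
           (a->tv_sec == b->tv_sec && a->tv_usec < b->tv_usec);
}

/*
 * Hypothetical model of the ares_timeout() contract (not c-ares source).
 * next_expiry: time left until the earliest pending query times out,
 *              or NULL if no query has a timeout pending.
 * maxtv:       caller's cap on the wait (may be NULL for "no cap").
 * tv:          writable buffer; may alias maxtv.
 */
static struct timeval *model_ares_timeout(const struct timeval *next_expiry,
                                          struct timeval *maxtv,
                                          struct timeval *tv)
{
    if (next_expiry == NULL)
        return maxtv;            /* nothing pending: use the caller's cap */
    if (maxtv != NULL && !tv_less(next_expiry, maxtv))
        return maxtv;            /* the cap is no later than any query timeout */
    *tv = *next_expiry;          /* a query times out sooner: report it in tv */
    return tv;
}
```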
<div>So what gives here?<span class="HOEnZb"><font
color="#888888"><br>
<br>
</font></span></div>
<span class="HOEnZb"><font color="#888888">
<div>J<br>
<br>
</div>
</font></span></div>
<div class="HOEnZb">
<div class="h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">On 3 June 2015 at 13:08,
Jeremy Laidman <span dir="ltr">&lt;<a
href="mailto:jlaidman@rebel-it.com.au"
target="_blank" moz-do-not-send="true">jlaidman@rebel-it.com.au</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>Hi<br>
<br>
</div>
<div>I'm running Xymon v4.3.10
on Linux, and I'm quite sure
it's compiled with c-ares
support.<br>
<br>
</div>
I have 12 new DNS servers that
were added to Xymon about one
month ago. All of my server
entries in hosts.cfg have
"testip". The tasks.cfg runs
xymonnet with "--dns-timeout=3".
The hosts entries look like so:<br>
<br>
10.10.10.1 dnshost1.example.com # testip dns=NS:example.com,SOA:example.com<br>
<br>
About a week ago, connectivity
to all of these servers failed,
and at the same time, the
xymonnet run time jumped from
less than 15 seconds to about
330 seconds, an increase of about 315
seconds. The xymonnet
page says 295 seconds is taken
up by DNS tests.<br>
<br>
</div>
If the increase in run time is
about 315 seconds and is entirely due to
the 12 servers failing, then each
failed server is adding about 26
seconds to the total run time.<br>
<br>
</div>
I don't think this should be
happening. With two DNS checks
per server and a 3-second timeout,
the DNS checks should take 6 seconds
per server to time out, not 26. If I run xymonnet
with "--timing --no-update" and
specify only one hostname, I can
view the results and the timing.
This shows that the ping check gets
reported after about 3 seconds, and
then the DNS tests are executed and
take 26 seconds total.<br>
<br>
</div>
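<p>The arithmetic in the two paragraphs above can be checked with a small C sketch. The helper names are hypothetical. The sketch assumes, as the text does, that the whole increase comes from the 12 failed servers. It also assumes that the two DNS checks per server run one after the other.</p>

```c
#include <assert.h>

/* Extra run time per failed host, in seconds (hypothetical helper). */
static double per_host_overhead(double before_s, double after_s, int hosts)
{
    assert(hosts > 0);
    return (after_s - before_s) / hosts;
}

/* Expected DNS time per host if each check runs in turn and
 * respects the configured timeout (hypothetical helper). */
static int expected_dns_seconds(int checks, int timeout_s)
{
    return checks * timeout_s;
}
```

<p>With the figures from this message, per_host_overhead(15, 330, 12) returns 26.25 seconds. By contrast, expected_dns_seconds(2, 3) returns 6.</p>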
My naive assumption was that when a
server failed a ping (and didn't have
"noclear" defined in hosts.cfg) then
the network checks would be skipped.
On re-reading the man page for
hosts.cfg, it dawned on me that a
failed ping simply suppresses failed
test /results/, but doesn't stop the
tests from being run.<br>
<br>
</div>
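<p>The behaviour described in the previous paragraph can be modelled as below. This is a sketch, not Xymon source. The enum and function names are hypothetical, and the sketch assumes a suppressed failure is reported as "clear". It illustrates that the test must already have run, and any DNS timeout must already have been waited out, before the ping status affects the report.</p>

```c
#include <stdbool.h>

/* Hypothetical status values, mirroring Xymon's green/red/clear colours. */
enum status { STATUS_GREEN, STATUS_RED, STATUS_CLEAR };

/*
 * Sketch (assumption, not Xymon source): the network test always runs,
 * so test_ok is already known here; a failed ping only changes how a
 * failed result is reported, unless "noclear" is set for the host.
 */
static enum status report_status(bool ping_ok, bool test_ok, bool noclear)
{
    if (test_ok)
        return STATUS_GREEN;
    if (!ping_ok && !noclear)
        return STATUS_CLEAR;   /* failure suppressed because ping failed */
    return STATUS_RED;
}
```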
So the real problem is that the
"--dns-timeout=3" is not being taken
into consideration by xymonnet. If I
run xymonnet with "--debug" it tells me:<br>
<br>
1900 2015-06-03 12:02:20 ares_search:
tlookup='example.com',
class=1, type=2<br>
1900 2015-06-03 12:02:20 ares_search:
tlookup='example.com',
class=1, type=6<br>
1900 2015-06-03 12:02:20 Processing 0
DNS lookups with ARES<br>
1900 2015-06-03 12:02:46 Finished ARES
queue after loop 423<br>
<br>
</div>
This is peculiar. Why would it say
"Processing 0 DNS lookups" when there are
two lookups to test? Could this be
because xymonnet hasn't actually been
built with ARES support and I didn't know
it? Is there a good way to tell? If I
add "--no-ares" I get the same results,
perhaps suggesting a lack of ARES
support. On the other hand, if I add
"timeout:3" and "attempts:1" into
resolv.conf, I also get the same results.
If I run "nm /path/to/xymonnet | grep
gethostby" it returns
"ares_gethostbyname".<br>
<br>
</div>
<div>Just for fun, I compiled Xymon v4.3.21
and ran the xymonnet binary from there,
with no change in behaviour. I also tried
removing the "--dns-timeout" option so
that it defaults to 30 seconds, but still
no change - 26 seconds for two DNS tests.<br>
</div>
<div><br>
</div>
So, I'm not really sure what the problem is,
but xymonnet certainly isn't behaving as I
would expect.<br>
<br>
</div>
Cheers<span
class="m_-4817356621273579713HOEnZb"><font
color="#888888"><br>
</font></span></div>
<span class="m_-4817356621273579713HOEnZb"><font
color="#888888">Jeremy<br>
<br>
</font></span></div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Xymon mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Xymon@xymon.com">Xymon@xymon.com</a>
<a class="moz-txt-link-freetext" href="http://lists.xymon.com/mailman/listinfo/xymon">http://lists.xymon.com/mailman/listinfo/xymon</a>
</pre>
</blockquote>
<br>
</body>
</html>