[Xymon] Purple storm
Henrik Størner
henrik at hswn.dk
Tue Mar 20 08:10:45 CET 2012
On 19-03-2012 19:15, Poppy, Ben wrote:
> I have an interesting problem that happened last night. We are working
> on a DR test. Part of that test includes shutting down some DC’s in our
> DR datacenter. When that happened, most tests that are initiated from
> the xymon servers (http, dns, ssh, ftp, etc) to the monitored server
> went purple. The servers that went purple were not all in our DR
> datacenter, it was at all of our sites, and even included some tests to
> the xymon server itself (we monitor the HTTP web page of xymon itself as
> well).
>
> Both of our xymon servers point to 2 windows DC’s in our production
> datacenter in /etc/resolv.conf for DNS lookups.
Check the "xymonnet" status history. I expect this status will show
some yellow events during the outage, caused by the network tests taking
too long to run.
The status will tell you more about which part of the network tests is
taking too long.
This should also show up in the xymonnet.log file.
One likely culprit would be if you are doing "ntp" tests or custom DNS
queries from Xymon against the DC's that are down. "ntp" tests use an
external program (ntpdate) to perform the query, and it has a very long
timeout when servers are not responding. DNS queries use the C-ARES
library, and because I misunderstood how the timeout handling works in
this library it can take several minutes *per test* to time out.
Fixes for both of these issues are "in the pipeline" for the next major
Xymon version.
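Until then, a rough way to see from the Xymon server whether name lookups
are the slow part is to time them yourself. The sketch below is not part
of Xymon; the function name time_lookups, the 5-second "slow" threshold
and the idea of passing in your own resolver are all assumptions made for
illustration. It uses the system resolver (so it follows /etc/resolv.conf),
not the C-ARES library that xymonnet uses, so the timings are only
indicative.

```python
import socket
import time


def time_lookups(hostnames, resolver=socket.getaddrinfo, slow_after=5.0):
    """Resolve each hostname and record how long the lookup took.

    Returns a list of (hostname, seconds, ok, slow) tuples, where
    ok is False if the lookup failed and slow is True if it took
    at least slow_after seconds (an arbitrary, assumed threshold).
    """
    results = []
    for name in hostnames:
        start = time.monotonic()
        try:
            # Port is irrelevant here; we only care about name resolution.
            resolver(name, None)
            ok = True
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            ok = False
        elapsed = time.monotonic() - start
        results.append((name, elapsed, ok, elapsed >= slow_after))
    return results
```

Calling time_lookups() with a handful of the hostnames you test, while
the DCs are down, should show whether the lookups themselves stall or
whether the delay is elsewhere.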
Regards,
Henrik