[Xymon] False red alerts

J.C. Cleaver cleaver at terabithia.org
Thu Apr 14 18:56:17 CEST 2016


On Thu, April 14, 2016 7:36 am, Matt Pannucci wrote:
> Hello,
>
> For the past two days, our xymon environment has been falsely reporting
> red for SSH and HTTP.  It does seem to be random (happened one day at
> 10pm and two days later at 2:30am).
>
> It happens to every server all at the same time.  Then a couple minutes
> later everything goes back to green.
>
> I've checked through some logs with no success.  I'm not exactly sure if
> I'm looking in the correct places to find the answer.
>
> Any help/suggestions would be great!
>
> Thanks
>
> Matt
> __________________
>

Hi Matt,

You'll want to start by looking at the history ("histlog") snapshots from
the red statuses to see what xymonnet (the process that runs the http and
ssh network tests) reported when it happened. Was it a timeout? DNS error?
Premature TCP closure?

If you didn't also receive any "conn" test failures (which are ICMP pings,
usually done by fping) at the same time, then it's probably not a general
loss of connectivity, but there could still be an issue at the TCP layer
(packet loss leading to closure timeouts, firewall port limits, etc.).
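
If you want to check the TCP layer independently of xymonnet, you can
probe a port by hand from the Xymon server and note how it fails. Below is
a minimal Python sketch. It is not part of Xymon, and the function name
and the category strings are made up for illustration. The read_banner
option only makes sense for services that speak first, such as ssh.

```python
import socket

def probe_tcp(host, port, timeout=5.0, read_banner=False):
    """Return a rough category for one TCP connection attempt.

    Hypothetical helper: the category names are illustrative only.
    """
    try:
        # Resolve first so DNS problems are reported separately.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_error"
    family, socktype, proto, _, sockaddr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            if read_banner:
                # An empty read means the peer closed the connection
                # before sending anything.
                if sock.recv(1) == b"":
                    return "closed_early"
            return "ok"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except ConnectionResetError:
        return "closed_early"
    except OSError:
        return "other_error"
```

Running something like probe_tcp("somehost", 22, read_banner=True) in a
loop against a few of the affected servers during an incident would show
whether the failures are timeouts, DNS errors or early closures.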

If TCP seems fine, then you'd want to look more closely at the server
xymon is running on. Check the xymonnet logs (/var/log/xymon/xymonnet.log)
for any errors around the time of the alerts, see if the server itself was
having performance problems, etc. You can try increasing test concurrency,
or changing how DNS lookups are done, but only once the issue's been
narrowed down.
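
To pull out just the log lines around an incident, something like the
following Python sketch may help. It assumes each log entry starts with a
"YYYY-MM-DD HH:MM:SS" timestamp. That format is an assumption on my part,
so check your own xymonnet.log and adjust fmt if it differs. The function
name is hypothetical.

```python
from datetime import datetime

def lines_in_window(lines, start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """Yield log lines whose leading timestamp falls within [start, end].

    The timestamp format is an assumption; adjust fmt to match the log.
    """
    width = len(start.strftime(fmt))  # length of one formatted timestamp
    keep = False
    for line in lines:
        try:
            stamp = datetime.strptime(line[:width], fmt)
        except ValueError:
            # No leading timestamp: treat as a continuation of the
            # previous entry and keep it only if that entry was kept.
            if keep:
                yield line
            continue
        keep = start <= stamp <= end
        if keep:
            yield line
```

For example, open /var/log/xymon/xymonnet.log, pass the file object as
lines, and use a window of a few minutes either side of the time the
statuses went red.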

The overarching question should also be: When did the problem start, and
did anything change around that time?


HTH,
-jc



