[Xymon] Phantom red statuses (Fwd: Xymon [750466] mgmtconsole:msgs CRITICAL (RED))

J.C. Cleaver cleaver at terabithia.org
Sat Feb 13 02:30:59 CET 2016



On Fri, February 12, 2016 3:20 pm, Greg Earle wrote:
>
>> On Feb 12, 2016, at 3:00 AM, xymon-request at xymon.com wrote:
>>
>> Is there any chance you have multiple servers reporting in with the name
>> 'mgmtconsole'?  Especially if you're not using FQDN (which it doesn't seem
>> like you are), that seems like something that might cause this: Two
>> different servers with the same name, each sending their own red/green
>> states every few minutes.
>
> Thanks JC.  Interesting theory: you see, the old syslog server was on a
> separate machine and it never had this Flapping status issue, despite it
> getting all of the same syslog messages.  "mgmtconsole" is my Xymon server
> so I thought "Oh maybe because it's acting as both server and client that's
> what's screwing it up", but before this I never got any anomalous alerts
> from "mgmtconsole" itself, just the expected reds/yellows from any
> (ab)normal condition that triggered them.
>
> So no, there aren't two different servers (and yes, I don't use FQHN's).
>
> The perma-red state is back, btw.  So, given that the old syslog server
> wasn't in perma-red/Flapping state, why would the new syslog/Xymon combo
> server be in it?  Also, given that the actual contents of "/var/log/messages"
> aren't causing red alerts, is the red alert state caused solely by it
> flapping between yellow and green?  (I still don't get why the old machine
> wasn't in a similar perma-red/Flapping state.)
>
> Where should I look to try and cure this?
>

Hmm.

The first step will be to track down what status messages xymond is
receiving, precisely.

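Something like the following:

  xymoncmd xymond_channel --channel=status \
      xymond_capture --hosts=mgmtconsole --tests=msgs

(or xymoncmd xymond_channel --channel=status --filter='\|msgs\|' cat > /tmp/foo,
etc. The pipes are escaped because the filter is a regular expression, and
a bare '|' would match every message.)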

should spit out each incoming 'msgs' status message for the host. By
default, I believe a status would need to change color 3x within 300s for
the Flapping state to be triggered. Once you have the capture, compare the
colors coming in and try to find a distinguishing pattern. This still seems
most likely to be caused by different client sources being falsely reported
as the same host, so keep an eye on the sender IPs and so forth.
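
If you save the stream to a file, a rough way to compare senders and colors
is to tally them from the header lines. This is only a sketch: the function
name is made up, and it assumes the header layout I remember from xymond(8),
namely "@@status#SEQ/HOST|timestamp|sender|origin|hostname|testname|expiretime|color|...".
Check that against your version before trusting the columns.

```shell
# Tally (sender, color) pairs for one host/test from a saved status capture.
# ASSUMPTION: header fields are pipe-delimited in this order:
#   1=@@status#SEQ/HOST 2=timestamp 3=sender 4=origin 5=hostname
#   6=testname 7=expiretime 8=color
# Usage: summarize_status_senders FILE HOST TEST
summarize_status_senders() {
    awk -F'|' -v host="$2" -v tst="$3" '
        # Only header lines carry the metadata; the message body is skipped.
        /^@@status#/ && $5 == host && $6 == tst {
            count[$3 " " $8]++
        }
        END {
            for (k in count) print k, count[k]
        }
    ' "$1" | sort
}
```

For example, "summarize_status_senders /tmp/foo mgmtconsole msgs" prints one
line per sender/color pair with a count. More than one sender address for
the same host would point at the duplicate-source theory.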

Additionally, check this server itself for anything that might cause
duplication: more than one copy of runclient.sh and/or xymonlaunch
executing, or perhaps permissions/updating/corruption problems on the
logfetch ".status" file it's saving in $XYMONTMP.
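
A quick way to spot duplicate launchers is to filter a process listing.
The helper below is a hypothetical sketch, not something shipped with Xymon.
It reads "PID ARGS" lines on stdin and matches on any argument, so an editor
that has runclient.sh open would also show up.

```shell
# Read "PID ARGS..." lines on stdin and report Xymon launcher processes.
# Prints one line per match, then a warning for any name seen more than once.
find_duplicate_launchers() {
    awk '
        {
            # Check every argument, since runclient.sh usually follows /bin/sh.
            for (i = 2; i <= NF; i++) {
                n = split($i, parts, "/")
                name = parts[n]
                if (name == "runclient.sh" || name == "xymonlaunch") {
                    seen[name]++
                    print name, "pid", $1
                    break
                }
            }
        }
        END {
            for (name in seen)
                if (seen[name] > 1)
                    print "WARNING:", name, "is running", seen[name], "times"
        }
    '
}
```

Feed it something like "ps -eo pid=,args= | find_duplicate_launchers". A
warning is not proof of a problem by itself (a server may have its own
xymonlaunch for server tasks), but two launchers that both start the client
would explain duplicate reports.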

Finally, check the raw client data message as well for anything unusual
around the [msgs:/some/path/here] sections. It's unlikely, but possible
there's a parsing bug around there that might be confusing xymond_client
into generating multiple or erroneous 'msgs' test entries.
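
To eyeball those sections, you can count how often each [msgs:...] header
appears in the client message. Again just a sketch with a made-up function
name; it reads the raw client data on stdin.

```shell
# Read a raw client message on stdin and print "COUNT HEADER" for every
# [msgs:...] section header. A count above 1 deserves a closer look.
count_msgs_sections() {
    awk '
        /^\[msgs:/ { c[$0]++ }
        END { for (s in c) print c[s], s }
    ' | sort
}
```

One way to get the input, assuming xymond is listening locally, is
xymon 127.0.0.1 "clientlog mgmtconsole" | count_msgs_sections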


HTH,
-jc




