[Xymon] Xymon server post-migration blues pt. 2 - "phantom" reports apparently coming from 127.0.0.1

Root, Paul T Paul.Root at CenturyLink.com
Wed Apr 24 15:26:16 CEST 2019


And did you upgrade your clients, or are they all still running 4.3.12?

-----Original Message-----
From: Root, Paul T
Sent: Wednesday, April 24, 2019 8:24 AM
To: xymon at xymon.com
Subject: RE: [Xymon] Xymon server post-migration blues pt. 2 - "phantom" reports apparently coming from 127.0.0.1

What are your MAXMSG_* settings?

-----Original Message-----
From: Xymon <xymon-bounces at xymon.com> On Behalf Of Greg Earle
Sent: Tuesday, April 23, 2019 11:54 PM
To: xymon at xymon.com
Subject: [Xymon] Xymon server post-migration blues pt. 2 - "phantom" reports apparently coming from 127.0.0.1

Another thing that happened after my recent migration/Xymon upgrade is
that I started getting phantom "disk" alerts purporting to be from the
Xymon server itself.

They looked like this:

--
To: sysadmins at my.do.main
Subject: Xymon [556665157] mgmt:disk CRITICAL (RED)
Message-Id: <20190318195008.53EF4635153 at mgmt.my.do.main>
From: xymon at mgmt.my.do.main (xymon Monitor (client))

red Mon Mar 18 12:50:04 PDT 2019 - Filesystems NOT ok
&red /export/bkd05d (98% used) has reached the PANIC level (98%)

[...]
--

The thing is, the partition "/export/bkd05d" does not exist on the Xymon
server host "mgmt".

It exists on a completely different system (and I know exactly which one
it is).  I've seen other alerts like this where the disk partitions
mentioned are from other systems, too.

In short, the Xymon server is receiving the reports from the clients,
but somehow they are being mangled so that they appear to come from
127.0.0.1 and thus look local to the server, which then generates red
alerts against itself.
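
One way to confirm which alerts are affected is to check each "disk"
alert body against the filesystems that really exist on the server.
The following is only a minimal sketch in Python, not part of Xymon:
the function name, the regular expression and the set of local mount
points are all assumptions for illustration, and the line format is
taken from the single "&red" example quoted above (matching "&yellow"
lines the same way is a guess).

```python
import re

# Matches status lines shaped like the one in the alert above, e.g.
#   &red /export/bkd05d (98% used) has reached the PANIC level (98%)
# The "&yellow" alternative is an assumption; only "&red" appears in the thread.
DISK_LINE = re.compile(r"^&(red|yellow)\s+(\S+)\s+\((\d+)% used\)")


def phantom_filesystems(alert_body, local_mounts):
    """Return mount points named in the alert that are not in local_mounts."""
    phantoms = []
    for line in alert_body.splitlines():
        match = DISK_LINE.match(line.strip())
        if match and match.group(2) not in local_mounts:
            phantoms.append(match.group(2))
    return phantoms


alert = """red Mon Mar 18 12:50:04 PDT 2019 - Filesystems NOT ok
&red /export/bkd05d (98% used) has reached the PANIC level (98%)
"""

# Hypothetical mount points for the server "mgmt"; substitute the real list.
print(phantom_filesystems(alert, {"/", "/var", "/export/home"}))
```

Any filesystem this prints is one the server cannot own, so the report
carrying it must have originated on a different client.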

In many cases these are filesystems for which I already had exception
clauses in "analysis.cfg", so I never got alarms from the actual client
host.  Suddenly getting "back from the dead" red alarms for them was a
surprise, to say the least.

I've kludged around it by making a special pseudo-clause in
"analysis.cfg" for the Xymon server for all of these disk partition
exceptions:

--
# XXX - KLUDGE dummy entries to prevent Xymon from reporting false
# XXX - "red" disk alerts for systems with faulty "127.0.0.1" reports
HOST=mgmt
         DISK    /export/brick1 101 101
         DISK    /export/data 101 101
         DISK    /export/data1 100 100
         DISK    /export/data2 99 100
         DISK    /export/work 99 100
         DISK    %(?-i)^.*/Volumes/Time 101 101
         DISK    /media/Oracle_Solaris-11_3-Text-SPARC 101 101
         DISK    /media/Solaris-11_3_28_4_0-Boot-SPARC 101 101
                [... more here ...]
--

but obviously I would prefer to solve the problem so I can remove this.

What changed between Xymon 4.3.12 and 4.3.28 to cause this?  How do I
debug it?

                - Greg