[Xymon] purple problems

Walter Rutherford wlrutherford at alaska.edu
Mon Aug 31 22:58:45 CEST 2015


Found it!

Besides the "raid.sh" script in ext/ I needed a raid configuration in
etc/client.d/. I thought that was defined in another file but apparently
not.
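
For anyone landing on this thread later, here is a minimal sketch of the kind of check a script like raid.sh performs on /proc/mdstat. It is not the actual raid.sh, which is not shown in this thread. The function names `parse_mdstat` and `overall_color`, the `SAMPLE` text, and the rule that any "_" in the member-state field means red are all assumptions made for illustration. It also shows that a pattern written as `md\d+` matches multi-digit names such as md127.

```python
import re

# Matches array header lines such as "md0 : active raid1 sdc1[1] sda1[0]".
# \d+ accepts multi-digit device numbers, so md127 matches as well.
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\S+)")
# Matches the member-state field such as "[UU]" or "[U_]".
STATE_RE = re.compile(r"\[([U_]+)\]")


def parse_mdstat(text):
    """Return a list of (device, color) pairs parsed from /proc/mdstat text."""
    results = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            # Assumption: an array that is not "active" is reported as red.
            if header.group(2) != "active":
                results.append((current, "red"))
                current = None
            continue
        state = STATE_RE.search(line)
        if current and state:
            # "_" marks a missing or failed member of the array.
            color = "red" if "_" in state.group(1) else "green"
            results.append((current, color))
            current = None
    return results


def overall_color(results):
    """Red if any array is red, otherwise green (also green when empty)."""
    return "red" if any(color == "red" for _, color in results) else "green"


# Hypothetical sample input, modelled on the report quoted further down.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdc1[1] sda1[0]
      511988 blocks super 1.0 [2/2] [UU]

md127 : active raid1 sdd[3] sdb[2]
      536869888 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    parsed = parse_mdstat(SAMPLE)
    for device, color in parsed:
        print(color, device, "Status", "OK" if color == "green" else "FAILED")
    print("overall:", overall_color(parsed))
```

A real client extension would still have to send the result to the Xymon server as a status message; that step is left out here, and as noted above the script also has to be registered in the client configuration before it runs at all.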

On Mon, Aug 31, 2015 at 10:53 AM, Walter Rutherford <wlrutherford at alaska.edu> wrote:

> All good questions. Hunting for the answers helped me to see some patterns
> I'd missed before.
>
> The xymon server hostname and IP seem to be consistent, but that's about
> all that is consistent. There is a separate column for 'disks' on the main
> webpage and it correctly shows the output from a 'df' command. The script
> running on the client side is called "raid.sh". The comments at the top of
> the script indicate it is over a decade old: bb-mdstat.h, based on
> bb-raid.sh. There's a link from /home/xymon-client/ext to
> /usr/share/xymon-client/ext on most systems. The directory and the scripts
> in it are owned by either root or xymon. Changing location, ownership, and
> perms to match one of the working systems hasn't helped.
>
> The broken raid reports are all from Linux boxes. The working reports look
> like this:
>
>     Mon Aug 31 09:38:49 AKDT 2015 RAID ALL devices OK
>
>        green md0 Status OK
>        green md1 Status OK
>        green md2 Status OK
>
>     ============================ /proc/mdstat ===========================
>
>     Personalities : [raid1]
>     md0 : active raid1 sdc1[1] sda1[0]
>           511988 blocks super 1.0 [2/2] [UU]
>
>     md2 : active raid1 sdd[3] sdb[2]
>           536869888 blocks super 1.2 [2/2] [UU]
>
>     md1 : active raid1 sdc2[1] sda2[2]
>           41428924 blocks super 1.1 [2/2] [UU]
>           bitmap: 1/1 pages [4KB], 65536KB chunk
>
>     unused devices: <none>
>
>     Run /sbin/mdadm -D /dev/md* for more info
>
> The non-working systems either show nothing at all (that's better than
> purple) OR show the same three green md[0-2] devices (whether or not the
> host has three raid devices) on a blue disabled background. So, I'm almost
> positive someone copied a working system incorrectly to other clients
> without cleaning up the foreign logs. The working systems overwrote or just
> aged out the incorrect information while the non-working ones just keep
> reporting it. I have found logs but none for this raid information. Perhaps
> the logs are compressed or otherwise rendered unreadable to humans.
>
> So, I copied the /usr/share/xymon-client/ext scripts from a working system
> to several that were reporting nothing and restarted xymon-client. Most did
> nothing; one is showing a "no data" indicator. The raid output looks normal
> except the device is md127 - perhaps the high number is confusing the
> script. But the wbinfo.sh script I copied at the same time, to and from the
> same directories, is now showing green. Argh!
>
> I don't even know where the xymon-client scripts running here came from so
> I'm reluctant (but motivated)
> to just rip them all out by the roots and start over from a known baseline.
>
>   WLR
>
>
>
> ==================================================================================
>
> Phil Crooker <Phil.Crooker at orix.com.au>
> 3:57 PM (17 hours ago)
>
> Is the hostname wrong somewhere? I'm thinking maybe the script is sending
> the wrong hostname, somehow....
>
>
>
> ==================================================================================
>
>
> Jeremy Laidman <jlaidman at rebel-it.com.au>
>
> 7:07 PM (14 hours ago)
>
>
> On 30 August 2015 at 14:22, Walter Rutherford <wlrutherford at alaska.edu> wrote:
> > This is probably an old issue but I didn't see a way to search the
> > archives.
>
> https://www.google.com/?q=site:lists.xymon.com+purple+raid
>
> > Our xymon server is showing purple indicators for two of our custom scripts
> > but only on a handful of systems.
>
> The scripts are running client-side and/or server-side?  Can you describe
> how the scripts work?  Are they locally-written scripts or did you get them
> from somewhere online?
>
> RAID checks are not standard for most Xymon clients.  I've never used or
> seen RAID checks.  A quick look at the source code indicates built-in
> support for only Linux, where "md" devices are identified in /proc/mdstat.
>
> > At the bottom of the incorrect raid report page there is a
> > link to "client data". If I follow the link I get a full report including
> > the correct, current raid information!
>
> How is the RAID information getting into the client data?  This might not
> be used by your custom scripts, and so might be a red herring.  More detail
> is required about the raid scripts.  Or whether you're using the built-in
> support for Linux RAID meta-devices reporting with client data in the
> [mdstat] section.  If the latter, perhaps you could show the [mdstat]
> section of client data?
>
> Cheers
>
>
> ====================================================================================
>
>
> ---------- Forwarded message ----------
> From: Walter Rutherford <wlrutherford at alaska.edu>
> Date: Sat, Aug 29, 2015 at 8:22 PM
> Subject: purple problems
> To: Xymon at xymon.com
>
>
> Hey all,
>
> This is probably an old issue but I didn't see a way to search the
> archives.
>
> Our xymon server is showing purple indicators for two of our custom scripts
> but only on a handful of systems. I've found differences in file location,
> file ownership, UID, GID, etc., but so far none of that seems to be the
> problem.
>
> The custom script checks raids. Strangely, all of the stagnant hosts show
> the same three disk entries from mid-July no matter how many disks they
> really have. Unfortunately I don't know what may've happened in July; that
> was before I started working here. I suspect the xymon-client software was
> copied from a live system, including the old status reports, but in so
> doing something wasn't re-configured correctly for the new systems.
>
> Even stranger, at my urging the Lead SA undisabled the purple
> notifications. I was expecting the page to go purple but it remains green
> even though the page isn't updating. At the bottom of the incorrect raid
> report page there is a link to "client data". If I follow the link I get a
> full report *including the correct, current raid information*!
>
> I think this means that the client is capturing the correct data and
> sending it to the server, the server is actually receiving the report, but
> after that the raid report isn't being handled correctly. Other systems
> display as expected. So far I haven't found anywhere on the server that the
> purple systems are configured or handled differently.
>
> I doubt we're the first to experience this problem. Does this sound
> familiar?
>
> Thanks in advance for any hints you can provide for where to look next.
>
>    WLR
>
>
>
>