[Xymon] Xymon Digest, Vol 111, Issue 17
Gary Allen Vollink
xymon at vollink.com
Mon Apr 27 13:54:23 CEST 2020
>
> ---------- Forwarded message ----------
> From: Adam Goryachev <mailinglists at websitemanagers.com.au>
> On 27/4/20 05:06, Gary Allen Vollink wrote:
>
> Hi all,
>
> I have a configuration which uses RAID meta-devices set up as raid1 over
> empty slots for GUI configuration and notification. As such, I have md0
> and md1 showing up as fatal errors in Xymon. Again, this setup is standard
> for this installation. md2 and above are all normal, valid (and actually
> hold mounted filesystems).
>
> I'd normally expect to be able to set up analysis.cfg to "something
> something IGNORE" for this machine. Like:
>
> HOST=vault.home.vollink.com
> RAID md0 IGNORE
> RAID md1 IGNORE
>
> Does such a thing exist (and I missed it or have the syntax wrong)? If not,
> /could/ such a thing exist?
>
> I'm starting to get used to just having a RED screen (and that is
> dangerous).
>
> If the answer to all of the above is 'no,' then what is the best way to
> ignore all RAID for that machine?
>
> Thank you much for any thoughts,
> Gary
>
> You will need to share your /proc/mdstat and/or a pointer to which ext
> script you are using to monitor your md RAID. I suspect that your RAID
> arrays are defined as a two member RAID1 with one missing member,
> therefore, they would be expected to show as red, because they are failed.
>
> You could either define the RAID arrays as RAID1 with only one member, or
> else define them as RAID0 with only one member.
>
> Or, you could add the spare drives as spares, or simply not define them as
> RAID arrays until you actually need to use them.
>
> Regards,
> Adam
>
Thank you for responding.
I'm going to guess that the answer to my actual question (is there a way
to ignore individual md failures?) is "I don't know". To be clear, "I
don't know" is an acceptable answer. I read through the source code looking
for a way and couldn't find one (though so many bits are auto-loaded that it
is hard to be sure enough to say "no"). I was hoping someone on-list would
actually know, but I understand why that might not be the case.
To the questions:
============================ /proc/mdstat ===========================
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda5[0] sdc5[2] sdb5[1]
11711382912 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
md1 : active raid1 sda2[0] sdb2[1] sdc2[2]
2097088 blocks [6/3] [UUU___]
md0 : active raid1 sda1[0] sdb1[1] sdc1[2]
2490176 blocks [6/3] [UUU___]
unused devices: <none>
============================ /proc/mdstat ===========================
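For clarity: the [6/3] [UUU___] lines mean md0 and md1 are defined with six
slots but only have three active members each, while md2 is a healthy
three-disk RAID5. If I did want to follow Adam's suggestion and redefine the
RAID1 arrays to match their actual member count, I believe something like the
following would do it; I have not tried it on this box, it may need --force,
and Synology's DSM may simply recreate the six-slot layout on its own:

  # Untested sketch: shrink the six-slot RAID1 arrays to three members
  # so the kernel stops counting three missing devices.
  mdadm --grow /dev/md0 --raid-devices=3
  mdadm --grow /dev/md1 --raid-devices=3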
I'm using the script here:
http://www.it-eckert.com/blog/2015/agent-less-monitoring-with-xymon/
(xymon-rclient.sh).
Specifically, the platform is Synology and yes, Synology runs two raid1
arrays over all of the slots (even though some are empty). I could fix
this easily by adding hard drives into the empty slots, but I specifically
bought this unit so that I could expand it later. That is, I understand
that this is correctly showing broken (but unmounted) RAID arrays, and I
know why they are broken (and thus why the errors are nominal in my
setup).
I am still hoping that a failure state that is nominal would be something
I'd be able to ignore (just as I can ignore specific libraries or
individual filesystems).
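(For reference, the kind of per-item ignore I have in mind already exists in
analysis.cfg for filesystems; the mount point below is just a made-up example
for this box:

HOST=vault.home.vollink.com
DISK /volume2 IGNORE

I was hoping for an equivalent knob for individual md devices.)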
The other choice for me is to entirely remove the mdstat portion of Eckert's
script. (Sadly, there is nothing else for Synology monitoring that I can
get to work at all, and that simple script otherwise covers all of what I
need.) This means I won't be notified (through Xymon) if one of my drives
does fail, but it's better than getting used to ignoring a RED background.
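A middle ground I may try instead: keep a local mdstat check, but have it
skip the arrays I already know are nominally degraded. A rough sketch follows;
it is not the actual logic from xymon-rclient.sh, the array names are specific
to my box, and it assumes the usual Xymon client environment ($XYMON, $XYMSRV,
$MACHINE), which my agentless setup would have to emulate on the server side:

#!/bin/sh
# Rough sketch only - not the mdstat logic from xymon-rclient.sh.
# Sends one "raid" status to Xymon, skipping arrays that are expected
# to look degraded on this box (md0 and md1).
IGNORE="md0 md1"
COLOR=green
MSG=""
for md in /sys/block/md*; do
    [ -d "$md/md" ] || continue
    name=$(basename "$md")
    case " $IGNORE " in *" $name "*) continue ;; esac
    degraded=$(cat "$md/md/degraded" 2>/dev/null)
    [ -z "$degraded" ] && degraded=0
    [ "$degraded" -gt 0 ] && COLOR=red
    MSG="$MSG
$name: degraded=$degraded"
done
$XYMON $XYMSRV "status $MACHINE.raid $COLOR $(date)
$MSG"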
[Archive readers: It is okay to contact me directly with questions about my
setup]
Thank you,
Gary Allen Vollink