[Xymon] Xymon Digest, Vol 111, Issue 17

Adam Goryachev mailinglists at websitemanagers.com.au
Tue Apr 28 04:13:38 CEST 2020


On 27/4/20 21:54, Gary Allen Vollink wrote:
>
>     ---------- Forwarded message ----------
>     From: Adam Goryachev <mailinglists at websitemanagers.com.au>
>     On 27/4/20 05:06, Gary Allen Vollink wrote:
>>     Hi all,
>>
>>     I have a configuration which uses RAID meta-devices set up as
>>     RAID1 over empty slots for GUI configuration and notification.
>>     As such, I have md0 and md1 showing up as fatal errors in Xymon,
>>     even though this setup is standard for this installation.  md2
>>     and above are all normal, valid (and actually hold mounted
>>     filesystems).
>>
>>     I'd normally expect to be able to set up analysis.cfg to
>>     "something something IGNORE" for this machine.  Like:
>>
>>     HOST=vault.home.vollink.com
>>         RAID md0 IGNORE
>>         RAID md1 IGNORE
>>
>>     Does such a thing exist (and I missed it or have the syntax
>>     wrong)?  If not, /could/ such a thing exist?
>>
>>     I'm starting to become used to just having a RED screen (and that
>>     is dangerous).
>>
>>     If the answer to all of the above is 'no', then what is the best
>>     way to ignore all RAID for that machine?
>>
>>     Thank you much for any thoughts,
>>     Gary
>>
>     You will need to share your /proc/mdstat and/or a pointer to
>     which ext script you are using to monitor your md RAID. I suspect
>     that your RAID arrays are defined as a two-member RAID1 with one
>     missing member, so they would be expected to show red, because
>     they are degraded.
>
>     You could either define the RAID arrays as RAID1 with only one
>     member, or else define them as RAID0 with only one member.
>
>     Or, you could add the spare drives as spares, or simply not define
>     them as RAID arrays until you actually need to use them.
>
>     Regards,
>     Adam
>
> Thank you for responding.
>
> I'm going to guess that the answer to my actual question - is there a 
> way to ignore individual md failures - is "I don't know".  To be 
> clear: "I don't know" is acceptable; I read through the source code 
> looking for a way and couldn't find one (and so many bits are 
> auto-loaded that it's super hard to be sure enough to say "no").  I 
> was hoping someone on-list would actually know, but I get why that 
> might not be the case.
>
I guess I was saying that I'm pretty sure that option doesn't exist, but 
I was trying to find out more about why you want to do this in the 
first place.
> To the questions:
> ============================ /proc/mdstat ===========================
>
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md2 : active raid5 sda5[0] sdc5[2] sdb5[1]
>        11711382912 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>        
> md1 : active raid1 sda2[0] sdb2[1] sdc2[2]
>        2097088 blocks [6/3] [UUU___]
>        
> md0 : active raid1 sda1[0] sdb1[1] sdc1[2]
>        2490176 blocks [6/3] [UUU___]
>        
> unused devices: <none>
> ============================ /proc/mdstat ===========================
> I'm using the script here: 
> http://www.it-eckert.com/blog/2015/agent-less-monitoring-with-xymon/ 
> (xymon-rclient.sh).
>
I'm not familiar with this script, but my guess is that you could modify 
it to behave as you want. That is the best part of Xymon: most of it is 
simple shell scripts, so it is very easy to modify.
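
As a very rough sketch of the idea (I haven't read the script, so the
filter, the IGNORE_MD list and where it would hook in are all my own
invention, not something the script already has), you could strip the
arrays you don't care about out of /proc/mdstat before it gets parsed:

    # Sketch only: drop the whole stanza of every array named in IGNORE_MD
    IGNORE_MD=" md0 md1 "

    awk -v ignore="$IGNORE_MD" '
        /^md[0-9]* :/             { skip = index(ignore, " " $1 " ") > 0 }
        /^(Personalities|unused)/ { skip = 0 }
        !skip
    ' /proc/mdstat

Anything not in that list is passed through untouched, so a real failure
on md2 would still turn the column red.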

> Specifically, the platform is Synology, and yes, Synology runs two 
> RAID1 arrays over all of the slots (even though some are empty).  I 
> could fix this easily by adding hard drives into the empty slots, but 
> I specifically bought this unit so that I could expand it later.  That 
> is, I both understand that this is properly showing broken but 
> unmounted RAIDs, and I know why those RAIDs are broken (and thus why 
> the errors are nominal in my setup).
>
I suspect (though I've never owned a Synology NAS) that md0/md1 are used 
for the OS, or something similar. Possibly they are only mounted/used 
during boot, or perhaps for OS updates. You might be able to discuss this 
with Synology, and there may be an update available to fix it. Given that 
md2 is clearly grown as you add a drive, it should be possible to use the 
same process to grow md0/md1.
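
For what it's worth, on a plain Linux box the usual way to make a
degraded-by-design RAID1 "complete" again is to shrink its declared slot
count with mdadm. This is stock mdadm, not Synology tooling, and I have
no idea whether DSM tolerates it or simply recreates the arrays with 6
slots on the next update, so treat it as a sketch and check with
Synology first:

    # Reduce the declared member count from 6 to 3 so the three present
    # drives make md0/md1 fully populated again.
    mdadm --grow /dev/md0 --raid-devices=3
    mdadm --grow /dev/md1 --raid-devices=3
    cat /proc/mdstat   # should now show [3/3] [UUU] for md0 and md1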
> I am still hoping that a failure state that is nominal would be 
> something I'd be able to ignore (just as I can ignore specific 
> libraries or individual filesystems).
>
I guess "most" people don't set up broken RAID on purpose, and even if 
they expect it to be broken for a (short) period of time, they might just 
ack the alert, leaving it as a reminder to fix it soon (e.g. while waiting 
for a replacement drive/etc).
> The other choice for me is to entirely remove the mdstat portion of 
> Eckert's script.  (Sadly, there is nothing else for Synology monitoring 
> that I can get to work at all, and that simple script otherwise covers 
> all of what I need.)  This means I won't be notified (through Xymon) 
> if one of my drives does fail, but it's better than getting used to 
> ignoring a RED background.
>
So, you have at least three options:

1) Try to fix the Synology to avoid having broken RAID devices (i.e. 
reduce the number of RAID members from 6 to 3 for md0/md1).

2) Try to fix the monitoring script, possibly by adding some sort of 
"config" file: if an array is detected as broken, check the config, 
which can either say to ignore the array completely or specify the 
number of member drives needed to mark it "green". That way, if md0 
dropped from 3 active drives to 2, you would still get an alert. 
Another option would be to use the config to monitor only specified 
arrays, in this case md2 (see the sketch after this list).

3) Remove the mdstat monitoring from Xymon entirely.
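
As a very rough sketch of option 2 (the config file name, its format and
every variable name below are invented for illustration, not something
the script already supports):

    # Hypothetical /etc/xymon/mdstat.cfg, one array per line:
    #   md0 ignore
    #   md1 ignore
    #   md2 3
    CFG=/etc/xymon/mdstat.cfg

    grep '^md[0-9]* :' /proc/mdstat | while read -r md rest; do
        want=$(awk -v md="$md" '$1 == md { print $2 }' "$CFG")
        [ "$want" = "ignore" ] && continue   # broken by design, skip it

        # Active members = the number of U's in the [UUU___] field on
        # the "blocks" line that follows the array header.
        have=$(grep -A1 "^$md :" /proc/mdstat | tail -1 \
               | grep -o '\[[U_]*\]' | tr -cd 'U' | wc -c)

        if [ -n "$want" ] && [ "$have" -lt "$want" ]; then
            echo "&red $md has only $have of $want expected members"
        else
            echo "&green $md OK ($have active members)"
        fi
    done

With your current mdstat, md0/md1 would be skipped entirely and md2 would
report green, but md2 losing a member would still produce a &red line (a
real script would also use that to set the overall column colour).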

Regards,
Adam

> [Archive readers: It is okay to contact me directly with questions 
> about my setup]
>
> Thank you,
> Gary Allen Vollink
>
>