[Xymon] nvme temperature check broken in Debian bookworm

Jeremy Laidman jeremy at laidman.org
Tue Apr 23 01:35:58 CEST 2024


Christoph

I'm fairly sure that a "temp" script is not part of the standard Xymon
client, and it doesn't appear to be part of the Debian/Bookworm package
either. Generally, scripts in "ext" are add-ons to a package by the local
installer/administrator. In summary, I don't know where that script came
from, and it's possible nobody else on this list knows.

Cheers
Jeremy

On Tue, 9 Apr 2024 at 20:43, Christoph Zechner <zechner at vrvis.at> wrote:

> Hi,
>
> the temperature check in bookworm's version of xymon is broken in a
> rather strange way. The check is located in
> /usr/lib/xymon/client/ext/temp and fails for all NVMe disks that have
> several temperature sensors:
>
> For example, the first NVMe disk in Lynx holds its temperature values in
> /sys/block/nvme0n1/device/hwmon0/:
> files   name        value   min         max         crit
> temp1_* Composite   27.85   -273.15     86.85       87.85
> temp2_* Sensor 1    27.85   -273.15     65261.85    n/a
> temp3_* Sensor 2    31.85   -273.15     65261.85    n/a
>
> The inner logic of the temperature check works as follows to calculate
> the values for red and yellow:
>
> 1) if there is a crit and a max value use them
> 2) if there is a max and a mid value use them
> 3) if there is a max and a min value, use them
> 4) if there is only a max file, use it for both
> 5) if there is only a crit file, use it for both.
>
> The sensor 'Composite' uses max and crit as they're both available.
> 'Sensor 1' and 'Sensor 2', however, only provide max and min. Therefore
> these values are used, but they lead to 'yellow' warnings because the min
> value is not an upper boundary, as assumed, but a lower boundary.
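
To make the failure concrete, here is a minimal sketch using the 'Sensor 1'
values from the table above. The helper color_for is hypothetical, and the
">=" comparison against each threshold is an assumption, since the part of
the script that maps a reading to a color is not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: assumes a reading at or above a threshold
# triggers that threshold's color (not confirmed by the script itself).
sub color_for {
    my ($value, $red, $yellow) = @_;
    return 'red'    if $value >= $red;
    return 'yellow' if $value >= $yellow;
    return 'green';
}

# 'Sensor 1' from the table: only max and min are available.
my ($value, $min, $max) = (27.85, -273.15, 65261.85);

# Current behaviour: max becomes red, min becomes yellow.
print "min as yellow: ", color_for($value, $max, $min), "\n";

# With max used for both thresholds instead.
print "max for both:  ", color_for($value, $max, $max), "\n";
```

Because any realistic reading is above -273.15, the first call can never
come out green under this assumption, which matches the permanent 'yellow'
state described above.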
>
> The Linux kernel documentation
> (https://docs.kernel.org/hwmon/sysfs-interface.html) also states that
> every file with 'min' in its name is a low threshold:
>
>      The common scheme for files naming is: <type><number>_<item>. Usual
> types for sensor chips are "in" (voltage), "temp" (temperature) and
> "fan" (fan). Usual items are "input" (measured value), "max" (high
> threshold), "min" (low threshold).
>
> The proposed fix would be to either use the max value for both yellow and
> red, or at least to sanity-check whether min is below zero and, in that
> case, use the max value for both:
>
> In /usr/lib/xymon/client/ext/temp beginning on line 182:
>
>          my ($red, $yellow);
>          if (-r $crit_file and -r $max_file) {
>              $red     = read_one_chomped_line_from_file($crit_file);
>              $yellow  = read_one_chomped_line_from_file($max_file);
>          } elsif (-r $max_file and -r $mid_file) {
>              $red     = read_one_chomped_line_from_file($max_file);
>              $yellow  = read_one_chomped_line_from_file($mid_file);
>          } elsif (-r $max_file and -r $min_file) {
>              $red     = read_one_chomped_line_from_file($max_file);
>              #TODO: min_file contains the lower temperature boundary and
>              #      *not* the warning value; only solution to this would
>              #      be to set either yellow to red or to at least do that
>              #      when yellow is below 0 for example.
>              $yellow  = read_one_chomped_line_from_file($min_file);
>              $yellow = $yellow > 0 ? $yellow : $red;
>              # Alternative solution: do not use min at all v1:
>              #$red = $yellow = read_one_chomped_line_from_file($max_file);
>              # Alternative solution: do not use min at all v2: remove
>              #      this 'elsif'
>          } elsif (-r $max_file) {
>              $red = $yellow = read_one_chomped_line_from_file($max_file);
>          } elsif (-r $crit_file) {
>              $red = $yellow = read_one_chomped_line_from_file($crit_file);
>          }
>
> There are three ways to solve this:
>
> * sanitize min by checking whether the value is below 0 and, in that case,
>   use the max value instead
> * always use the max value for both yellow and red
> * completely remove the 'elsif' that reads min and max, as the next
>   'elsif' just reads and uses max
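
As a rough sketch of the first option, the selection rules can be written
as a function that works on already-read values, so it can be tried without
any sysfs files. The name pick_thresholds and its hash-style arguments are
hypothetical and not part of the ext/temp script; a missing file is
represented by a missing key.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from ext/temp: returns ($red, $yellow)
# following the five rules, with min only accepted as a warning level
# when it is above zero.
sub pick_thresholds {
    my (%t) = @_;
    my ($crit, $max, $mid, $min) = @t{qw(crit max mid min)};

    return ($crit, $max) if defined $crit and defined $max;
    return ($max,  $mid) if defined $max  and defined $mid;
    if (defined $max and defined $min) {
        # min is a lower boundary; fall back to max when it is not positive.
        return ($max, $min > 0 ? $min : $max);
    }
    return ($max,  $max)  if defined $max;
    return ($crit, $crit) if defined $crit;
    return;    # no usable thresholds found
}

# Values from the table: 'Composite' has crit and max, 'Sensor 1' does not.
my @composite = pick_thresholds(crit => 87.85, max => 86.85, min => -273.15);
my @sensor1   = pick_thresholds(max => 65261.85, min => -273.15);

print "Composite: red=$composite[0] yellow=$composite[1]\n";
print "Sensor 1:  red=$sensor1[0] yellow=$sensor1[1]\n";
```

With this variant 'Sensor 1' ends up with the max value for both thresholds,
which is the same result the second and third options would give for these
sensors.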
>
> Thanks in advance!
>
> Best regards
> Christoph