[Xymon] nvme temperature check broken in Debian bookworm

damien at makelofine.org
Tue Apr 23 09:24:42 CEST 2024


Hi,

My 2 cents:
The 'temp' script is part of the hobbit-plugins package in Debian/Ubuntu/Mint.

I don't remember when it was introduced, because I wrote my own set
of hardware monitoring scripts (with support limited to my own computers).
Feel free to test them if relevant:
https://github.com/doktoil-makresh/xymon-plugins/tree/master/xymon-hardware

On 23/04/2024 01:35, Jeremy Laidman wrote:
> Christoph
> 
> I'm fairly sure that a "temp" script is not part of the standard Xymon
> client, and it doesn't appear to be part of the Debian/Bookworm
> package either. Generally, scripts in "ext" are add-ons to a package
> by the local installer/administrator. In summary, I don't know where
> that script came from, and it's possible nobody else on this list
> knows.
> 
> Cheers
> Jeremy
> 
> On Tue, 9 Apr 2024 at 20:43, Christoph Zechner <zechner at vrvis.at>
> wrote:
> 
>> Hi,
>> 
>> the temperature check in bookworm's version of xymon is broken in a
>> rather strange way. The check is located in
>> /usr/lib/xymon/client/ext/temp and fails for all NVMe disks that
>> contain several temperature sensors.
>> 
>> For example, the first NVMe in Lynx holds its temperature values in
>> /sys/block/nvme0n1/device/hwmon0/:
>> files   name        value   min         max         crit
>> temp1_* Composite   27.85   -273.15     86.85       87.85
>> temp2_* Sensor 1    27.85   -273.15     65261.85    n/a
>> temp3_* Sensor 2    31.85   -273.15     65261.85    n/a
>> 
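The odd min and max values in the table above look like placeholder thresholds rather than real limits. Here is a minimal Perl sketch of the arithmetic, assuming the drive stores its thresholds as whole kelvins in a 16-bit field and that the values are shown in degrees Celsius. Neither assumption is stated in the report, and `kelvin_to_celsius` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Assumption: thresholds are whole kelvins in a 16-bit field, so an
# "unset" threshold is either 0 K or 0xFFFF K.
sub kelvin_to_celsius {
    my ($kelvin) = @_;
    return sprintf('%.2f', $kelvin - 273.15);
}

print kelvin_to_celsius(0),      "\n";   # lower sentinel (0 K)
print kelvin_to_celsius(0xFFFF), "\n";   # upper sentinel (65535 K)
print kelvin_to_celsius(301),    "\n";   # a plausible real reading (301 K)
```

Under that assumption, -273.15 and 65261.85 correspond to 0 K and 65535 K, which suggests that 'Sensor 1' and 'Sensor 2' simply have no thresholds configured.
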
>> The temperature check determines the values for red and yellow as
>> follows:
>> 
>> 1) if there is a crit and a max value, use them
>> 2) if there is a max and a mid value, use them
>> 3) if there is a max and a min value, use them
>> 4) if there is only a max file, use it for both
>> 5) if there is only a crit file, use it for both
>> 
>> The sensor 'Composite' uses max and crit as they're both available.
>> 'Sensor 1' and 'Sensor 2', however, only provide max and min.
>> Therefore these values are used, but they lead to 'yellow' warnings
>> because the min value is not an upper boundary, as the script
>> assumes, but a lower boundary.
>> 
>> The Linux kernel documentation
>> (https://docs.kernel.org/hwmon/sysfs-interface.html) also states that
>> every file with 'min' in its name is a low threshold:
>> 
>> The common scheme for files naming is: <type><number>_<item>. Usual
>> types for sensor chips are "in" (voltage), "temp" (temperature) and
>> "fan" (fan). Usual items are "input" (measured value), "max" (high
>> threshold), "min" (low threshold).
>> 
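The naming scheme quoted above can be illustrated with a short Perl sketch that splits an attribute file name into its parts. The regular expression and the helper name `parse_hwmon_name` are illustrative assumptions and are not taken from the 'temp' script:

```perl
use strict;
use warnings;

# Split a hwmon attribute name of the form <type><number>_<item>.
# Returns an empty list when the name does not follow the scheme.
sub parse_hwmon_name {
    my ($name) = @_;
    return $name =~ /\A([a-z]+)(\d+)_(\w+)\z/ ? ($1, $2, $3) : ();
}

for my $name (qw(temp1_input temp2_min temp3_max temp1_crit)) {
    my ($type, $number, $item) = parse_hwmon_name($name);
    printf "%-12s type=%s number=%s item=%s\n", $name, $type, $number, $item;
}
```

For `temp2_min` this yields type `temp`, number `2` and item `min`, i.e. the low threshold of the second temperature sensor.
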
>> The proposed fix would be either to use the max value for both yellow
>> and red, or at least to sanity check whether min is below zero and,
>> in that specific case, use only the max value for both:
>> 
>> In /usr/lib/xymon/client/ext/temp beginning on line 182:
>> 
>> my ($red, $yellow);
>> if (-r $crit_file and -r $max_file) {
>>     $red    = read_one_chomped_line_from_file($crit_file);
>>     $yellow = read_one_chomped_line_from_file($max_file);
>> } elsif (-r $max_file and -r $mid_file) {
>>     $red    = read_one_chomped_line_from_file($max_file);
>>     $yellow = read_one_chomped_line_from_file($mid_file);
>> } elsif (-r $max_file and -r $min_file) {
>>     $red    = read_one_chomped_line_from_file($max_file);
>>     # TODO: min_file contains the lower temperature boundary and
>>     #       *not* the warning value; only solution to this would
>>     #       be to set either yellow to red or to at least do that
>>     #       when yellow is below 0 for example.
>>     $yellow = read_one_chomped_line_from_file($min_file);
>>     $yellow = $yellow > 0 ? $yellow : $red;
>>     # Alternative solution: do not use min at all v1:
>>     #$red = $yellow = read_one_chomped_line_from_file($max_file);
>>     # Alternative solution: do not use min at all v2: remove this 'elsif'
>> } elsif (-r $max_file) {
>>     $red = $yellow = read_one_chomped_line_from_file($max_file);
>> } elsif (-r $crit_file) {
>>     $red = $yellow = read_one_chomped_line_from_file($crit_file);
>> }
>> 
>> There are three ways to solve this:
>> 
>> * sanitize min by checking whether the value is below 0 and, in that
>>   case, use the max value instead
>> * always use the max value for both red and yellow
>> * completely remove the 'elsif' that reads min and max, as the next
>>   'elsif' just reads and uses max
>> 
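As a rough illustration of the third option, here is a self-contained Perl sketch that picks the thresholds from already-read values instead of files. `pick_thresholds` and the `%sensors` hash are hypothetical names, the numbers are the ones from the table earlier in the report, and this is not the code shipped in hobbit-plugins:

```perl
use strict;
use warnings;

# Hypothetical helper: choose (red, yellow) from whichever thresholds exist.
# 'min' is deliberately ignored because it is a low threshold.
sub pick_thresholds {
    my (%t) = @_;
    return ($t{crit}, $t{max})  if defined $t{crit} and defined $t{max};
    return ($t{max},  $t{mid})  if defined $t{max}  and defined $t{mid};
    return ($t{max},  $t{max})  if defined $t{max};
    return ($t{crit}, $t{crit}) if defined $t{crit};
    return;    # no usable threshold at all
}

# Values copied from the table in the report (degrees Celsius).
my %sensors = (
    'Composite' => { min => -273.15, max => 86.85, crit => 87.85 },
    'Sensor 1'  => { min => -273.15, max => 65261.85 },
    'Sensor 2'  => { min => -273.15, max => 65261.85 },
);

for my $name (sort keys %sensors) {
    my ($red, $yellow) = pick_thresholds(%{ $sensors{$name} });
    print "$name: red=$red yellow=$yellow\n";
}
```

With min ignored, 'Sensor 1' and 'Sensor 2' end up with red and yellow both set to 65261.85, so they no longer raise spurious 'yellow' warnings. They also cannot realistically alert at all, which is the same outcome as rule 4 in the list above.
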
>> Thanks in advance!
>> 
>> Best regards
>> Christoph
>> _______________________________________________
>> Xymon mailing list
>> Xymon at xymon.com
>> http://lists.xymon.com/mailman/listinfo/xymon
