[Xymon] nvme temperature check broken in Debian bookworm
damien at makelofine.org
Tue Apr 23 09:24:42 CEST 2024
Hi,
My 2 cents:
The 'temp' script is part of the hobbit-plugins package in Debian/Ubuntu/Mint.
I don't remember when it was introduced, because I wrote my own set
of hardware monitoring scripts (they only support my own computers).
Feel free to test them if relevant:
https://github.com/doktoil-makresh/xymon-plugins/tree/master/xymon-hardware
On 23/04/2024 01:35, Jeremy Laidman wrote:
> Christoph
>
> I'm fairly sure that a "temp" script is not part of the standard Xymon
> client, and it doesn't appear to be part of the Debian/Bookworm
> package either. Generally, scripts in "ext" are add-ons to a package
> by the local installer/administrator. In summary, I don't know where
> that script came from, and it's possible nobody else on this list
> knows.
>
> Cheers
> Jeremy
>
> On Tue, 9 Apr 2024 at 20:43, Christoph Zechner <zechner at vrvis.at>
> wrote:
>
>> Hi,
>>
>> the temperature check in bookworm's version of xymon is broken in a
>> rather strange way. The check is located in
>> /usr/lib/xymon/client/ext/temp and fails for all NVMe disks that
>> contain several temperature sensors:
>>
>> For example the first NVMe in Lynx which holds its temperature
>> values in /sys/block/nvme0n1/device/hwmon0/:
>>
>>   files    name       value  min      max       crit
>>   temp1_*  Composite  27.85  -273.15  86.85     87.85
>>   temp2_*  Sensor 1   27.85  -273.15  65261.85  n/a
>>   temp3_*  Sensor 2   31.85  -273.15  65261.85  n/a
>>
>> The inner logic of the temperature check works as follows to
>> calculate the values for red and yellow:
>>
>> 1) if there is a crit and a max value, use them
>> 2) if there is a max and a mid value, use them
>> 3) if there is a max and a min value, use them
>> 4) if there is only a max file, use it for both
>> 5) if there is only a crit file, use it for both.
>>
>> The sensor 'Composite' uses max and crit as they're both available.
>> 'Sensor 1' and 'Sensor 2', however, only provide max and min.
>> Therefore these values are used, which leads to 'yellow' warnings:
>> the min value is not an upper boundary, as the script assumes, but
>> a lower boundary.
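
The five rules and the resulting false 'yellow' can be reproduced in a few lines. This is a minimal Python sketch, not the plugin's Perl: `pick_thresholds` and `colour` are hypothetical names, and the assumption that a reading at or above a threshold triggers that colour is illustrative rather than taken from the script.

```python
from typing import Optional, Tuple

def pick_thresholds(crit: Optional[float] = None,
                    max_: Optional[float] = None,
                    mid: Optional[float] = None,
                    min_: Optional[float] = None
                    ) -> Tuple[Optional[float], Optional[float]]:
    """Return (red, yellow) following rules 1-5 as described in the mail."""
    if crit is not None and max_ is not None:
        return crit, max_            # rule 1
    if max_ is not None and mid is not None:
        return max_, mid             # rule 2
    if max_ is not None and min_ is not None:
        return max_, min_            # rule 3: min is treated as a warning level
    if max_ is not None:
        return max_, max_            # rule 4
    if crit is not None:
        return crit, crit            # rule 5
    return None, None

def colour(value: float, red: Optional[float], yellow: Optional[float]) -> str:
    # Assumption: reaching a threshold is enough to trigger its colour.
    if red is not None and value >= red:
        return "red"
    if yellow is not None and value >= yellow:
        return "yellow"
    return "green"

# Values from the table in the mail.
composite = pick_thresholds(crit=87.85, max_=86.85, min_=-273.15)
sensor1 = pick_thresholds(max_=65261.85, min_=-273.15)

print(colour(27.85, *composite))  # -> green
print(colour(27.85, *sensor1))    # -> yellow
```

With yellow set to -273.15, every possible reading is at or above the warning level, which matches the permanent 'yellow' reported for 'Sensor 1' and 'Sensor 2'.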
>>
>> The Linux kernel documentation
>> (https://docs.kernel.org/hwmon/sysfs-interface.html) also outlines
>> that every file using 'min' in its name is a low threshold:
>>
>>   The common scheme for files naming is: <type><number>_<item>.
>>   Usual types for sensor chips are "in" (voltage), "temp"
>>   (temperature) and "fan" (fan). Usual items are "input" (measured
>>   value), "max" (high threshold, "min" (low threshold).
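
To make the `<type><number>_<item>` scheme concrete, here is a small Python sketch that splits a hwmon file name into its three parts. The helper names `parse_hwmon_name` and `describe` are hypothetical, and the lookup table only covers the three items named in the quoted documentation.

```python
import re
from typing import Optional, Tuple

# <type><number>_<item>, e.g. "temp2_min" -> ("temp", 2, "min")
_HWMON_NAME = re.compile(r"^([a-z]+)(\d+)_(\w+)$")

# Meanings of the three items listed in the quoted kernel documentation.
ITEM_MEANING = {
    "input": "measured value",
    "max": "high threshold",
    "min": "low threshold",
}

def parse_hwmon_name(filename: str) -> Optional[Tuple[str, int, str]]:
    """Split a hwmon file name into (type, number, item), or return None."""
    m = _HWMON_NAME.match(filename)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)

def describe(filename: str) -> Optional[str]:
    """Return the documented meaning of the item, if it is one of the three."""
    parsed = parse_hwmon_name(filename)
    if parsed is None:
        return None
    return ITEM_MEANING.get(parsed[2])

print(parse_hwmon_name("temp2_min"))  # -> ('temp', 2, 'min')
print(describe("temp2_min"))          # -> low threshold
print(describe("temp2_max"))          # -> high threshold
```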
>>
>> The proposed fix would be either to use the max value for both
>> yellow and red, or at least to sanity-check whether min is below
>> zero and, in that specific case, use only the max value for both:
>>
>> In /usr/lib/xymon/client/ext/temp beginning on line 182:
>>
>>     my ($red, $yellow);
>>     if (-r $crit_file and -r $max_file) {
>>         $red = read_one_chomped_line_from_file($crit_file);
>>         $yellow = read_one_chomped_line_from_file($max_file);
>>     } elsif (-r $max_file and -r $mid_file) {
>>         $red = read_one_chomped_line_from_file($max_file);
>>         $yellow = read_one_chomped_line_from_file($mid_file);
>>     } elsif (-r $max_file and -r $min_file) {
>>         $red = read_one_chomped_line_from_file($max_file);
>>         # TODO: min_file contains the lower temperature boundary and
>>         #       *not* the warning value; only solution to this would
>>         #       be to set either yellow to red or to at least do that
>>         #       when yellow is below 0 for example.
>>         $yellow = read_one_chomped_line_from_file($min_file);
>>         $yellow = $yellow > 0 ? $yellow : $red;
>>         # Alternative solution: do not use min at all v1:
>>         #$red = $yellow = read_one_chomped_line_from_file($max_file);
>>         # Alternative solution: do not use min at all v2: remove
>>         # this 'elsif'
>>     } elsif (-r $max_file) {
>>         $red = $yellow = read_one_chomped_line_from_file($max_file);
>>     } elsif (-r $crit_file) {
>>         $red = $yellow = read_one_chomped_line_from_file($crit_file);
>>     }
>>
>> There are three ways to solve this:
>>
>> * sanitize min by checking whether the value is below 0 and, in that
>>   case, use the max value
>> * use the max value in every case
>> * completely remove the 'elsif' that reads min and max, as the next
>>   'elsif' just reads and uses max
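
The options can be compared side by side in a short Python sketch (not the plugin's Perl). The function name `yellow_for_max_min` and the strategy labels are hypothetical. Removing the 'elsif' entirely gives the same result as "max_only", since the next branch uses max for both colours.

```python
def yellow_for_max_min(max_: float, min_: float,
                       strategy: str = "sanitize") -> float:
    """Pick the yellow threshold when only max and min are available.

    red is max_ in every strategy; only yellow differs.
    """
    if strategy == "sanitize":
        # Mirrors the proposed patch: keep min only if it is above 0.
        return min_ if min_ > 0 else max_
    if strategy == "max_only":
        # Ignore min completely; yellow and red are both max.
        return max_
    raise ValueError(f"unknown strategy: {strategy!r}")

# 'Sensor 1' from the mail: max 65261.85, min -273.15
print(yellow_for_max_min(65261.85, -273.15, "sanitize"))  # -> 65261.85
print(yellow_for_max_min(65261.85, -273.15, "max_only"))  # -> 65261.85

# A made-up sensor with a positive low threshold
print(yellow_for_max_min(80.0, 5.0, "sanitize"))  # -> 5.0
print(yellow_for_max_min(80.0, 5.0, "max_only"))  # -> 80.0
```

Both strategies agree for the NVMe sensors in question. They differ for a sensor whose min is a small positive number, like the made-up second example: "sanitize" would still warn at 5.0, so ignoring min altogether looks like the more robust choice.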
>>
>> Thanks in advance!
>>
>> Best regards
>> Christoph
>> _______________________________________________
>> Xymon mailing list
>> Xymon at xymon.com
>> http://lists.xymon.com/mailman/listinfo/xymon