[Xymon] SPLITNCV with wildcard because of multiple graphs from one test

Damien Martins damien at makelofine.org
Thu Oct 1 15:58:11 CEST 2020


On 29/09/2020 at 16:24, Oliver R. wrote:
> Dear All,
>
> I have a custom test that reports data from nvme disks from the command:
>
> nvme smart-log /dev/nvme2n1 -o json
>
> The output is processed so that the client reports the following to the
> server:
>
> S3ESNX0J951626N-critical_warning : 0
> S3ESNX0J951626N-temperature : 318
> S3ESNX0J951626N-avail_spare : 100
> S3ESNX0J951626N-spare_thresh : 10
> S3ESNX0J951626N-percent_used : 92
> S3ESNX0J951626N-data_units_read : 27832088
> S3ESNX0J951626N-data_units_written : 93877408
> S3ESNX0J951626N-host_read_commands : 180442558
> S3ESNX0J951626N-host_write_commands : 916278700
> S3ESNX0J951626N-controller_busy_time : 4028
> S3ESNX0J951626N-power_cycles : 218
> S3ESNX0J951626N-power_on_hours : 2995
> S3ESNX0J951626N-unsafe_shutdowns : 86
> S3ESNX0J951626N-media_errors : 0
> S3ESNX0J951626N-num_err_log_entries : 0
> S3ESNX0J951626N-warning_temp_time : 0
> S3ESNX0J951626N-critical_comp_time : 0
> S3ESNX0J951626N-temperature_sensor_1 : 318
> S3ESNX0J951626N-temperature_sensor_2 : 323
> S3ESNX0J951626N-thm_temp1_trans_count : 0
> S3ESNX0J951626N-thm_temp2_trans_count : 0
> S3ESNX0J951626N-thm_temp1_total_time : 0
> S3ESNX0J951626N-thm_temp2_total_time : 0
> 2J4520102682-critical_warning : 0
> 2J4520102682-temperature : 314
> 2J4520102682-avail_spare : 100
> 2J4520102682-spare_thresh : 32
> 2J4520102682-percent_used : 2
> 2J4520102682-data_units_read : 9450966
> 2J4520102682-data_units_written : 24105094
> 2J4520102682-host_read_commands : 137338588
> 2J4520102682-host_write_commands : 284702582
> 2J4520102682-controller_busy_time : 0
> 2J4520102682-power_cycles : 46
> 2J4520102682-power_on_hours : 1107
> 2J4520102682-unsafe_shutdowns : 10
> 2J4520102682-media_errors : 0
> 2J4520102682-num_err_log_entries : 0
> 2J4520102682-warning_temp_time : 0
> 2J4520102682-critical_comp_time : 0
> 2J4520102682-thm_temp1_trans_count : 0
> 2J4520102682-thm_temp2_trans_count : 0
> 2J4520102682-thm_temp1_total_time : 0
> 2J4520102682-thm_temp2_total_time : 0
> S3ESNX0J951635M-critical_warning : 0
> S3ESNX0J951635M-temperature : 310
> S3ESNX0J951635M-avail_spare : 100
> S3ESNX0J951635M-spare_thresh : 10
> S3ESNX0J951635M-percent_used : 92
> S3ESNX0J951635M-data_units_read : 32693378
> S3ESNX0J951635M-data_units_written : 95742837
> S3ESNX0J951635M-host_read_commands : 213266959
> S3ESNX0J951635M-host_write_commands : 918085461
> S3ESNX0J951635M-controller_busy_time : 4280
> S3ESNX0J951635M-power_cycles : 218
> S3ESNX0J951635M-power_on_hours : 3072
> S3ESNX0J951635M-unsafe_shutdowns : 86
> S3ESNX0J951635M-media_errors : 0
> S3ESNX0J951635M-num_err_log_entries : 1
> S3ESNX0J951635M-warning_temp_time : 0
> S3ESNX0J951635M-critical_comp_time : 0
> S3ESNX0J951635M-temperature_sensor_1 : 310
> S3ESNX0J951635M-temperature_sensor_2 : 320
> S3ESNX0J951635M-thm_temp1_trans_count : 0
> S3ESNX0J951635M-thm_temp2_trans_count : 0
> S3ESNX0J951635M-thm_temp1_total_time : 0
> S3ESNX0J951635M-thm_temp2_total_time : 0
>
> As you can see, there are three disks that report the same metrics with
> different values. I started with a xymonserver.d/nvme.cfg that looks
> like this:
>
> TEST2RRD="$TEST2RRD,nvme=ncv"
> SPLITNCV_nvme="*:GAUGE"
> GRAPHS_nvme="nvmecriticalwarning,nvmetemperature,nvmeavailspare,nvmesparethresh,nvmepercentused,nvmedataunitsread,nvmedataunitswritten,nvmehostreadcommands,nvmehostwritecommands,nvmecontrollerbusytime,nvmepowercycles,nvmepoweronhours,nvmeunsafeshutdowns,nvmemediaerrors,nvmenumerrlogentries,nvmewarningtemptime,nvmecriticalcomptime,nvmetemperaturesensor1,nvmetemperaturesensor2,nvmethmtemp1transcount,nvmethmtemp2transcount,nvmethmtemp1totaltime,nvmethmtemp2totaltime" 
>
>
> This causes all rrd files to be created correctly, like this:
>
> $ ls -1 /var/lib/xymon/rrd/wsrbreb/nvme*temperature.rrd
> /var/lib/xymon/rrd/wsrbreb/nvme,2J4520102682_temperature.rrd
> /var/lib/xymon/rrd/wsrbreb/nvme,S3ESNX0J951626N_temperature.rrd
> /var/lib/xymon/rrd/wsrbreb/nvme,S3ESNX0J951635M_temperature.rrd
>
> Unfortunately, all datasets are now saved with datatype "GAUGE", while
> most of them need "DERIVE", and I cannot figure out how to set this.
> Here is what I've tried:
>
> SPLITNCV_nvme="temperture:GAUGE"
> SPLITNCV_nvme="*temperture:GAUGE"
> SPLITNCV_nvme=".*temperture:GAUGE"
> SPLITNCV_nvme="%.*temperture:GAUGE"
>
>
> One thing that does work is the following:
>
> SPLITNCV_nvme="S3ESNX0J951626N_temperature:GAUGE"
>
> But as you can see, it requires the name of the SSD, which means the
> configuration is no longer dynamic.
>
> How can I work with wildcards in SPLITNCV scenarios?
>
> Thank you for your help!
>
> Regards


Ok Oliver,

I understand your point now: you want to use a wildcard in the dataset
name. So we first have to check whether that is possible at all.
I could not find anything about wildcards in dataset names in this
documentation:
https://xymon.sourceforge.io/xymon/help/howtograph.html

I'm not a C guy, so I can only go by the documentation: if it is not
mentioned there, I assume it does not exist.

You could try to handle the serial numbers of your NVMe drives manually
(an awful answer, I know).
Or better: contribute to the code (says someone too lazy to do it himself).
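
If you go the manual route, the SPLITNCV line does not have to be typed by
hand. Below is a minimal sketch in Python that builds it from a list of
serial numbers. Several things here are assumptions, not something taken
from the Xymon documentation: the helper name build_splitncv is made up,
the choice of which metrics count as DERIVE is only a guess at what you
want, the "serial_metric" form with an underscore simply copies the one
entry you reported as working, and I have not verified that a trailing
"*:GAUGE" default can be combined with explicit entries.

```python
# Serial numbers copied from the client report quoted above.
SERIALS = ["S3ESNX0J951626N", "2J4520102682", "S3ESNX0J951635M"]

# Assumption: these ever-increasing counters are the ones wanted as DERIVE.
DERIVE_METRICS = [
    "data_units_read",
    "data_units_written",
    "host_read_commands",
    "host_write_commands",
]


def build_splitncv(serials, derive_metrics, default_type="GAUGE"):
    """Return a SPLITNCV_nvme line with one explicit entry per serial/metric.

    Hypothetical helper: the "<serial>_<metric>:TYPE" naming mirrors the
    single entry reported as working in this thread.
    """
    entries = [
        f"{serial}_{metric}:DERIVE"
        for serial in serials
        for metric in derive_metrics
    ]
    # Assumption: everything not listed explicitly falls back to the default.
    entries.append(f"*:{default_type}")
    return 'SPLITNCV_nvme="' + ",".join(entries) + '"'


if __name__ == "__main__":
    print(build_splitncv(SERIALS, DERIVE_METRICS))
```

The printed line could be pasted into xymonserver.d/nvme.cfg and regenerated
whenever a disk is added or replaced. It is still a workaround, not a real
wildcard.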


