[Xymon] xymond_rrd - Program crashed after fresh install of Xymon 4.3.30 and data from Xymon 4.3.17

Robert Herron robert.herron at gmail.com
Fri Oct 18 22:21:57 CEST 2019


How is the data from the Synology device being processed into RRD?  Are you
using NCV or the "--extra-script" method?

I ran into a similar RRD crash when upgrading my test (TST) Xymon environment
from 4.3.28 to 4.3.29 and to 4.3.30. I tracked it down to a bug in my
extra-script for a custom test. I reworked the custom test to use NCV, and
the crash stopped.

On Thu, Oct 17, 2019, 7:38 AM Andrey Chervonets <A.Chervonets at cominder.eu>
wrote:

> To get more information, I enabled "--debug" on both channels (status
> and data).
> With that, rrd-status.log shows a bit more information:
> ....
> 2019-10-17 13:40:02.376153 Host 'synologyhost.domain.eu' reports netstat for an unknown OS
> 408 2019-10-17 13:40:02.376181 Flush, but xymonmsg is empty
> 408 2019-10-17 13:40:02.376185 0 status messages merged into 1 transmissions
> 408 2019-10-17 13:40:02.376203 xymond_rrd: Got message 612 @@status#612/synologyhost.domain.eu|1571308802.357389|83.99.221.6||synologyhost.domain.eu|procs|1571326802|green||green|1570620002|0||0||1571051696||p_cominder|0|
> 408 2019-10-17 13:40:02.376210 startpos 95710, fillpos 99309, endpos 97006
> 408 2019-10-17 13:40:02.376227 Flush, but xymonmsg is empty
> 408 2019-10-17 13:40:02.376233 0 status messages merged into 1 transmissions
> 408 2019-10-17 13:40:02.376244 xymond_rrd: Got message 613 @@status#613/synologyhost.domain.eu|1571308802.357673|83.99.221.6||synologyhost.domain.eu|raid|1571326802|green||green|1570620002|0||0||1571051696||p_cominder|0|
> 408 2019-10-17 13:40:02.376251 startpos 97010, fillpos 99309, endpos 97945
> 408 2019-10-17 13:40:02.376269 Flush, but xymonmsg is empty
> 408 2019-10-17 13:40:02.376276 0 status messages merged into 1 transmissions
> 408 2019-10-17 13:40:02.376288 xymond_rrd: Got message 614 @@status#614/synologyhost.domain.eu|1571308802.368308|83.99.221.6||synologyhost.domain.eu|temperature|1571326802|green||green|1570620002|0||0||1571051696||p_cominder|0|
> 408 2019-10-17 13:40:02.376294 startpos 97949, fillpos 99309, endpos 98645
> 2019-10-17 13:40:02.381339 Child process 408 died: Signal 6
> 2019-10-17 13:40:04.432302 Peer at 0.0.0.0:0 failed: Broken pipe
> 2019-10-17 13:40:04.452708 Peer not up, flushing message queue
> 13920 2019-10-17 13:40:04.557656  setup_feedback_queue: got ID -1 for key 0xA03EB91
> 13920 2019-10-17 13:40:04.558141 Opening file /u01/app/xymon/product/xymon4.3.30/server/etc/rrddefinitions.cfg
> 13920 2019-10-17 13:40:04.558326 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=1052671
> 13920 2019-10-17 13:40:04.558359 Got 6716 bytes
> ...
> Here we can see the processing of data from our Synology NAS, which has the
> Synology Monitoring Tool 1.4.8 (http://www.sysco.ch/synomon/) enabled.
> Note that despite the RRD crash, the "temperature" status itself is fine,
> and its text looks like this:
> --
> Device             Temp(C)   Temp(F)
> ---------------------------------------
> green    system         52      125
> green    /dev/sda       36      96
> green    /dev/sdb       38      100
> green    /dev/sdd       36      96
> ---------------------------------------
>
> Synology Monitoring Tool 1.4.8, http://www.sysco.ch/synomon/
> Model: RS812+ (synologyhost,domain.eu)
> Processor: Intel(R) Atom(TM) CPU D2701   @ 2.13GHz
> System temperature: 52°C
> Serial number: serialnumberdata-replaced
> Firmware: 6.2-24922
> MAC address(s): number-replaced, number-replaced
> Linux version 3.10.105 (root at build10) (gcc version 4.9.3 20150311 (prerelease) (crosstool-NG 1.20.0) ) #24922 SMP Fri May 10 02:51:01 CST 2019
> --
>
> After we stopped the plugin on the Synology, we received no more data from
> it and xymond_rrd stopped crashing (red changed to purple, as expected).
>
> I am not sure where the problem/bug is, so I have added the Synology
> Monitoring Tool developers' e-mail address to this conversation.
>
> Please review this and give us a hint on how we can fix the problem;
> monitoring the state of our NAS is quite critical for us.
>
> This suspicion is also confirmed by the GDB output (obtained as instructed at
> http://www.robertandrobert.com/xymon/help/known-issues.html):
> --
> [xymon at synologyhost server]$ /bin/gdb /u01/app/xymon/product/xymon4.3.30/server/bin/xymond_rrd tmp/core.408
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
> ... copyright...
> ...
> Reading symbols from /u01/app/xymon/product/xymon4.3.30/server/bin/xymond_rrd...done.
> [New LWP 408]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `xymond_rrd --rrddir=/u01/app/xymon/product/xymon4.3.30/data/rrd --debug'.
> Program terminated with signal 6, Aborted.
> #0  0x00007f62fcd85337 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> bzip2-libs-1.0.6-13.el7.x86_64 cairo-1.15.12-4.el7.x86_64
> expat-2.1.0-10.el7_3.x86_64 fontconfig-2.13.0-4.3.el7.x86_64
> freetype-2.8-14.el7.x86_64 fribidi-1.0.2-1.el7.x86_64
> glib2-2.56.1-5.el7.x86_64 glibc-2.17-292.el7.x86_64
> graphite2-1.3.10-1.el7_3.x86_64 harfbuzz-1.7.5-2.el7.x86_64
> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64
> libX11-1.6.7-2.el7.x86_64 libXau-1.0.8-2.1.el7.x86_64
> libXext-1.3.3-3.el7.x86_64 libXrender-0.9.10-1.el7.x86_64
> libcom_err-1.42.9-16.el7.x86_64 libffi-3.0.13-18.el7.x86_64
> libgcc-4.8.5-39.el7.x86_64 libglvnd-1.0.1-0.8.git5baa1e5.el7.x86_64
> libglvnd-egl-1.0.1-0.8.git5baa1e5.el7.x86_64
> libglvnd-glx-1.0.1-0.8.git5baa1e5.el7.x86_64 libpng-1.5.13-7.el7_2.x86_64
> libselinux-2.5-14.1.el7.x86_64 libthai-0.1.14-9.el7.x86_64
> libtirpc-0.2.4-0.16.el7.x86_64 libuuid-2.23.2-61.el7.x86_64
> libxcb-1.13-1.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64
> openssl-libs-1.0.2k-19.el7.x86_64 pango-1.42.4-4.el7_7.x86_64
> pcre-8.32-17.el7.x86_64 pixman-0.34.0-1.el7.x86_64
> rrdtool-1.4.8-9.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64
> zlib-1.2.7-18.el7.x86_64
> (gdb)
> (gdb)
> (gdb) bt
> #0  0x00007f62fcd85337 in raise () at /lib64/libc.so.6
> #1  0x00007f62fcd86a28 in abort () at /lib64/libc.so.6
> #2  0x0000000000428e63 in sigsegv_handler (signum=<optimized out>) at sig.c:57
> #3  0x00007f62fcd853b0 in <signal handler called> () at /lib64/libc.so.6
> #4  0x00007f62fcd89f97 in ____strtoll_l_internal () at /lib64/libc.so.6
> #5  0x000000000040f9c2 in do_temperature_rrd (__nptr=0x0) at /usr/include/stdlib.h:280
> #6  0x000000000040f9c2 in do_temperature_rrd (hostname=hostname at entry=0x7f62fdfceb43
> "synologyhost.domain.eu", testname=testname at entry=0x7f62fdfceb58
> "temperature", classname=classname at entry=0x7f62fdfceb99 "p_cominder",
> pagepaths=pagepaths at entry=0x7f62fdfceba4 "0", msg=msg at entry=0x7f62fdfceba7
> "status+300 synologyhost,domain.eu.temperature green 2019-10-17 13:40:01 [
> synologyhost.domain.eu] - temperature\nDevice", ' ' <repeats 13 times>,
> "Temp(C)   Temp(F)\n", '-' <repeats 39 times>, "\n&green    system"...,
> tstamp=tstamp at entry=1571308802) at rrd/do_temperature.c:100
> #7  0x000000000041316b in update_rrd (hostname=hostname at entry=0x7f62fdfceb43
> "synologyhost.domain.eu", testname=<optimized out>,
>     testname at entry=0x7f62fdfceb58 "temperature", msg=msg at entry=0x7f62fdfceba7
> "status+300 synologyhost,domain.eu.temperature green 2019-10-17 13:40:01 [
> synologyhost.domain.eu] - temperature\nDevice", ' ' <repeats 13 times>,
> "Temp(C)   Temp(F)\n", '-' <repeats 39 times>, "\n&green    system"...,
> tstamp=tstamp at entry=1571308802, sender=sender at entry=0x7f62fdfceb36
> "83.99.221.6", ldef=<optimized out>, classname=classname at entry=0x7f62fdfceb99
> "p_cominder", pagepaths=pagepaths at entry=0x7f62fdfceba4 "0") at
> do_rrd.c:714
> #8  0x0000000000403434 in main (argc=<optimized out>, argv=0x7ffffb4bd4b8) at xymond_rrd.c:391
> (gdb)
> --
>
> So we know which metric causes the RRD crash, and we have a workaround
> (which lets xymond_rrd keep generating graphs for the other metrics),
> but we need a better solution that makes everything work as expected.
>
> P.S. The real hostname has been replaced in all output included in this
> e-mail (in case any checksums are involved).
>
>
> Best regards,
>
> Andrey Chervonets
> ----------------------
> CoMinder Support
> http://www.cominder.eu/
> mobile: +371 26517848
>
>
>
>
> "Xymon" <xymon-bounces at xymon.com> wrote on 15.10.2019 13:00:01:
>
> > From: xymon-request at xymon.com
> > To: xymon at xymon.com
> > Date: 15.10.2019 13:00
> > Subject: Xymon Digest, Vol 105, Issue 9
> > Sent by: "Xymon" <xymon-bounces at xymon.com>
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Mon, 14 Oct 2019 15:09:53 +0300
> > From: Andrey Chervonets <A.Chervonets at cominder.eu>
> > To: xymon at xymon.com
> > Subject: [Xymon] xymond_rrd - Program crashed after fresh install of
> >    Xymon 4.3.30 and data from Xymon 4.3.17
> > Message-ID:
> >    <OFD5D1CD2D.3E1D4B14-ONC2258493.00408D6C-C2258493.0042D300 at cominder.eu>
> >
> > Content-Type: text/plain; charset="us-ascii"
> >
> > Good day!
> >
> > Recently we installed Xymon 4.3.30 on a new VM (CentOS Linux release
> > 7.7.1908 (Core), a guest under KVM).
> > Guest kernel: 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC
> > 2019 x86_64 x86_64 x86_64 GNU/Linux
> >
> > Everything is OK, except that xymond_rrd crashes frequently: the
> > "xymond_rrd" metric is always red (it has never been green), with the
> > message:
> >  - Program crashed
> > Fatal signal caught!
> >
> > In rrd-status.log we can find frequent messages like:
> >
> > 2019-10-14 14:35:03.609265 Child process 2997 died: Signal 6
> > 2019-10-14 14:35:04.239677 Peer at 0.0.0.0:0 failed: Broken pipe
> > 2019-10-14 14:35:08.886124 Peer not up, flushing message queue
> > 2019-10-14 14:36:45.883398 Host 'synologyhost.domain.eu' reports netstat for an unknown OS
> > 2019-10-14 14:36:45.888875 Child process 21622 died: Signal 6
> > 2019-10-14 14:36:52.510319 Peer at 0.0.0.0:0 failed: Broken pipe
> > 2019-10-14 14:36:52.510720 Peer not up, flushing message queue
> > 2019-10-14 14:40:02.689062 Host 'synologyhost.domain.eu' reports netstat for an unknown OS
> > 2019-10-14 14:40:02.694320 Child process 28158 died: Signal 6
> > 2019-10-14 14:40:05.119354 Peer at 0.0.0.0:0 failed: Broken pipe
> > 2019-10-14 14:40:05.250422 Peer not up, flushing message queue
> >
> > Note: lines like "Host 'synologyhost.domain.eu' reports netstat for an
> > unknown OS" come from the Synology NAS with the monitoring package
> > installed.
> > I am sure this is not related; it worked on the old Xymon 4.3.17
> > (CentOS 6.6).
> >
> > After the fresh installation, we simply remapped the data directory
> > (with a symbolic link) to keep using the old data, logs and RRD files.
> >
> > There are plenty of core files under server/tmp/:
> > srw-rw-rw- 1 xymon monitor       0 Oct 14 14:40 rrdctl.572
> > -rw------- 1 xymon monitor 3252224 Oct 14 14:45 core.572
> > srw-rw-rw- 1 xymon monitor       0 Oct 14 14:45 rrdctl.17027
> > -rw------- 1 xymon monitor 3248128 Oct 14 14:50 core.17027
> > srw-rw-rw- 1 xymon monitor       0 Oct 14 14:50 rrdctl.30574
> > -rw------- 1 xymon monitor 3248128 Oct 14 14:55 core.30574
> > srw-rw-rw- 1 xymon monitor       0 Oct 14 14:55 rrdctl.13275
> > -rw------- 1 xymon monitor 3239936 Oct 14 15:00 core.13275
> > -rw-r--r-- 1 xymon monitor 1887355 Oct 14 15:02 xymond.chk
> > -rw-r--r-- 1 xymon monitor       0 Oct 14 15:02 alert.chk.sub
> > -rw-r--r-- 1 xymon monitor   70921 Oct 14 15:02 alert.chk
> > srw-rw-rw- 1 xymon monitor       0 Oct 14 15:02 rrdctl.5887
> > srw-rw-rw- 1 xymon monitor       0 Oct 14 15:02 rrdctl.5954
> > -rw------- 1 xymon monitor 3764224 Oct 14 15:05 core.5887
> > srw-rw-rw- 1 xymon monitor       0 Oct 14 15:05 rrdctl.10234
> >
> >
> > Question: how can we diagnose the cause of the problem?
> >
> >
> >
> > Best regards,
> >
> > Andrey Chervonets
> > ----------------------
> > SIA CoMinder
> > http://www.cominder.eu/
> > mobile: +371 26517848
> _______________________________________________
> Xymon mailing list
> Xymon at xymon.com
> http://lists.xymon.com/mailman/listinfo/xymon
>