[Xymon] xymond_rrd - Program crashed after fresh install of Xymon 4.3.30 and data from Xymon 4.3.17

Andrey Chervonets A.Chervonets at cominder.eu
Mon Oct 14 14:09:53 CEST 2019


Good day!

Recently we installed Xymon 4.3.30 on a new VM (CentOS Linux release 
7.7.1908 (Core)), running as a guest under KVM.
Guest Kernel:   3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 
2019 x86_64 x86_64 x86_64 GNU/Linux

Everything works except that xymond_rrd crashes frequently: the 
"xymond_rrd" metric is always red (it has never been green) with the message:
 - Program crashed
Fatal signal caught!

In rrd-status.log we can find frequent messages like:

2019-10-14 14:35:03.609265 Child process 2997 died: Signal 6
2019-10-14 14:35:04.239677 Peer at 0.0.0.0:0 failed: Broken pipe
2019-10-14 14:35:08.886124 Peer not up, flushing message queue
2019-10-14 14:36:45.883398 Host 'synologyhost.domain.eu' reports netstat 
for an unknown OS
2019-10-14 14:36:45.888875 Child process 21622 died: Signal 6
2019-10-14 14:36:52.510319 Peer at 0.0.0.0:0 failed: Broken pipe
2019-10-14 14:36:52.510720 Peer not up, flushing message queue
2019-10-14 14:40:02.689062 Host 'synologyhost.domain.eu' reports netstat 
for an unknown OS
2019-10-14 14:40:02.694320 Child process 28158 died: Signal 6
2019-10-14 14:40:05.119354 Peer at 0.0.0.0:0 failed: Broken pipe
2019-10-14 14:40:05.250422 Peer not up, flushing message queue

Note: lines like "Host 'synologyhost.domain.eu' reports netstat for an 
unknown OS" are coming from a Synology NAS with the Monitoring package 
installed.
I am sure they are not related: this was working on the old Xymon 4.3.17 
(CentOS 6.6).
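
To put numbers on the log excerpt, the crash entries could be tallied with a short Python script. This is only a rough sketch: the function name summarize_crashes is made up for this example, and the sample lines are copied from the excerpt above rather than read from the real rrd-status.log.

```python
import re
from collections import Counter

# Matches entries such as:
#   2019-10-14 14:35:03.609265 Child process 2997 died: Signal 6
DIED_RE = re.compile(r"^(\S+ \S+) Child process (\d+) died: Signal (\d+)")


def summarize_crashes(lines):
    """Return (crash count per signal, list of (timestamp, pid, signal))."""
    per_signal = Counter()
    events = []
    for line in lines:
        match = DIED_RE.match(line)
        if match:
            timestamp = match.group(1)
            pid, sig = int(match.group(2)), int(match.group(3))
            per_signal[sig] += 1
            events.append((timestamp, pid, sig))
    return per_signal, events


# Sample input: entries copied from the excerpt above.
# For a real run, pass an open file object for rrd-status.log instead.
sample = [
    "2019-10-14 14:35:03.609265 Child process 2997 died: Signal 6",
    "2019-10-14 14:35:04.239677 Peer at 0.0.0.0:0 failed: Broken pipe",
    "2019-10-14 14:36:45.888875 Child process 21622 died: Signal 6",
    "2019-10-14 14:40:02.694320 Child process 28158 died: Signal 6",
]
per_signal, events = summarize_crashes(sample)
for sig, count in per_signal.items():
    print(f"signal {sig}: {count} crashes")
for timestamp, pid, sig in events:
    print(timestamp, pid, sig)
```

In the excerpt every child died with signal 6, which on Linux is SIGABRT.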

After the fresh installation we just remapped the data directory (with a 
symbolic link) so that we can continue to use the old data logs and rra.
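
A quick sanity check of the remapped directory could look like the sketch below. It is only an illustration: the path "data" is a placeholder and not our actual location, and describe_data_dir is a name invented for this example. It reports whether the path is a symbolic link, where it resolves to, and whether the user running the script can read and write the target.

```python
import os


def describe_data_dir(path):
    """Report whether path is a symlink, where it resolves, and access rights."""
    target = os.path.realpath(path)
    return {
        "is_symlink": os.path.islink(path),
        "target": target,
        "target_is_dir": os.path.isdir(target),
        # os.access checks the permissions of the user running this script,
        # so it should be run as the account that runs Xymon.
        "readable": os.access(target, os.R_OK),
        "writable": os.access(target, os.W_OK),
    }


# "data" is a placeholder; replace it with the actual symlinked directory.
for key, value in describe_data_dir("data").items():
    print(f"{key}: {value}")
```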

There are plenty of core files under server/tmp/:
srw-rw-rw- 1 xymon monitor       0 Oct 14 14:40 rrdctl.572
-rw------- 1 xymon monitor 3252224 Oct 14 14:45 core.572
srw-rw-rw- 1 xymon monitor       0 Oct 14 14:45 rrdctl.17027
-rw------- 1 xymon monitor 3248128 Oct 14 14:50 core.17027
srw-rw-rw- 1 xymon monitor       0 Oct 14 14:50 rrdctl.30574
-rw------- 1 xymon monitor 3248128 Oct 14 14:55 core.30574
srw-rw-rw- 1 xymon monitor       0 Oct 14 14:55 rrdctl.13275
-rw------- 1 xymon monitor 3239936 Oct 14 15:00 core.13275
-rw-r--r-- 1 xymon monitor 1887355 Oct 14 15:02 xymond.chk
-rw-r--r-- 1 xymon monitor       0 Oct 14 15:02 alert.chk.sub
-rw-r--r-- 1 xymon monitor   70921 Oct 14 15:02 alert.chk
srw-rw-rw- 1 xymon monitor       0 Oct 14 15:02 rrdctl.5887
srw-rw-rw- 1 xymon monitor       0 Oct 14 15:02 rrdctl.5954
-rw------- 1 xymon monitor 3764224 Oct 14 15:05 core.5887
srw-rw-rw- 1 xymon monitor       0 Oct 14 15:05 rrdctl.10234
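
The listing can also be cross-checked by PID. The sketch below assumes that core.<pid> is the core dump of the process with that PID and that rrdctl.<pid> is a socket created by the process with the same PID; both are assumptions based only on the file names, and pair_cores_with_sockets is a name invented for this example.

```python
import re

NAME_RE = re.compile(r"^(core|rrdctl)\.(\d+)$")


def pair_cores_with_sockets(names):
    """Group core.<pid> and rrdctl.<pid> file names by PID."""
    cores, sockets = set(), set()
    for name in names:
        match = NAME_RE.match(name)
        if not match:
            continue  # ignore unrelated files such as xymond.chk
        pid = int(match.group(2))
        (cores if match.group(1) == "core" else sockets).add(pid)
    return {
        "crashed": sorted(cores & sockets),      # socket and core dump
        "core_only": sorted(cores - sockets),    # core dump without socket
        "socket_only": sorted(sockets - cores),  # socket without core dump
    }


# File names copied from the listing above; for a real run,
# os.listdir() on the server/tmp/ directory would supply them.
names = [
    "rrdctl.572", "core.572", "rrdctl.17027", "core.17027",
    "rrdctl.30574", "core.30574", "rrdctl.13275", "core.13275",
    "xymond.chk", "alert.chk.sub", "alert.chk",
    "rrdctl.5887", "rrdctl.5954", "core.5887", "rrdctl.10234",
]
result = pair_cores_with_sockets(names)
for key, pids in result.items():
    print(f"{key}: {pids}")
```

For the listing above, every core file has a matching rrdctl socket (PIDs 572, 5887, 13275, 17027 and 30574), and only the sockets for PIDs 5954 and 10234 have no core file.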


Question: how can we diagnose the cause of the problem?



Best regards,

Andrey Chervonets
----------------------
SIA CoMinder
http://www.cominder.eu/
mobile: +371 26517848