[Xymon] Xymon 4.3.30-1 memory issues and core dumps
Carl Melgaard
Carl.Melgaard at STAB.RM.DK
Tue Dec 15 11:11:08 CET 2020
Hi,
Thanks for the thorough walkthrough!
>Combo messages can be large, and this could a) cause increased RAM usage, or b) be affected by it. Again, it's not clear if the behaviour of xymongen and xymonnet are the cause of your problems or the result of them.
>It looks like the call to combo_start() in xymongen is in code that runs as a result of the "--report" switch. In xymonnet, it's in common code, but there seems to be a modified code path available if you were to add "--bfq" (backfeed queue). I know nothing about the backfeed queue feature, but there's a little about it in the README.backfeed file.
>So some things to consider, mostly just work-arounds and troubleshooting:
>1. Make sure you have swap enabled, and monitor swap-in/swap-out.
Swap is enabled, current usage:
Memory                    Used     Total   Percentage
[green] Real/Physical   15706M    15885M      98%
[green] Actual/Virtual  13054M    15885M      82%
[green] Swap/Page           0M     8063M       0%
The old server, running CentOS 5 and Xymon 4.37, has 4 GB of memory and uses 96% of it, with the same tests…
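To watch swap-in/swap-out over time rather than just the current usage, something like the following could be run on the server. It is only a minimal sketch: it assumes a Linux kernel that exposes the `pswpin` and `pswpout` page counters in /proc/vmstat, and the function names are made up for illustration, not part of Xymon.

```python
import time


def read_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) page counters parsed from /proc/vmstat content."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    # Missing counters are reported as 0 (assumption: kernel built without swap).
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_delta(before, after):
    """Pages swapped in and out between two (pswpin, pswpout) samples."""
    return after[0] - before[0], after[1] - before[1]


def sample_swap_activity(interval_seconds, path="/proc/vmstat"):
    """Read the counters twice, interval_seconds apart, and return the delta."""
    with open(path) as f:
        first = read_swap_counters(f.read())
    time.sleep(interval_seconds)
    with open(path) as f:
        second = read_swap_counters(f.read())
    return swap_delta(first, second)
```

Calling `sample_swap_activity(60)` returns the number of pages swapped in and out during that minute; a pair of zeros would match the 0% swap usage shown above.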
>2. See if anything else is using excessive RAM.
It’s mostly just xymon processes eating up RAM:
xymon 1116 0.0 0.0 37940 1540 ? Ss Dec14 0:01 /usr/sbin/xymonlaunch --no-daemon --log=/var/log/xymon/xymonlaunch.log
xymon 1138 0.5 1.0 12040844 171980 ? S Dec14 8:13 xymond --restart=/var/lib/xymon/tmp/xymond.chk --checkpoint-file=/var/lib/xymon/tmp/xymond.chk --checkpoint-interval=600 --admin-senders=127.0.0.1,<x.x.x.x> --store-clientlogs=!msgs
xymon 1695 0.0 0.0 6212724 8240 ? S Dec14 0:24 xymond_channel --channel=stachg xymond_history
xymon 1696 0.0 0.0 6211996 2764 ? S Dec14 0:26 xymond_channel --channel=page xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk --checkpoint-interval=600
xymon 1697 0.0 0.0 6213976 8628 ? S Dec14 0:49 xymond_channel --channel=client xymond_client
xymon 1698 0.0 0.0 6213804 8704 ? S Dec14 1:17 xymond_channel --channel=status xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon 1699 0.0 0.0 6211864 2084 ? S Dec14 0:12 xymond_channel --channel=data xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon 1700 0.0 0.0 6212580 8392 ? S Dec14 0:00 xymond_channel --channel=clichg xymond_hostdata
xymon 1743 0.1 31.9 6315044 5200828 ? S Dec14 1:46 xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon 1744 0.1 31.9 6218584 5194192 ? S Dec14 1:53 xymond_client
xymon 1745 0.0 0.4 5235992 74140 ? S Dec14 0:00 xymond_history
xymon 1746 0.0 0.3 5229068 48912 ? S Dec14 0:05 xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk --checkpoint-interval=600
xymon 1747 0.0 6.8 6306576 1121700 ? S Dec14 0:32 xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon 2213 0.0 0.0 5225548 15060 ? S Dec14 0:00 xymond_hostdata
xymon 13022 0.0 0.0 116340 3024 pts/0 S 10:53 0:00 -bash
xymon 14689 0.0 0.0 113420 1568 ? S 10:59 0:00 /bin/sh /usr/share/xymon/ext/ntpd.sh
xymon 14822 0.0 0.0 9568 1140 ? S 10:59 0:00 /bin/sh
xymon 14823 0.0 0.0 9568 1136 ? S 10:59 0:00 /bin/sh
xymon 14828 0.0 0.0 49016 1264 ? S 10:59 0:00 vmstat 300 2
xymon 14829 0.0 0.0 49016 1264 ? S 10:59 0:00 vmstat 300 2
xymon 15680 0.0 0.0 113420 720 ? S 11:02 0:00 /bin/sh /usr/share/xymon/ext/ntpd.sh
xymon 15681 0.0 0.0 23652 1504 ? S 11:02 0:00 /usr/sbin/ntpdate -t 1 -p 5 -u -q <server>
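To total the resident memory per command from a listing like the one above, a short script can help. This is a sketch under assumptions: it expects the `ps aux` column layout (RSS in KiB in the sixth column, command starting at the eleventh), and `rss_by_command` is a hypothetical helper name. RSS counts shared pages once per process, so the totals can overstate real usage.

```python
from collections import defaultdict


def rss_by_command(ps_lines):
    """Sum RSS (KiB) per command name from `ps aux`-style lines."""
    totals = defaultdict(int)
    for line in ps_lines:
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[5].isdigit():
            continue  # skip the header line and anything malformed
        command = fields[10].split()[0]  # first word of the command line
        totals[command] += int(fields[5])
    return dict(totals)
```

Fed the lines for PIDs 1743, 1744 and 1747 above, it reports 6322528 KiB for xymond_rrd and 5194192 KiB for xymond_client.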
>3. Play with combo message sizes. Perhaps a smaller size would help. You can set MAXMSGSPERCOMBO in xymonserver.cfg.
>6. Profile the xymond process's memory usage. I'm not sure how to do this. Perhaps you can get it to dump core, then analyse the core (perhaps just run "strings" over it) to see what's using up all the memory. Perhaps there's some gdb techniques for this.
>7. Try running xymongen without "--report", and xymonnet with "--bfq" or "--no-bfq".
I’ll experiment with combo message sizes and try omitting the reports. Running "bt full" on the xymonnet core dump, I get this output:
Reading symbols from /usr/libexec/xymon/xymonnet...Reading symbols from /usr/lib/debug/usr/libexec/xymon/xymonnet.debug...done.
done.
[New LWP 15566]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `xymonnet --report --ping --checkresponse --dns-timeout=3 --dnslog=/var/log/xymo'.
Program terminated with signal 6, Aborted.
#0 0x00007f7484027387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt full
#0 0x00007f7484027387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
resultvar = 0
pid = 15566
selftid = 15566
#1 0x00007f7484028a78 in __GI_abort () at abort.c:90
save_stage = 2
act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 47120176, sa_restorer = 0x0}
sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x0000000000422d95 in sigsegv_handler (signum=<optimized out>) at sig.c:57
No locals.
#3 <signal handler called>
No locals.
#4 strbuf_addtobuffer (buf=0x0, newtext=0x2ceff30 "extcombo", ' ' <repeats 192 times>..., newlen=2000) at strfunc.c:115
No locals.
#5 0x0000000000424635 in addtobufferraw (buf=<optimized out>, newdata=<optimized out>, bytes=<optimized out>) at strfunc.c:184
No locals.
#6 0x000000000042d9b2 in combo_start () at sendmsg.c:908
No locals.
#7 0x00000000004064dc in main (argc=6, argv=0x7ffd5619f5b8) at xymonnet.c:2554
msg = "PING test completed (1913 hosts)", '\000' <repeats 479 times>
handle = <optimized out>
s = <optimized out>
h = <optimized out>
t = <optimized out>
argi = <optimized out>
concurrency = <optimized out>
pingcolumn = <optimized out>
egocolumn = <optimized out>
failgoesclear = <optimized out>
dumpdata = <optimized out>
runtimewarn = 300
servicedumponly = <optimized out>
pingrunning = 1
usebackfeedqueue = 0
force_backfeedqueue = <optimized out>
network_count = <optimized out>
- Which looks like the report part?
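One detail in the trace: frame #4 is entered with buf=0x0, so the segfault may simply be a NULL buffer being passed down from combo_start(). That is a reading of the trace, not a confirmed diagnosis. For longer backtraces, frames called with NULL pointer arguments can be picked out with a small script. The sketch below assumes gdb's usual "#N [address in] function (args) at file:line" frame format; `frames_with_null_args` is a hypothetical helper name.

```python
import re

# Matches gdb frame lines such as:
#   #4 strbuf_addtobuffer (buf=0x0, ...) at strfunc.c:115
#   #6 0x000000000042d9b2 in combo_start () at sendmsg.c:908
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \((.*)\)")


def frames_with_null_args(backtrace_text):
    """Return (frame number, function, [argument names]) for NULL pointer args."""
    hits = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue  # locals, "<signal handler called>", other output
        number, func, args = match.groups()
        null_args = re.findall(r"(\w+)=0x0\b", args)
        if null_args:
            hits.append((int(number), func, null_args))
    return hits
```

Run over the backtrace above, it flags only frame 4, strbuf_addtobuffer, with the argument buf.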
My xymond.chk file is 55 MB – is that an issue?
Regards,
Carl