[Xymon] Xymon 4.3.30-1 memory issues and core dumps

Carl Melgaard Carl.Melgaard at STAB.RM.DK
Tue Dec 15 11:11:08 CET 2020


Hi,

Thanks for the thorough walkthrough!

>Combo messages can be large, and this could a) cause increased RAM usage, or b) be affected by it. Again, it's not clear if the behaviour of xymongen and xymonnet are the cause of your problems or the result of them.
>It looks like the call to combo_start() in xymongen is in code that runs as a result of the "--report" switch. In xymonnet, it's in common code, but there seems to be a modified code path available if you were to add "--bfq" (backfeed queue). I know nothing about the backfeed queue feature, but there's a little about it in the README.backfeed file.
>So some things to consider, mostly just work-arounds and troubleshooting:
>1. Make sure you have swap enabled, and monitor swap-in/swap-out.

Swap is enabled, current usage:

Memory                    Used    Total  Percentage
[green] Real/Physical   15706M   15885M         98%
[green] Actual/Virtual  13054M   15885M         82%
[green] Swap/Page           0M    8063M          0%
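The table shows how much swap is in use right now, but the advice above is to watch swap-in/swap-out activity. On Linux, the kernel exposes cumulative swapped-in and swapped-out page counts as pswpin and pswpout in /proc/vmstat (the si/so columns of vmstat show the same activity interactively). The sketch below reads those two counters; the helper names parse_swap_counters and read_swap_counters are illustrative and not part of Xymon. It is Linux-only.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative helper (not part of Xymon): scan vmstat-style "name value"
 * lines from fp and store the cumulative swap-in (pswpin) and swap-out
 * (pswpout) page counts. Returns 0 if both were found, -1 otherwise. */
int parse_swap_counters(FILE *fp, unsigned long long *pswpin,
                        unsigned long long *pswpout)
{
    char name[64];
    unsigned long long value;
    int found = 0;

    while (fscanf(fp, "%63s %llu", name, &value) == 2) {
        if (strcmp(name, "pswpin") == 0) {
            *pswpin = value;
            found |= 1;
        } else if (strcmp(name, "pswpout") == 0) {
            *pswpout = value;
            found |= 2;
        }
    }
    return (found == 3) ? 0 : -1;
}

/* Read the live counters from /proc/vmstat (Linux only). */
int read_swap_counters(unsigned long long *pswpin, unsigned long long *pswpout)
{
    FILE *fp = fopen("/proc/vmstat", "r");
    int rc;

    if (fp == NULL)
        return -1;
    rc = parse_swap_counters(fp, pswpin, pswpout);
    fclose(fp);
    return rc;
}
```

The counters only ever grow, so what matters is the difference between two samples taken some time apart: if pswpout keeps rising while the Xymon processes grow, the box is actively swapping rather than just holding idle pages in swap.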

The old server, running CentOS 5 and Xymon 4.37 with the same tests, has 4 GB of memory and uses 96% of it…

>2. See if anything else is using excessive RAM.

It’s mostly Xymon processes using the RAM; xymond_rrd (PID 1743) and xymond_client (PID 1744) each have about 5 GB resident (31.9% of memory):

xymon     1116  0.0  0.0  37940  1540 ?        Ss   Dec14   0:01 /usr/sbin/xymonlaunch --no-daemon --log=/var/log/xymon/xymonlaunch.log
xymon     1138  0.5  1.0 12040844 171980 ?     S    Dec14   8:13 xymond --restart=/var/lib/xymon/tmp/xymond.chk --checkpoint-file=/var/lib/xymon/tmp/xymond.chk --checkpoint-interval=600 --admin-senders=127.0.0.1,<x.x.x.x> --store-clientlogs=!msgs
xymon     1695  0.0  0.0 6212724 8240 ?        S    Dec14   0:24 xymond_channel --channel=stachg xymond_history
xymon     1696  0.0  0.0 6211996 2764 ?        S    Dec14   0:26 xymond_channel --channel=page xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk --checkpoint-interval=600
xymon     1697  0.0  0.0 6213976 8628 ?        S    Dec14   0:49 xymond_channel --channel=client xymond_client
xymon     1698  0.0  0.0 6213804 8704 ?        S    Dec14   1:17 xymond_channel --channel=status xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon     1699  0.0  0.0 6211864 2084 ?        S    Dec14   0:12 xymond_channel --channel=data xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon     1700  0.0  0.0 6212580 8392 ?        S    Dec14   0:00 xymond_channel --channel=clichg xymond_hostdata
xymon     1743  0.1 31.9 6315044 5200828 ?     S    Dec14   1:46 xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon     1744  0.1 31.9 6218584 5194192 ?     S    Dec14   1:53 xymond_client
xymon     1745  0.0  0.4 5235992 74140 ?       S    Dec14   0:00 xymond_history
xymon     1746  0.0  0.3 5229068 48912 ?       S    Dec14   0:05 xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk --checkpoint-interval=600
xymon     1747  0.0  6.8 6306576 1121700 ?     S    Dec14   0:32 xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon     2213  0.0  0.0 5225548 15060 ?       S    Dec14   0:00 xymond_hostdata
xymon    13022  0.0  0.0 116340  3024 pts/0    S    10:53   0:00 -bash
xymon    14689  0.0  0.0 113420  1568 ?        S    10:59   0:00 /bin/sh /usr/share/xymon/ext/ntpd.sh
xymon    14822  0.0  0.0   9568  1140 ?        S    10:59   0:00 /bin/sh
xymon    14823  0.0  0.0   9568  1136 ?        S    10:59   0:00 /bin/sh
xymon    14828  0.0  0.0  49016  1264 ?        S    10:59   0:00 vmstat 300 2
xymon    14829  0.0  0.0  49016  1264 ?        S    10:59   0:00 vmstat 300 2
xymon    15680  0.0  0.0 113420   720 ?        S    11:02   0:00 /bin/sh /usr/share/xymon/ext/ntpd.sh
xymon    15681  0.0  0.0  23652  1504 ?        S    11:02   0:00 /usr/sbin/ntpdate -t 1 -p 5 -u -q <server>
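The RSS column from ps is in kB, so 5200828 and 5194192 are roughly 5 GB each. To see whether those two processes keep growing or level off, the same figure can be sampled from /proc/<pid>/status, where it appears as VmRSS. A minimal Linux-only sketch, assuming a hypothetical helper named read_rss_kb (not part of Xymon):

```c
#include <stdio.h>
#include <string.h>

/* Illustrative helper (not part of Xymon): return the resident set size
 * (VmRSS) in kB of the process named by pid, e.g. "1743" or "self",
 * read from /proc/<pid>/status. Linux only. Returns -1 on any failure. */
long read_rss_kb(const char *pid)
{
    char path[64];
    char line[256];
    long rss_kb = -1;
    FILE *fp;

    /* Build "/proc/<pid>/status"; reject names that would not fit. */
    if (snprintf(path, sizeof(path), "/proc/%s/status", pid) >= (int)sizeof(path))
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* The line looks like "VmRSS:\t 5200828 kB". */
        if (strncmp(line, "VmRSS:", 6) == 0) {
            if (sscanf(line + 6, "%ld", &rss_kb) != 1)
                rss_kb = -1;
            break;
        }
    }
    fclose(fp);
    return rss_kb;
}
```

Logging read_rss_kb("1743") and read_rss_kb("1744") every few minutes would show whether xymond_rrd and xymond_client grow without bound, which may hint at a leak, or plateau at some working-set size.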


>3. Play with combo message sizes. Perhaps a smaller size would help. You can set MAXMSGSPERCOMBO in xymonserver.cfg.
>6. Profile the xymond process's memory usage. I'm not sure how to do this. Perhaps you can get it to dump core, then analyse the core (perhaps just run "strings" over it) to see what's using up all the memory. Perhaps there are some gdb techniques for this.
>7. Try running xymongen without "--report", and xymonnet with "--bfq" or "--no-bfq".
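As a rough model of what a per-combo cap such as MAXMSGSPERCOMBO trades off (an assumption for illustration, not taken from the Xymon source): a smaller cap means more combo messages, but each one carries fewer messages and so needs a smaller buffer. The hypothetical helper combos_needed below just counts how many combos a given cap produces.

```c
#include <stddef.h>

/* Hypothetical illustration, not Xymon's code: the number of combo
 * messages needed to carry nmsgs individual messages when each combo may
 * hold at most max_per_combo of them. A cap of 0 is treated as invalid
 * and yields 0. */
size_t combos_needed(size_t nmsgs, size_t max_per_combo)
{
    if (max_per_combo == 0)
        return 0;
    /* Ceiling division: a partly filled last combo still has to be sent. */
    return (nmsgs + max_per_combo - 1) / max_per_combo;
}
```

For example, 1913 messages with a cap of 100 would go out as 20 combos; halving the cap roughly doubles the number of combos while roughly halving the size of the largest one.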

I’ll experiment with combo message sizes and try running without --report. Running "bt full" in gdb on the xymonnet core dump gives this output:

Reading symbols from /usr/libexec/xymon/xymonnet...Reading symbols from /usr/lib/debug/usr/libexec/xymon/xymonnet.debug...done.
done.
[New LWP 15566]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `xymonnet --report --ping --checkresponse --dns-timeout=3 --dnslog=/var/log/xymo'.
Program terminated with signal 6, Aborted.
#0  0x00007f7484027387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
55        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt full
#0  0x00007f7484027387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
        resultvar = 0
        pid = 15566
        selftid = 15566
#1  0x00007f7484028a78 in __GI_abort () at abort.c:90
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 47120176, sa_restorer = 0x0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x0000000000422d95 in sigsegv_handler (signum=<optimized out>) at sig.c:57
No locals.
#3  <signal handler called>
No locals.
#4  strbuf_addtobuffer (buf=0x0, newtext=0x2ceff30 "extcombo", ' ' <repeats 192 times>..., newlen=2000) at strfunc.c:115
No locals.
#5  0x0000000000424635 in addtobufferraw (buf=<optimized out>, newdata=<optimized out>, bytes=<optimized out>) at strfunc.c:184
No locals.
#6  0x000000000042d9b2 in combo_start () at sendmsg.c:908
No locals.
#7  0x00000000004064dc in main (argc=6, argv=0x7ffd5619f5b8) at xymonnet.c:2554
        msg = "PING test completed (1913 hosts)", '\000' <repeats 479 times>
        handle = <optimized out>
        s = <optimized out>
        h = <optimized out>
        t = <optimized out>
        argi = <optimized out>
        concurrency = <optimized out>
        pingcolumn = <optimized out>
        egocolumn = <optimized out>
        failgoesclear = <optimized out>
        dumpdata = <optimized out>
        runtimewarn = 300
        servicedumponly = <optimized out>
        pingrunning = 1
        usebackfeedqueue = 0
        force_backfeedqueue = <optimized out>
        network_count = <optimized out>


Does this look like it comes from the report part?
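Reading the trace from the bottom up: main() in xymonnet.c called combo_start() (sendmsg.c:908), which called strbuf_addtobuffer() with buf=0x0, a NULL buffer. The SIGSEGV was raised inside that function, consistent with dereferencing the NULL buffer, and xymonnet's own sigsegv_handler (sig.c:57) then called abort(), which is why the core reports signal 6. The trace does not show why the buffer was NULL; a failed allocation under the memory pressure described above is one possibility, but that is an assumption. The sketch below uses a hypothetical strbuf_t and strbuf_append (not Xymon's types or functions) to show the defensive pattern: check the buffer pointer and the realloc() result, and return an error instead of crashing.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer, NOT Xymon's strbuf_t. */
typedef struct {
    char  *data;   /* NUL-terminated contents, or NULL before first append */
    size_t used;   /* bytes stored, excluding the terminating NUL */
    size_t size;   /* bytes allocated for data */
} strbuf_t;

/* Append len bytes of text to buf. Returns 0 on success, -1 if buf or
 * text is NULL or the buffer cannot grow. */
int strbuf_append(strbuf_t *buf, const char *text, size_t len)
{
    size_t needed;

    if (buf == NULL || text == NULL)
        return -1;              /* NULL buffer, as in frame #4: report, don't crash */

    needed = buf->used + len + 1;   /* +1 for the terminating NUL */
    if (needed > buf->size) {
        size_t newsize = buf->size ? buf->size : 64;
        char *p;

        while (newsize < needed)
            newsize *= 2;
        p = realloc(buf->data, newsize);   /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return -1;          /* buf->data is untouched and still valid */
        buf->data = p;
        buf->size = newsize;
    }
    memcpy(buf->data + buf->used, text, len);
    buf->used += len;
    buf->data[buf->used] = '\0';
    return 0;
}
```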

My xymond.chk file is 55 MB. Is that an issue?

Regards,

Carl

