[Xymon] Possible Memory Leak (?!) in Version Xymon 4.3.27-1.el6.terabithia
Henrik Størner
henrik at hswn.dk
Sat Sep 24 14:18:34 CEST 2016
Hi,
Memory leaks are the worst to troubleshoot.
If possible, running xymond_rrd under the "valgrind" tool is the best
way to do it. valgrind ships with some distributions; I am not sure about
RHEL, though. There might be some CentOS packages that will work.
An important point is that the binaries must be compiled with debugging
info intact; i.e. "-g" as a compile-time option, preferably only -O
optimisation, and not stripped. I guess Japheth can help you with that,
if necessary.
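Whether an installed binary meets these requirements can be checked before going any further. The following is only a sketch, not something from the Xymon documentation: it assumes the "file" utility is installed, and the helper names classify_desc and check_binary are made up for illustration. Older versions of "file" (such as the one on RHEL 6) only report "stripped" or "not stripped", so a "not_stripped" result still leaves open whether -g was used.

```shell
#!/bin/sh
# classify_desc (hypothetical helper): interpret one line of file(1) output.
classify_desc() {
    case "$1" in
        *"with debug_info"*) echo "debug_info" ;;    # newer file(1) states this explicitly
        *"not stripped"*)    echo "not_stripped" ;;  # symbols present, debug info unconfirmed
        *stripped*)          echo "stripped" ;;      # unsuitable for useful valgrind traces
        *)                   echo "unknown" ;;
    esac
}

# check_binary (hypothetical helper): run file(1) on a path and classify the result.
check_binary() {
    desc=$(file -L "$1") || return 2
    printf '%s: %s\n' "$1" "$(classify_desc "$desc")"
}

# Check every path given on the command line; does nothing without arguments.
for b in "$@"; do
    check_binary "$b"
done
```

Saved as, say, check_debug.sh, it would be run as "sh check_debug.sh /path/to/xymond_rrd". For a "not_stripped" result, "readelf -S /path/to/xymond_rrd | grep debug_info" shows whether a .debug_info section is actually present.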
Then change tasks.cfg to run xymond_rrd via valgrind. The CMD
setting must then be
CMD valgrind --log-file=/tmp/valgrind-rrd.%p --leak-check=full \
    xymond_channel --channel=status \
    --log=$XYMONSERVERLOGS/rrd-status.log xymond_rrd --rrddir=$XYMONVAR/rrd
Then run Xymon normally for some time, until hopefully it starts logging
memory leaks.
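Once log files have accumulated, the per-process totals can be pulled out of them. The sketch below assumes the log names from the --log-file setting above and the usual memcheck "LEAK SUMMARY" line format ("definitely lost: N bytes in M blocks"); the function name sum_definitely_lost is made up for illustration. As far as I know, memcheck writes the leak summary when the process exits, so the monitored process has to be stopped cleanly before the totals show up.

```shell
#!/bin/sh
# sum_definitely_lost FILE... (hypothetical helper)
# Prints "<file> <bytes>" for each valgrind log, where <bytes> is the total
# of all "definitely lost:" figures found in that file (0 if there are none).
sum_definitely_lost() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        awk -v name="$f" '
            /definitely lost:/ {
                for (i = 1; i < NF; i++) {
                    if ($i == "lost:") {
                        n = $(i + 1)
                        gsub(/,/, "", n)   # valgrind prints thousands separators
                        total += n
                    }
                }
            }
            END { printf "%s %d\n", name, total }
        ' "$f"
    done
}
```

A typical call would be "sum_definitely_lost /tmp/valgrind-rrd.* | sort -k2,2nr", which lists the logs with the largest definite leaks first. The stack traces for each loss record are in the same log files.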
This checking does have a significant performance impact, so running it
on a 4000-server system is probably not possible.
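Where running under valgrind is too slow, a much cruder first step is to record the resident set size of the xymond_rrd workers over time and see whether it keeps growing. This is only a sketch and a substitute technique, not a leak detector: it assumes "ps aux" output in the column layout shown in the quoted message below, and the function names rss_by_command and watch_rss are made up for illustration.

```shell
#!/bin/sh
# rss_by_command NAME (hypothetical helper)
# Reads "ps aux" output on stdin and prints "<pid> <rss_kb>" for every process
# whose command (column 11) is exactly NAME. xymond_channel wrappers that merely
# mention NAME among their arguments are skipped.
rss_by_command() {
    awk -v cmd="$1" 'NR > 1 && $11 == cmd { print $2, $6 }'
}

# watch_rss NAME INTERVAL (hypothetical helper)
# Prints "<epoch> <pid> <rss_kb>" every INTERVAL seconds until interrupted.
watch_rss() {
    while :; do
        now=$(date +%s)
        ps aux | rss_by_command "$1" | while read -r pid rss; do
            echo "$now $pid $rss"
        done
        sleep "$2"
    done
}
```

For example, "watch_rss xymond_rrd 300 >> /tmp/rrd-rss.log" takes one sample every five minutes. A steadily rising RSS for the same PID points at that worker, though it says nothing about where the memory is allocated.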
Regards,
Henrik
On 23-09-2016 at 13:38, Peter Welter wrote:
> Hi Japheth,
>
> Probably one process (xymond_rrd) seems very hungry for memory:
>
> [xymon]# ps aux | egrep 'xymon|MEM'
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> xymon 16889 0.0 0.0 4176 604 ? S 13:26 0:00 /bin/dash
> xymon 16892 0.0 0.0 6272 660 ? S 13:26 0:00 vmstat 300 2
> xymon 16986 0.0 0.0 4176 600 ? S 13:28 0:00 /bin/dash
> xymon 16989 0.0 0.0 6272 664 ? S 13:28 0:00 vmstat 300 2
> xymon 17060 0.0 0.0 4176 604 ? S 13:30 0:00 /bin/dash
> xymon 17063 0.0 0.0 6272 664 ? S 13:30 0:00 vmstat 300 2
> xymon 17107 0.5 0.1 140340 10324 ? S 13:31 0:00 /usr/bin/perl -w -I/home/bbtest/server/ext /etc/xymon/ext/netapp/netapp.pl
> xymon 17110 0.2 0.1 142236 11108 ? S 13:31 0:00 /usr/bin/perl -w -I/home/bbtest/server/ext /etc/xymon/ext/netapp/netapp.pl
> xymon 17160 0.0 0.0 106120 1248 ? S 13:31 0:00 sh -c /usr/bin/ssh -x -l xymon xxx.xxx.xxx.xxx "environment status" 2>&1
> xymon 17161 0.0 0.0 60060 3440 ? S 13:31 0:00 /usr/bin/ssh -x -l xymon 10.10.1.30 environment status
> root 17163 0.0 0.0 103324 852 pts/1 S+ 13:31 0:00 egrep xymon|MEM
> xymon 27932 0.0 0.0 12648 592 ? Ss Sep20 0:05 /usr/sbin/xymonlaunch --log=/var/log/xymon/xymonlaunch.log
> xymon 27992 0.0 0.1 25212804 8160 ? S Sep20 1:57 xymond --restart=/var/lib/xymon/tmp/xymond.chk --checkpoint-file=/var/lib/xymon/tmp/xymond.chk --checkpoint-interval=600 --admin-senders=127.0.0.1,132.229.61.140 --store-clientlogs=!msgs
> xymon 27996 0.0 0.0 12624444 1452 ? S Sep20 0:00 xymond_channel --channel=stachg xymond_history
> xymon 27997 0.0 0.0 12624444 1244 ? S Sep20 0:00 xymond_channel --channel=page xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk --checkpoint-interval=600
> xymon 27998 0.0 0.0 12624444 1340 ? S Sep20 0:00 xymond_channel --channel=client xymond_client
> xymon 27999 0.0 0.0 12624860 4328 ? S Sep20 0:02 xymond_channel --channel=status xymond_rrd --rrddir=/var/lib/xymon/rrd
> xymon 28000 0.0 0.0 12625628 4712 ? S Sep20 0:00 xymond_channel --channel=data xymond_rrd --rrddir=/var/lib/xymon/rrd
> xymon 28001 0.0 0.0 12624444 1320 ? S Sep20 0:00 xymond_channel --channel=clichg xymond_hostdata
> xymon 28007 0.0 0.0 41788 1168 ? S Sep20 0:00 xymond_channel --channel=user --log=/var/log/xymon/vmware-monitord.log vmware-monitord
> xymon 28008 0.0 0.0 10527268 1688 ? S Sep20 0:00 xymond_history
> xymon 28009 0.0 1.5 12624884 122508 ? S Sep20 0:00 xymond_client
> xymon 28010 0.0 0.0 106848 2176 ? S Sep20 0:00 /bin/gawk -f /usr/libexec/xymon/vmware-monitord
> xymon 28011 0.0 0.0 10527252 1212 ? S Sep20 0:00 xymond_hostdata
> *xymon 28012 0.0 9.4 12680832 765216 ? S Sep20 0:08 xymond_rrd --rrddir=/var/lib/xymon/rrd*
> *xymon 28013 0.0 12.1 12689484 975908 ? S Sep20 0:12 xymond_rrd --rrddir=/var/lib/xymon/rrd*
> xymon 28014 0.0 0.1 10527512 9980 ? S Sep20 0:00 xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk --checkpoint-interval=600
>
> I did one test migration, where all hosts (about 4000 hosts) ran on
> this system, so the directory /var/lib/xymon/rrd is quite huge.
> However, currently there is only one host (the Xymon server itself)
> running, and it is testing one NetApp filer. So perhaps xymond_rrd and
> this large directory are somehow related. I will try this on the
> Acceptance environment, which I have installed by now. There are just a
> few files in /var/lib/xymon/rrd on this Acceptance system, and I will
> check next Monday how each system behaves.
>
> <That is the update so far; to be continued next week.>
>
>
> 2016-09-21 13:18 GMT+02:00 Peter Welter <peter.welter at gmail.com>:
>
> Hi Japheth,
>
> Thanks for your response. I'm looking into this and will be back
> a.s.a.p. (a few days or so, since I just restarted Xymon ;-)
>
> Peter
>
> 2016-09-20 19:07 GMT+02:00 Japheth Cleaver <cleaver at terabithia.org>:
>
> On 9/20/2016 8:37 AM, Peter Welter wrote:
>
> Hi J.C.,
>
> First of all: thanks for your work on Xymon!
>
> Second: I have a question about the repository from
> terabithia. I want to set up Development, Test,
> Acceptance and Production environments using this
> repository. I have installed the first and am working on
> the next phase.
>
> Over time, however, I see that my Xymon server eats
> all the available memory and starts swapping until all
> memory is consumed?!?
>
> This is for Development only and there are hardly any
> tests, with a very small hosts.cfg. So why does Xymon
> become this hungry for memory over time?
>
> Tue Sep 20 17:29:46 CEST 2016 - Memory CRITICAL
>
>          Memory          Used   Total  Percentage
> green    Real/Physical   7737M  7872M  98%
> yellow   Actual/Virtual  7539M  7872M  95%
> red      Swap/Page       3886M  4095M  94%
>
> After a Xymon restart, all the swap is freed?
>
> I'm using Red Hat Enterprise Linux Server release 6.8
> (Santiago)
>
> Any suggestions what to do next? Thanks in advance for any
> help!
>
> Peter
>
>
> Hi Peter,
>
> I'm not aware of any memory leaks present in 4.3.27 itself
> that would cause growth like that. Can you provide the ps
> output for the system's various xymon tools? Which process
> seems to be running out of control?
>
> -jc
>
>
>
>
>