[Xymon] Possible Memory Leak (?!) in Version Xymon 4.3.27-1.el6.terabithia
Japheth Cleaver
cleaver at terabithia.org
Wed Sep 28 18:58:18 CEST 2016
Hi,
There's no need to rebuild the packages to enable this type of testing.
Just make sure the xymon-debuginfo RPM is installed (it's in the same
repo), as that contains all of the symbol information on RH-type systems.
As far as valgrind goes, all you really need is the base 'valgrind'
package. Simply modify the tasks.cfg as below and you should be set. I
also typically add "--track-origins=yes".
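For example, the relevant tasks.cfg entry could look something like the
sketch below. The [rrdstatus] task name and xymond_channel/xymond_rrd
arguments follow the stock configuration, but the ENVFILE path and log
locations are assumptions from a typical install and may differ on your
system:

```
[rrdstatus]
        ENVFILE /etc/xymon/xymonserver.cfg
        NEEDS xymond
        CMD valgrind --log-file=/tmp/valgrind-rrd.%p --leak-check=full --track-origins=yes \
                xymond_channel --channel=status --log=$XYMONSERVERLOGS/rrd-status.log \
                xymond_rrd --rrddir=$XYMONVAR/rrd
```

The %p in --log-file makes valgrind write one log per process ID, which
matters here because xymond_channel forks the worker it wraps.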
In terms of the overall problem, xymond_rrd will use a fairly large
amount of RAM as it spools up its cache of data points before sending
them out to rrdtool itself for writing. In theory, this should level
off once it's been running for an hour or two (depending on your
datapoints and hosts) and shouldn't grow beyond that. The overall
memory usage will scale linearly with hosts x RRAs.
I know it had been a source of leaks before, so it's possible something
is still in there. Are you adding and removing lots of hosts at once by
any chance? It's possible there's an incorrect cleanup of previously
cached data, but I'd thought those had been resolved.
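Once valgrind has been running for a while, you can pull the leak
summaries out of its per-PID logs. A sketch (the printf just fabricates
a sample log line so the commands are self-contained; on a real system
the /tmp/valgrind-rrd.* files are written by valgrind itself via the
--log-file option):

```shell
# Fabricate one sample log line so this is self-contained; a real
# /tmp/valgrind-rrd.<pid> file is written by valgrind itself.
printf '==12345==    definitely lost: 1,024 bytes in 4 blocks\n' > /tmp/valgrind-rrd.12345

# Pull the leak summary lines out of every per-PID log:
grep -h "definitely lost" /tmp/valgrind-rrd.*
```

"definitely lost" blocks are the ones worth chasing first; "still
reachable" memory at exit is usually just caches that were never freed.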
HTH,
-jc
On 9/28/2016 1:27 AM, Peter Welter wrote:
>
> Hi Henrik, J.C.,
>
> Thanks for your response.
>
> It seems that valgrind is available for RHEL (see below) and now I
> wanted to ask J.C. the following: "What do you want me to do?"
>
> If I want to use the prebuilt packages, and YES that would be
> preferable, can you supply me with a pre-compiled binary for
> xymond_rrd that has all the options Henrik talked about, so I can
> replace the currently installed binary with it?
>
> Or should I build a package myself to debug this issue?
>
> Regards, Peter
>
>
> [root at uhu-a xymon]# yum search valgrind
>
> Loaded plugins: product-id, search-disabled-repos, security,
> subscription-manager
>
> ============================== N/S Matched: valgrind ==============================
>
> devtoolset-1.1-valgrind-devel.i686 : Development files for valgrind
> devtoolset-1.1-valgrind-devel.x86_64 : Development files for valgrind
> devtoolset-1.1-valgrind-openmpi.i686 : OpenMPI support for valgrind
> devtoolset-1.1-valgrind-openmpi.x86_64 : OpenMPI support for valgrind
> devtoolset-2-eclipse-valgrind.noarch : Valgrind Tools Integration for Eclipse
> devtoolset-2-valgrind-devel.i686 : Development files for valgrind
> devtoolset-2-valgrind-devel.x86_64 : Development files for valgrind
> devtoolset-2-valgrind-openmpi.i686 : OpenMPI support for valgrind
> devtoolset-2-valgrind-openmpi.x86_64 : OpenMPI support for valgrind
> eclipse-valgrind.x86_64 : Valgrind Tools Integration for Eclipse
> perl-Test-Valgrind.noarch : Generate suppressions, analyze and test any command with valgrind
> valgrind-devel.i686 : Development files for valgrind
> valgrind-devel.x86_64 : Development files for valgrind
> valgrind-openmpi.x86_64 : OpenMPI support for valgrind
> devtoolset-1.1-valgrind.i686 : Tool for finding memory management bugs in programs
> devtoolset-1.1-valgrind.x86_64 : Tool for finding memory management bugs in programs
> devtoolset-2-valgrind.i686 : Tool for finding memory management bugs in programs
> devtoolset-2-valgrind.x86_64 : Tool for finding memory management bugs in programs
> valgrind.i686 : Tool for finding memory management bugs in programs
> valgrind.x86_64 : Tool for finding memory management bugs in programs
> valkyrie.x86_64 : Graphical User Interface for Valgrind Suite
>
> Name and summary matches only, use "search all" for everything.
>
>
> 2016-09-24 14:18 GMT+02:00 Henrik Størner <henrik at hswn.dk>:
>
> Hi,
>
> memory leaks are the worst to troubleshoot.
>
> If possible, then running xymond_rrd via the "valgrind" tool is
> the best way to do it. valgrind comes with some distributions, not
> sure about RHEL though. There might be some CentOS packages that
> will work.
>
> An important point is that the binaries must be compiled with
> debugging info intact; i.e. "-g" as a compile-time option,
> preferably only -O optimisation, and not stripped. I guess Japheth
> can help you with that, if necessary.
>
> Then you change the tasks.cfg to run xymond_rrd via valgrind: the
> CMD setting must then be
>
> CMD valgrind --log-file=/tmp/valgrind-rrd.%p --leak-check=full \
>     xymond_channel --channel=status --log=$XYMONSERVERLOGS/rrd-status.log \
>     xymond_rrd --rrddir=$XYMONVAR/rrd
>
> Then run Xymon normally for some time, until hopefully it starts
> logging memory leaks.
>
>
> This checking does have a significant performance impact, so
> running it on a 4000-server system is probably not possible.
>
>
> Regards,
> Henrik
>
>
>
> On 23-09-2016 at 13:38, Peter Welter wrote:
>> Hi Japheth,
>>
>> Probably one process (xymond_rrd) seems very hungry for memory:
>>
>> [xymon]# ps aux | egrep 'xymon|MEM'
>>
>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>
>> xymon 16889 0.0 0.0 4176 604 ? S 13:26 0:00
>> /bin/dash
>>
>> xymon 16892 0.0 0.0 6272 660 ? S 13:26 0:00
>> vmstat 300 2
>>
>> xymon 16986 0.0 0.0 4176 600 ? S 13:28 0:00
>> /bin/dash
>>
>> xymon 16989 0.0 0.0 6272 664 ? S 13:28 0:00
>> vmstat 300 2
>>
>> xymon 17060 0.0 0.0 4176 604 ? S 13:30 0:00
>> /bin/dash
>>
>> xymon 17063 0.0 0.0 6272 664 ? S 13:30 0:00
>> vmstat 300 2
>>
>> xymon 17107 0.5 0.1 140340 10324 ? S
>> 13:31 0:00 /usr/bin/perl -w -I/home/bbtest/server/ext
>> /etc/xymon/ext/netapp/netapp.pl
>>
>> xymon 17110 0.2 0.1 142236 11108 ? S 13:31 0:00
>> /usr/bin/perl -w -I/home/bbtest/server/ext
>> /etc/xymon/ext/netapp/netapp.pl
>>
>> xymon 17160 0.0 0.0 106120 1248 ? S 13:31 0:00 sh
>> -c /usr/bin/ssh -x -l xymon xxx.xxx.xxx.xxx "environment status" 2>&1
>>
>> xymon 17161 0.0 0.0 60060 3440 ? S 13:31 0:00
>> /usr/bin/ssh -x -l xymon 10.10.1.30 environment status
>>
>> root 17163 0.0 0.0 103324 852 pts/1 S+ 13:31 0:00
>> egrep xymon|MEM
>>
>> xymon 27932 0.0 0.0 12648 592 ? Ss Sep20 0:05
>> /usr/sbin/xymonlaunch --log=/var/log/xymon/xymonlaunch.log
>>
>> xymon 27992 0.0 0.1 25212804 8160 ? S Sep20 1:57
>> xymond --restart=/var/lib/xymon/tmp/xymond.chk
>> --checkpoint-file=/var/lib/xymon/tmp/xymond.chk
>> --checkpoint-interval=600
>> --admin-senders=127.0.0.1,132.229.61.140 --store-clientlogs=!msgs
>>
>> xymon 27996 0.0 0.0 12624444 1452 ? S Sep20 0:00
>> xymond_channel --channel=stachg xymond_history
>>
>> xymon 27997 0.0 0.0 12624444 1244 ? S Sep20 0:00
>> xymond_channel --channel=page xymond_alert
>> --checkpoint-file=/var/lib/xymon/tmp/alert.chk
>> --checkpoint-interval=600
>>
>> xymon 27998 0.0 0.0 12624444 1340 ? S Sep20 0:00
>> xymond_channel --channel=client xymond_client
>>
>> xymon 27999 0.0 0.0 12624860 4328 ? S Sep20 0:02
>> xymond_channel --channel=status xymond_rrd
>> --rrddir=/var/lib/xymon/rrd
>>
>> xymon 28000 0.0 0.0 12625628 4712 ? S Sep20 0:00
>> xymond_channel --channel=data xymond_rrd --rrddir=/var/lib/xymon/rrd
>>
>> xymon 28001 0.0 0.0 12624444 1320 ? S Sep20 0:00
>> xymond_channel --channel=clichg xymond_hostdata
>>
>> xymon 28007 0.0 0.0 41788 1168 ? S Sep20 0:00
>> xymond_channel --channel=user
>> --log=/var/log/xymon/vmware-monitord.log vmware-monitord
>>
>> xymon 28008 0.0 0.0 10527268 1688 ? S Sep20 0:00
>> xymond_history
>>
>> xymon 28009 0.0 1.5 12624884 122508 ? S Sep20 0:00
>> xymond_client
>>
>> xymon 28010 0.0 0.0 106848 2176 ? S Sep20 0:00
>> /bin/gawk -f /usr/libexec/xymon/vmware-monitord
>>
>> xymon 28011 0.0 0.0 10527252 1212 ? S Sep20 0:00
>> xymond_hostdata
>>
>> xymon 28012 0.0 9.4 12680832 765216 ? S Sep20 0:08
>> xymond_rrd --rrddir=/var/lib/xymon/rrd
>>
>> xymon 28013 0.0 12.1 12689484 975908 ? S Sep20 0:12
>> xymond_rrd --rrddir=/var/lib/xymon/rrd
>>
>> xymon 28014 0.0 0.1 10527512 9980 ? S Sep20 0:00
>> xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk
>> --checkpoint-interval=600
>>
>> I did one test migration, where all hosts (about 4000) ran
>> on this system, so the directory /var/lib/xymon/rrd is quite
>> large. However, currently there is only one host (the xymon
>> server itself) running, and it is testing one NetApp filer. So
>> perhaps xymond_rrd and this large directory are somehow related.
>> I will try this on the Accept environment which I have installed
>> by now. There are just a few files in /var/lib/xymon/rrd on that
>> Accept system, and I'll check next Monday how each system behaves.
>>
>> <So far an update; to be continued next week.>
>>
>>
>> 2016-09-21 13:18 GMT+02:00 Peter Welter <peter.welter at gmail.com>:
>>
>> Hi Japheth,
>>
>> Thanks for your response. I'm looking into this and will be
>> back a.s.a.p. (a few days or so, since I just restarted Xymon ;-)
>>
>> Peter
>>
>> 2016-09-20 19:07 GMT+02:00 Japheth Cleaver <cleaver at terabithia.org>:
>>
>> On 9/20/2016 8:37 AM, Peter Welter wrote:
>>
>> Hi J.C.,
>>
>> First of all: Thanks for your work for Xymon!
>>
>> Second: I have a question about the terabithia
>> repository. I want to set up Development, Test,
>> Accept, and Production environments using this
>> repository. I have installed the first one and
>> am working on the next phase.
>>
>> Over time however, I see that my Xymon-server seems
>> to eat all the memory available and starts swapping
>> until all memory is consumed?!?
>>
>> This is for Development only, and there are not
>> really any tests; just a very small hosts.cfg. So
>> why does Xymon get this hungry for memory over time?
>>
>> Tue Sep 20 17:29:46 CEST 2016 - Memory CRITICAL
>>
>> Memory Used Total Percentage
>> green Real/Physical 7737M 7872M 98%
>> yellow Actual/Virtual 7539M 7872M 95%
>> red Swap/Page 3886M 4095M 94%
>>
>> After a Xymon restart, all the swap is freed?
>>
>> I'm using Red Hat Enterprise Linux Server release 6.8
>> (Santiago)
>>
>> Any suggestions what to do next? Thanks in advance
>> for any help!
>>
>> Peter
>>
>>
>> Hi Peter,
>>
>> I'm not aware of any memory leaks present in 4.3.27
>> itself that would cause growth like that. Can you provide
>> the ps output for the system's various xymon tools? Which
>> process seems to be running out of control?
>>
>> -jc
>>
>>
>>
>>
>>
>> _______________________________________________
>> Xymon mailing list
>> Xymon at xymon.com
>> http://lists.xymon.com/mailman/listinfo/xymon