[Xymon] Could not fork checkpoint child:Cannot allocate memory
Raul GN
ragonlan at gmail.com
Tue Jul 22 10:09:49 CEST 2014
Yes, I can provide all the information you need. Xymon was upgraded on 15
July, and this is the memory graph:
[image: Inline image 1]
Xymon is installed on Debian 6.0.9.
This is our ulimit configuration. We didn't change it, so these should be
the default values:
-----------------------------------------------------------------------
root: #ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63975
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 63975
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
xymon: $ ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 8192
coredump(blocks) 0
memory(kbytes) unlimited
locked memory(kbytes) 64
process 63975
nofiles 1024
vmemory(kbytes) unlimited
locks unlimited
------------------------------------------------------------
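The `ulimit -a` output above comes from login shells. As a cross-check, the same limits can be read from inside a process started as the xymon user. The following is only a minimal sketch using Python's standard `resource` module; the choice of which limits to print is an assumption about what matters for a failing fork(), and it reports the limits of the Python process itself, not of the running xymond.

```python
import resource

# Limits most likely to matter when fork() fails with ENOMEM (an assumption).
LIMIT_NAMES = ["RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_NPROC", "RLIMIT_NOFILE"]


def fmt(value):
    """Render RLIM_INFINITY the way ulimit does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits(names=LIMIT_NAMES):
    """Return {name: (soft, hard)} for the limits this platform defines."""
    limits = {}
    for name in names:
        res = getattr(resource, name, None)
        if res is None:  # not every platform defines every limit
            continue
        soft, hard = resource.getrlimit(res)
        limits[name] = (fmt(soft), fmt(hard))
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name:14} soft={soft} hard={hard}")
```

Run as the xymon user (for example through `su`), this should agree with the shell output if no limits are being applied elsewhere.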
We monitor 1200 hosts and don't add or remove many of them, maybe 20 hosts
a week:
[image: Inline image 2]
The xymond test shows:
----------------------------------------------------------------
Statistics for Xymon daemon
Version: 4.3.17
Up since 21-Jul-2014 11:37:37 (0 days, 22:00:00)
Incoming messages : 4335475
- status : 2251363
- combo : 24050
- extcombo : 104869
- page : 160
- summary : 0
- data : 1354532
- client : 92341
- notes : 0
- enable : 0
- disable : 0
- ack : 2
- config : 5968
- query : 4471
- xymondboard : 146651
- xymondlog : 341645
- drop : 3
- rename : 0
- dummy : 328
- ping : 0
- notify : 0
- schedule : 298
- download : 0
- Bogus/Timeouts : 8794
Incoming messages/sec : 52 (average last 300 seconds)
status channel messages: 2244782 (1 readers)
stachg channel messages: 10913 (1 readers)
page channel messages: 49720 (1 readers)
data channel messages: 1349077 (1 readers)
notes channel messages: 0 (0 readers)
enadis channel messages: 0 (0 readers)
client channel messages: 90465 (1 readers)
clichg channel messages: 360 (1 readers)
user channel messages: 0 (0 readers)
backfeed messages : 0
Ghost reports:
10.6.71.66 reported host
10.6.42.103 reported host 10.6.42.10
10.6.42.103 reported host 10.6.42.11
10.6.42.103 reported host 10.6.42.12
10.6.42.103 reported host 10.6.42.13
10.6.42.103 reported host 10.6.42.14
10.6.42.103 reported host 10.6.42.15
10.6.42.103 reported host 10.6.42.4
10.6.42.103 reported host 10.6.42.5
10.1.0.194 reported host ePagess_app_3
10.1.0.86 reported host IIS6_6_1
10.1.0.87 reported host IIS6_6_2
10.1.0.194 reported host main0404
Multi-source statuses
admin01:conn reported by 10.6.42.103 and 10.0.0.29
----------------------------------------------------------------------------------------------------
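As a rough sanity check on the statistics above, the lifetime average message rate can be compared with the reported 300-second average of 52 messages/sec. This is only a sketch: the helper name `average_rate` is made up for illustration, and the inputs are copied from the report (4335475 messages, uptime 0 days, 22:00:00).

```python
def average_rate(total_messages, days, hours, minutes, seconds):
    """Average messages per second over the reported uptime."""
    uptime = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return total_messages / uptime


# Values taken from the xymond statistics report above.
total = 4335475
lifetime = average_rate(total, days=0, hours=22, minutes=0, seconds=0)
print(f"lifetime average: {lifetime:.1f} msg/s")

# Share of the two largest message types, also from the report.
for name, count in (("status", 2251363), ("data", 1354532)):
    print(f"{name:7} {100 * count / total:.1f}% of incoming messages")
```

The lifetime average comes out close to the recent 52 messages/sec, which suggests the load is steady rather than bursty.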
And the xymond section in tasks.cfg:
[xymond]
ENVFILE /usr/lib/xymon/server/etc/xymonserver.cfg
CMD xymond --pidfile=$XYMONSERVERLOGS/xymond.pid \
--restart=$XYMONTMP/xymond.chk \
--checkpoint-file=$XYMONTMP/xymond.chk --checkpoint-interval=600 \
--log=$XYMONSERVERLOGS/xymond.log \
--admin-senders=127.0.0.1,$XYMONSERVERIP \
--store-clientlogs=!msgs \
--maint-senders=127.0.0.1,$XYMONSERVERIP \
--www-senders=127.0.0.1,10.0.0.0/24,10.6.42.103 \
--flap-count=10 \
--flap-seconds=900
We also changed some MAX variables in xymonserver.cfg:
MAXLINE="32768"
MAXMSG_DATA="5242880"
MAXMSG_CLIENT="5242880"
MAXMSG_STATUS="5242880"
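The unit of the MAXMSG_* values matters for memory use. The sketch below only does the arithmetic for 5242880 under two interpretations, bytes and kB; which one xymond applies is an assumption to verify against the xymonserver.cfg(5) man page of the installed version (it may well be kB). The helper name `describe` is made up for illustration.

```python
KIB = 1024


def describe(value, unit_bytes):
    """Return (bytes, MiB, GiB) for a setting given in units of unit_bytes."""
    total = value * unit_bytes
    return total, total / KIB**2, total / KIB**3


configured = 5242880  # value used for MAXMSG_DATA, MAXMSG_CLIENT, MAXMSG_STATUS

# Two candidate interpretations; the real unit must be checked in the docs.
for label, unit in (("bytes", 1), ("kB", KIB)):
    total, mib, gib = describe(configured, unit)
    print(f"as {label:5}: {total} bytes = {mib:g} MiB = {gib:g} GiB")
```

If the setting is in kB, each of these buffers would be sized at 5 GiB rather than 5 MiB, which would be worth ruling out before looking for a leak.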
Thank you for your help.
On Mon, Jul 21, 2014 at 7:12 PM, J.C. Cleaver <cleaver at terabithia.org>
wrote:
>
>
> On Mon, July 21, 2014 3:24 am, Raul GN wrote:
> > After upgrading from Xymon 4.3.12 to 4.3.17, the xymond daemon's memory
> > grows without limit. After 2 or 3 days, these messages appear in the logs:
> >
> > 2014-07-17 16:27:46 Setup complete
> > 2014-07-18 04:16:49 Flapping detected for web.int:http - 10 changes in 868 seconds
> > 2014-07-18 04:16:49 Flapping detected for web.int:tomcat - 10 changes in 868 seconds
> > 2014-07-18 04:18:23 Flapping detected for web.int:http - 10 changes in 892 seconds
> > 2014-07-18 04:18:23 Flapping detected for web.int:tomcat - 10 changes in 892 seconds
> > 2014-07-18 18:25:44 Flapping detected for web.int:http - 10 changes in 808 seconds
> > 2014-07-18 18:25:44 Flapping detected for web.int:tomcat - 10 changes in 808 seconds
> > 2014-07-18 23:40:53 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-18 23:50:54 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 00:00:55 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 00:10:56 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 00:20:57 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 00:30:58 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 00:40:59 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 00:51:00 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 01:01:01 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 01:11:02 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 01:21:03 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 01:31:04 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 01:41:05 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 01:51:06 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 02:01:07 Could not fork checkpoint child:Cannot allocate memory
> > 2014-07-19 02:11:08 Could not fork checkpoint child:Cannot allocate memory
> >
> > Does anybody know how to avoid this problem? If I don't restart Xymon, it
> > crashes.
>
>
> Hmm. If it's growing truly without limit there's something unusual going
> on; I'd take the memory allocation error later on at face value.
>
> Can you provide any additional details? Do you have an unusual workload or
> ulimits on the xymon user? Or a large number of host inserts/removals?
> What OS are you running?
>
>
> Regards,
>
> -jc
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Monitored items.png
Type: image/png
Size: 16138 bytes
Desc: not available
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20140722/b6156054/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Memory.png
Type: image/png
Size: 34905 bytes
Desc: not available
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20140722/b6156054/attachment-0001.png>