[Xymon] Flushing Stale messages?

cleaver at terabithia.org
Fri Mar 15 18:19:27 CET 2013


Yeah, that generally means your pipe has backed up too much.
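One way to see how much is actually being dropped is to total up the "Flushed N stale messages" lines. Below is a minimal Python sketch. It assumes the log lines look exactly like the ones in Sean's message: a timestamp, then "Flushed <count> stale messages for <peer>". The function name tally_flushed is hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: log lines have the shape seen in the excerpt, e.g.
#   2013-03-15 11:08:29 Flushed 4 stale messages for 0.0.0.0:0
FLUSH_RE = re.compile(r"^(\S+ \S+) Flushed (\d+) stale messages for (\S+)$")


def tally_flushed(lines):
    """Return (total flushed, Counter of flushed messages per peer)."""
    per_peer = Counter()
    for line in lines:
        m = FLUSH_RE.match(line.strip())
        if m:  # ignore every other kind of log line
            per_peer[m.group(3)] += int(m.group(2))
    return sum(per_peer.values()), per_peer
```

For example, running tally_flushed(open(logfile)) over the ten lines Sean posted gives 35 flushed messages for 0.0.0.0:0 in ten seconds. Here logfile stands for wherever your worker's log lives.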

"Rate of messages" is a good metric to keep track of (visible at 5-minute
intervals in the xymond status report). If you're getting 3000 messages
every 300 seconds, you have on average 0.1s to process each incoming
message, and less than that during the spikes you should expect, which is
when the buffers overrun.
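Here is the arithmetic above as a small Python sketch. The headroom factor is an illustrative assumption, not a Xymon setting. It means using only half of the average budget so that the rest is left for spikes.

```python
def per_message_budget(messages: int, interval_s: float) -> float:
    """Average seconds available per message if the worker is to keep up."""
    if messages <= 0:
        raise ValueError("need at least one message")
    return interval_s / messages


def can_keep_up(messages: int, interval_s: float,
                avg_processing_s: float, headroom: float = 0.5) -> bool:
    """Hypothetical rule of thumb: average processing time should fit in
    `headroom` of the budget, leaving the remainder to absorb bursts."""
    return avg_processing_s <= per_message_budget(messages, interval_s) * headroom
```

With 3000 messages per 300 seconds, the budget is 0.1s. A parser averaging 0.04s per message passes the check, while one averaging 0.08s is already too close for comfort.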


Depending on what you're doing, smoothing out how often you get messages
to reduce spikes will help, as will filtering at xymond_channel if you're
only interested in a subset, along with (obviously) trying to make the
message processor more efficient.
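Here is the filtering idea sketched in Python for the worker side. The worker does a cheap match on each message's header line and skips anything you don't care about before the expensive processing starts. This sketch assumes each channel message starts with a header line beginning with "@@" and ends with a line containing only "@@". Check that against what your worker actually receives. The names split_messages and wanted, and the pattern, are hypothetical.

```python
import re
from typing import Iterable, Iterator, List


def split_messages(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group raw channel output into messages (lists of lines).

    ASSUMPTION: a line containing only "@@" terminates each message;
    the first line of each message is its "@@..." header.
    """
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "@@":
            if current:
                yield current
            current = []
        else:
            current.append(line)
    # a trailing message without its terminator is discarded


def wanted(messages: Iterable[List[str]], header_pattern: str) -> Iterator[List[str]]:
    """Yield only messages whose header line matches header_pattern."""
    rx = re.compile(header_pattern)
    for msg in messages:
        if msg and rx.search(msg[0]):
            yield msg
```

Filtering inside xymond_channel itself is still preferable when you can do it, because it keeps the unwanted messages out of the pipe entirely.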

Eventually, it could come down to forking off the handling (if you can do
it efficiently and have cores to spare), or using an async queue somewhere.
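Here is a minimal Python sketch of the async-queue approach. It assumes the handler is mostly I/O bound (database inserts, for example), so threads help despite Python's GIL. The reader loop only enqueues, which keeps the channel drained, and worker threads do the slow part. The name run_with_queue and its parameters are hypothetical.

```python
import queue
import sys
import threading
from typing import Callable, Iterable

_STOP = object()  # sentinel: one per worker tells it to exit


def run_with_queue(messages: Iterable[str], handle: Callable[[str], None],
                   workers: int = 4, maxsize: int = 10000) -> None:
    """Drain `messages` quickly and let worker threads call `handle`.

    `maxsize` bounds memory: when the queue is full, put() blocks and the
    back-pressure returns to the reader.
    """
    q = queue.Queue(maxsize=maxsize)

    def worker() -> None:
        while True:
            item = q.get()
            if item is _STOP:
                return
            try:
                handle(item)
            except Exception as exc:  # keep the worker alive on a bad message
                print(f"handler failed: {exc!r}", file=sys.stderr)

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    for msg in messages:  # the reader only enqueues
        q.put(msg)
    for _ in threads:  # sentinels go in after all real messages
        q.put(_STOP)
    for t in threads:
        t.join()
```

There are two caveats. First, with more than one worker, messages are no longer handled in arrival order, which matters for status changes on the same host and test. Second, once the queue is full, put() blocks, so a handler that is permanently too slow will still back up the pipe, just later.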


On the second part, that's interesting... Can you provide a sample msg
with a null?
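Whatever the cause of the null turns out to be, the parser shouldn't segfault on it. Here is a Python sketch that validates the header before using it. It assumes the header fields are '|'-separated and that the timestamp is the second field (index 1). That position is an assumption, so adjust ts_field to match a real message. The function name parse_header is hypothetical.

```python
from typing import List, Optional


def parse_header(header: str, ts_field: int = 1) -> Optional[List[str]]:
    """Split a channel header line on '|' and validate the timestamp field.

    ASSUMPTION: '|'-separated fields with the timestamp at index `ts_field`.
    Returns None instead of failing when the timestamp is missing, empty,
    or not numeric, so the caller can log the raw line and skip it.
    """
    fields = header.split("|")
    if len(fields) <= ts_field:
        return None
    ts = fields[ts_field].strip()
    if not ts:
        return None
    try:
        float(ts)
    except ValueError:
        return None
    return fields
```

Returning None lets the caller log the raw header and move on, which also captures a sample message to send to the list.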


Regards,

-jc



--- Original Message ---

I'll answer that myself: yes, that means whatever is reading the channel
can't process it fast enough.


So I'll have to go back to my older parser, which is hitting this:


Core was generated by `xymond_mysql --pidfile=/var/log/xymon/xymond_history.pid'.
Program terminated with signal 11, Segmentation fault.
#0  0x08049de1 in addnetpeer (peername=0x4f8ca0 "") at xymond_channel.c:140
140	xymond_channel.c: No such file or directory.
in xymond_channel.c
(gdb) where
#0  0x08049de1 in addnetpeer (peername=0x4f8ca0 "") at xymond_channel.c:140
#1  0x00511e9c in ?? ()
#2  0x004f8ca0 in ?? () from /lib/ld-linux.so.2
#3  0x08057190 in stackfgets (buffer=0x80497b0, extraincl=0x2 <Address 0x2 out of bounds>) at stackio.c:434
#4  0x080496c1 in _start ()


It's getting a null timestamp for some items on the stachg channel :/


From: "Clark, Sean" <sean.clark at twcable.com>
Date: Friday, March 15, 2013 11:21 AM
To: "xymon at xymon.com" <xymon at xymon.com>
Subject: [Xymon] Flushing Stale messages?



I have a channel parser that looks at items in the 'stachg' channel.

It looks like it's working for me (it parses and does stuff properly).

However, my log is filling up with this:



2013-03-15 11:08:29 Flushed 4 stale messages for 0.0.0.0:0
2013-03-15 11:08:30 Flushed 4 stale messages for 0.0.0.0:0
2013-03-15 11:08:31 Flushed 3 stale messages for 0.0.0.0:0
2013-03-15 11:08:32 Flushed 6 stale messages for 0.0.0.0:0
2013-03-15 11:08:33 Flushed 2 stale messages for 0.0.0.0:0
2013-03-15 11:08:34 Flushed 2 stale messages for 0.0.0.0:0
2013-03-15 11:08:35 Flushed 3 stale messages for 0.0.0.0:0
2013-03-15 11:08:36 Flushed 3 stale messages for 0.0.0.0:0
2013-03-15 11:08:37 Flushed 4 stale messages for 0.0.0.0:0
2013-03-15 11:08:38 Flushed 4 stale messages for 0.0.0.0:0


Is this telling me my parser cannot handle the channel in a timely manner,
so messages are going "stale" and I am dropping things?



-Sean
