[Xymon] Flushing Stale messages?

Fri Mar 15 19:41:41 CET 2013

That's odd. If you're on a box with a lot of memory, writing out to a
tmpfs might help. For your worker, I'd suggest just adding a debug line or
two in front of that section.
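
As a rough idea of what I mean by a debug line, here is a minimal sketch of
a check you could drop in just before the mysql_escape_string() call quoted
below. The helper name check_escape_args is made up for illustration and is
not part of xymond_mysql.c. It relies on two things from the MySQL C API
documentation: the length argument is the byte length of the source string,
and the destination buffer must hold at least length*2+1 bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of xymond_mysql.c.
 * Logs what is about to be escaped and returns 1 if the arguments look
 * safe for an escape call, 0 if the call should be skipped. */
int check_escape_args(FILE *log, const char *src, size_t dstsize)
{
    size_t need;

    if (src == NULL) {
        fprintf(log, "escape: source is NULL, skipping\n");
        return 0;
    }

    /* Worst case: every source byte is escaped, plus the trailing NUL. */
    need = strlen(src) * 2 + 1;
    fprintf(log, "escape: src=\"%s\" need=%lu have=%lu\n",
            src, (unsigned long)need, (unsigned long)dstsize);

    return dstsize >= need;
}
```

If a check like this returns 0 right before the crash, the log line tells
you whether the source pointer was NULL or the destination was too small.
Whether either of those is actually what's biting you is a guess on my part.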

WRT the checkpoint file, the only real corruption I've seen myself has
occurred when malformed utf-8 packets came in -- I'd accidentally included
gzip output in a script I'd put in my /local directory :/.
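
If you want to rule that case out, a minimal sketch along these lines would
report where the first malformed UTF-8 byte sits in a buffer. The function
name first_bad_utf8 is hypothetical, and it assumes you have already read
the checkpoint file into memory yourself. It knows nothing about the
checkpoint format, only about byte-level UTF-8 validity, so a clean result
doesn't prove the file is good.

```c
#include <stddef.h>

/* Hypothetical helper. Returns the offset of the first byte that does not
 * start a well-formed UTF-8 sequence, or -1 if the whole buffer is valid. */
long first_bad_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra, k;
        unsigned long cp;

        if (c < 0x80) { i++; continue; }                 /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { extra = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { extra = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { extra = 3; cp = c & 0x07; }
        else return (long)i;      /* stray continuation or invalid lead byte */

        if (len - i <= extra)
            return (long)i;       /* sequence runs past the end of the buffer */

        for (k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return (long)i;   /* expected a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values above U+10FFFF. */
        if ((extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) ||
            cp > 0x10FFFF)
            return (long)i;

        i += extra + 1;
    }
    return -1;
}
```

Gzip output starts with the bytes 0x1f 0x8b, so for data like mine this
would flag the second byte straight away.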

You could try modifying the init startup/shutdown script to copy over the
checkpoint file every once in a while, and then point a copy of xymond
over to it in --debug mode and see if it chokes... and if so, how far in.

Thinking about it, a --validate flag to xymond might not be too hard to
whip up.

Regards,

-jc

--- Original Message ---

Heh, I'd have to look through the whole stachg channel to find the needle
in the haystack for that.

I've had a couple of core dumps here (about one every 2-3 days):

Program terminated with signal 11, Segmentation fault.
#0  main (argc=2, argv=0xbfd1a444) at xymond_mysql.c:371

xymond_mysql.c line 371:
   mysql_escape_string(timestamp,metadata[1],timestampbytes);
timestampbytes is the strlen of timestamp.

I am not strong in C, however, so to find that needle I wrote a Perl
version that pipes hist to MySQL (that way, it logs exceptions and so on).
However, the Perl version can't keep up with the message rate (between
300 and 500/sec).

Bleh

What I STRONGLY need help with is my xymond.chk getting corrupted. Henrik
looked at one a while back and gave me something to look at and fix,
which I did, but it's still getting corrupted (and then any time it
crashes, I lose all states).

Do you know of a good way to parse/manage the chk file to see what it
doesn't like?