[Xymon] rrd logs and graphs

Jeremy Laidman jlaidman at rebel-it.com.au
Tue Feb 24 07:26:57 CET 2015


I'm assuming you've checked your debug output from your script to see if
the $TEMPFILE.* file contents look OK.

Perhaps run your own instance of "xymond_channel --channel=data" to capture
the messages as they come from xymond to xymond_rrd.  This will generate a
lot of output, so you'll want to use "--filter" and perhaps "grep" to trim
it down.
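
If you save that output to a file, something like the following could pull out just the messages for one host afterwards. This is only a sketch: messages_for_host is a made-up helper name, and I'm assuming each message starts with a header line beginning "@@data#" whose fifth "|"-separated field is the hostname, and ends with a line containing only "@@". Check that against your own captured output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read a saved xymond_channel stream on stdin and print
# only the messages belonging to one host.
# ASSUMPTION: a message starts with a header line beginning "@@data#", with
# the hostname in the 5th "|"-separated field, and ends with a line "@@".
messages_for_host() {
    awk -F'|' -v host="$1" '
        /^@@data#/ { keep = ($5 == host) }   # header: decide whether to keep
        keep       { print }                 # header, body and terminator
        /^@@$/     { keep = 0 }              # end of message
    '
}
```

For example: messages_for_host blabla < channel.out | less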

You could also run snoop/tcpdump at the same time and try to capture the
data message as it arrives at your Xymon server.  If you have lots of Xymon
traffic it might be better to do so on the client side.

The trick is to get a snapshot at the time that the RRD file is created,
without collecting so much data that you run out of disk!  So you could do
something like this:

while true; do tcpdump -w dump.out -n -c 10000 dst port 1984 and host blabla; gzip dump.out; mv dump.out.gz dump.out-`date +%s`; done

This will capture 10,000 packets at a time, then compress and rotate.
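
A loop like that never deletes anything, so to keep the disk from filling you could prune the older rotated files on each pass. A rough sketch, where prune_captures is a hypothetical helper and the dump.out-<epoch> naming is taken from the loop above:

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest $2 rotated captures in directory $1.
# Assumes files are named dump.out-<epoch seconds>, as in the capture loop,
# so a reverse lexical sort lists the newest file first.
prune_captures() {
    dir=$1
    keep=$2
    ls "$dir"/dump.out-* 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do
        rm -f -- "$f"      # everything past the newest $keep files
    done
}
```

Calling "prune_captures . 50" just before the "done" in the capture loop would keep roughly the last 50 captures; the number is arbitrary, so pick one that suits your disk.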

You can also run xymond in a host-specific debug mode, by appending
"--dbghost=HOSTNAME".  That will spit out all the traffic into
/tmp/xymond.dbg for analysis.  Again, you might need to periodically rotate
that file and signal xymond to re-open output files (I'm guessing a HUP
signal might do this, or just kill the process and have xymonlaunch restart
it).

The path the data take would be:

[script] -> [xymon client] -> [TCP/1984] -> [xymond] -> [xymond_channel] ->
[xymond_rrd] -> [rrd file]

What we want to do is to watch the traffic/messages to determine which of
these components is causing the problem.  My first step would be to try to
isolate whether it's a client or server problem, hence watching the traffic
with tcpdump/snoop.  If the traffic is transmitted over the wire in the
correct form, then I'd look at what xymond gives to xymond_channel.  And so
on.  Once we can identify the process that creates the phantom entity, we
can look for the root cause and then work-arounds/solutions.
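
Since the odd files turn up sporadically, it may also help to note when one appears, so you know which capture to open. A minimal sketch, assuming you pass in your own RRD directory and a marker file that you touch at the start of the observation window (new_rrd_files is a made-up name):

```shell
#!/bin/sh
# Hypothetical helper: list .rrd files under directory $1 that were modified
# more recently than the marker file $2.
new_rrd_files() {
    find "$1" -type f -name '*.rrd' -newer "$2"
}
```

Touch the marker, wait a while, then run new_rrd_files against the RRD directory; any unexpected names in the output give you a time window to search in the tcpdump and channel captures.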

J


On 24 February 2015 at 16:46, Vernon Everett <everett.vernon at gmail.com>
wrote:

> I am getting those sporadic .rrd files in spades. :-(
> Sometimes, only a single data point in the file. But enough files, and
> your graphs start to look like crap.
>
> Tomorrow I am off to a client where it's happening all the time.
> What can I send you to assist with investigating?
>
> I am trying to figure out if it's a bug in Xymon, or a bug in my script.
> So far I have no evidence to support it being either.
>
> Regards
> Vernon
>
>
>
>
> On 24 February 2015 at 13:14, Jeremy Laidman <jlaidman at rebel-it.com.au>
> wrote:
>
>> On 14 November 2014 at 14:43, Vernon Everett <everett.vernon at gmail.com>
>> wrote:
>>
>>> Am busy trying to investigate a curious problem with rrd graphs, and I
>>> stumbled on something else I don't understand, and was hoping somebody out
>>> there could help.
>>>
>>> As part of my investigation, I added --debug to the [rrdstatus] and
>>> [rrddata] entries on the server tasks.cfg
>>> And the logs started showing heaps of the message
>>> 2014-11-14 10:41:36 Peer not up, flushing message queue
>>> What is that?
>>> It doesn't look right to me.
>>>
>>
>> It's usually normal.  See Henrik's response to a similar question:
>>
>> http://lists.xymon.com/archive/2014-April/039461.html
>>
>>> Except every now and then, I get something like
>>> zmem,c2t0d1.rrd
>>>
>>
>>
>>> Has anybody seen anything like this?
>>>
>>
>> Yes.  It's puzzling, but rare enough that I haven't had time to
>> investigate.
>>
>> J
>>
>>
>
>
> --
> "Accept the challenges so that you can feel the exhilaration of victory"
> - General George Patton
>