[Xymon] rrd logs and graphs

Vernon Everett everett.vernon at gmail.com
Wed Feb 25 04:40:58 CET 2015


Hi Jeremy

Added some debug code to my script. Here's an extract.
      DATA=$(cat $TMPFILE.drvperf | awk '{ print $1" : "$2 }') # Current IO latency
      $XYMON $XYMSRV "data $ENAME.e-series-dcuriolat $(echo; echo; echo "$DATA"; echo)"
echo      $XYMON $XYMSRV "data $ENAME.e-series-dcuriolat $(echo; echo; echo "$DATA"; echo)"
      DATA=$(cat $TMPFILE.drvperf | awk '{ print $1" : "$3 }') # Max IO latency
      $XYMON $XYMSRV "data $ENAME.e-series-dmaxiolat $(echo; echo; echo "$DATA"; echo)"
echo      $XYMON $XYMSRV "data $ENAME.e-series-dmaxiolat $(echo; echo; echo "$DATA"; echo)"
      DATA=$(cat $TMPFILE.drvperf | awk '{ print $1" : "$3 }') # Avg IO latency
      $XYMON $XYMSRV "data $ENAME.e-series-davgiolat $(echo; echo; echo "$DATA"; echo)"
echo      $XYMON $XYMSRV "data $ENAME.e-series-davgiolat $(echo; echo; echo "$DATA"; echo)"
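
For anyone wanting to reproduce the message format without the real devices, here is a minimal sketch of how the payload in the extract above is put together. The function name build_payload, the host name myhost and the sample file contents are made up for illustration; the real drvperf column layout may differ.

```shell
#!/bin/sh
# Hypothetical helper: print "name : value" lines from a whitespace-separated
# performance file. $1 = file, $2 = number of the column holding the value.
build_payload() {
    awk -v col="$2" '{ print $1 " : " $col }' "$1"
}

# Made-up sample data: drive name, then three latency columns.
# This layout is an assumption, not the real drvperf format.
SAMPLE=$(mktemp)
printf '%s\n' 'drive1 1.2 9.8 3.4' 'drive2 0.7 5.1 2.2' > "$SAMPLE"

DATA=$(build_payload "$SAMPLE" 2)

# Command substitution strips trailing newlines, so the message ends up as
# the "data" header line, one blank line, then the "name : value" lines.
MSG="data myhost.e-series-dcuriolat $(echo; echo; echo "$DATA"; echo)"
printf '%s\n' "$MSG"

rm -f "$SAMPLE"
```

Printing the message like this, instead of sending it, makes it easy to confirm that only the expected names appear in what the script hands to the xymon client.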

And I managed to get a couple of bizarre data files.
e-series-dcuriolat,icmpOutParmProbs.rrd
e-series-dcuriolat,icmpOutRedirects.rrd
e-series-dcuriolat,ipv6InTruncatedPkts.rrd
e-series-dcuriolat,ipv6OutFragFails.rrd
e-series-dcuriolat,UDP_udpInDatagrams.rrd
e-series-dcuriolat,udpInCksumErrs.rrd

And if I grep in my log file for icmp or any of those terms, I come up with
nothing.
So I am guessing it's not coming from the client.
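
The check described above can be sketched roughly as follows. It lists the RRD files for one test whose dataset name never shows up in the script's debug log. The function name find_phantom_rrds and the demo directory and log contents are hypothetical, and the sketch assumes the dataset name appears verbatim in the log.

```shell
#!/bin/sh
# Hypothetical helper: list RRD files for one test whose dataset name (the
# part between the comma and ".rrd") never appears in the debug log.
# $1 = RRD directory, $2 = test name, $3 = debug log file.
find_phantom_rrds() {
    for f in "$1/$2",*.rrd; do
        [ -e "$f" ] || continue    # glob matched nothing
        name=${f##*/}              # strip the directory
        name=${name#"$2",}         # strip "testname,"
        name=${name%.rrd}          # strip the extension
        grep -q -F -- "$name" "$3" || printf '%s\n' "${f##*/}"
    done
}

# Demo with made-up files: one expected dataset and one phantom.
RRDDIR=$(mktemp -d)
LOG=$(mktemp)
touch "$RRDDIR/e-series-dcuriolat,drive1.rrd" \
      "$RRDDIR/e-series-dcuriolat,icmpOutRedirects.rrd"
printf '%s\n' 'drive1 : 1.2' > "$LOG"

PHANTOMS=$(find_phantom_rrds "$RRDDIR" e-series-dcuriolat "$LOG")
printf '%s\n' "$PHANTOMS"

rm -rf "$RRDDIR" "$LOG"
```

Any file it prints is one whose name the script never logged, which supports the idea that the name is being introduced somewhere after the client.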

I want to try the snoop, but this client script is running on the server,
as a client script.
It collects data from a bunch of NetApp E-series devices, and sends it to
the server in the normal way.
So you can imagine what the snoop data is going to look like.
But I will give it a go, and see if there is something in it.

As for debugging the rrd tasks, John was right.
Adding --debug to the rrd config causes it to crash.
Then I just get heaps of this.
2015-02-25 11:31:07 Peer not up, flushing message queue
2015-02-25 11:31:07 Peer not up, flushing message queue
2015-02-25 11:31:07 Peer not up, flushing message queue
And the occasional
19073 2015-02-25 11:31:14 2015-02-25 11:31:15 Child process 19073 died: Signal 6

But I think I am reasonably happy that the strange data isn't coming from
the client script.
Martin Flemming is a list member in Germany (I think) who is helping me test
this script.
I will ask him if he's seeing the same issues. If not, I think we can rule
out the script.

Regards
Vernon


On 24 February 2015 at 14:26, Jeremy Laidman <jlaidman at rebel-it.com.au>
wrote:

> I'm assuming you've checked your debug output from your script to see if
> the $TMPFILE.* file contents look OK.
>
> Perhaps run your own instance of "xymond_channel --channel=data" to
> capture the messages as they come from xymond to xymond_rrd.  This will
> generate a lot of output, so you'll want to use "--filter" and perhaps
> "grep" to trim it down.
>
> You could also run snoop/tcpdump at the same time and try to capture the
> data message as it arrives at your Xymon server.  If you have lots of Xymon
> traffic it might be better to do so on the client side.
>
> The trick is to get a snapshot at the time that the RRD file is created,
> without collecting so much data that you run out of disk!  So doing things
> like this:
>
> while true; do tcpdump -w dump.out -n -c 10000 dst port 1984 and host blabla; gzip dump.out; mv dump.out.gz dump.out-`date +%s`; done
>
> This will capture 10,000 packets at a time, then compress and rotate.
>
> You can also run xymond in a host-specific debug mode, by appending
> "--dbghost=HOSTNAME".  That will spit out all the traffic into
> /tmp/xymond.dbg for analysis.  Again, you might need to periodically rotate
> that file and signal xymond to re-open output files (I'm guessing a HUP
> signal might do this, or just kill the process and have xymonlaunch restart
> it).
>
> The path the data take would be:
>
> [script] -> [xymon client] -> [TCP/1984] -> [xymond] -> [xymond_channel]
> -> [xymond_rrd] -> [rrd file]
>
> What we want to do is to watch the traffic/messages to determine which of
> these components is causing the problem.  My first step would be to try to
> isolate whether it's a client or server problem, hence watching the traffic
> with tcpdump/snoop.  If the traffic is transmitted over the wire in the
> correct form, then I'd look at what xymond gives to xymond_channel.  And so
> on.  Once we can identify the process that creates the phantom entity, we
> can look for the root cause and then work-arounds/solutions.
>
> J
>
>
> On 24 February 2015 at 16:46, Vernon Everett <everett.vernon at gmail.com>
> wrote:
>
>> I am getting those sporadic .rrd files in spades. :-(
>> Sometimes, only a single data point in the file. But enough files, and
>> your graphs start to look like crap.
>>
>> Tomorrow I am off to a client where it's happening all the time.
>> What can I send you to assist with investigating?
>>
>> I am trying to figure out if it's a bug in Xymon, or a bug in my script.
>> So far I have no evidence to support it being either.
>>
>> Regards
>> Vernon
>>
>>
>>
>>
>> On 24 February 2015 at 13:14, Jeremy Laidman <jlaidman at rebel-it.com.au>
>> wrote:
>>
>>> On 14 November 2014 at 14:43, Vernon Everett <everett.vernon at gmail.com>
>>> wrote:
>>>
>>>> Am busy trying to investigate a curious problem with rrd graphs, and I
>>>> stumbled on something else I don't understand, and was hoping somebody out
>>>> there could help.
>>>>
>>>> As part of my investigation, I added --debug to the [rrdstatus] and
>>>> [rrddata] entries on the server tasks.cfg
>>>> And the logs started showing heaps of the message
>>>> 2014-11-14 10:41:36 Peer not up, flushing message queue
>>>> What is that?
>>>> It doesn't look right to me.
>>>>
>>>
>>> It's usually normal.  See Henrik's response to a similar question:
>>>
>>> http://lists.xymon.com/archive/2014-April/039461.html
>>>
>>>> Except every now and then, I get something like
>>>> zmem,c2t0d1.rrd
>>>>
>>>
>>>
>>>> Has anybody seen anything like this?
>>>>
>>>
>>> Yes.  It's puzzling, but rare enough that I haven't had time to
>>> investigate.
>>>
>>> J
>>>
>>>
>>
>>
>> --
>> "Accept the challenges so that you can feel the exhilaration of victory"
>> - General George Patton
>>
>
>


-- 
"Accept the challenges so that you can feel the exhilaration of victory"
- General George Patton