[Xymon] rrd logs and graphs

Vernon Everett everett.vernon at gmail.com
Wed Mar 4 02:40:58 CET 2015


Hi all

Back at the customer's site, and back on this problem.

I just captured data as this happened.
Ran the command you suggested, slightly modified for our environment. I
also redirected it to a file for analysis.
Here's what I ran, with error output.
./xymoncmd xymond_channel --channel=data --filter=e-series cat > /var/tmp/xymon.out
2015-03-04 08:45:22 Using default environment file /opt/local/xymon/server/etc/xymonserver.cfg
2015-03-04 08:45:58 Peer not up, flushing message queue
2015-03-04 09:05:21 Gave up waiting for GOCLIENT to go low.

What is that GOCLIENT thing?
It might be relevant, since it occurred just *after* some errant data files
were created. (Note the timestamps.)
The errant data files are:
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:02 SINFSPNA01/e-series-davgiolat,subversionsize.rrd
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:02 SINFSPNA01/e-series-davgiolat,subversionrss.rrd
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:02 SINFSPNA01/e-series-davgiolat,subversionmemory.rrd
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:02 SINFSPNA01/e-series-davgiolat,subversioncpu.rrd
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:02 SINFSPNA01/e-series-davgiolat,subversion.rrd
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:02 SINFSPNA01/e-series-davgiolat,energisesize.rrd

This is supposed to graph the average IO latency of the disks in the
e-series, so we expect output to look like this.
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:24 SINFSPNA01/e-series-davgiolat,Tray99_Slot1.rrd
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:24 SINFSPNA01/e-series-davgiolat,Tray0_Slot1.rrd
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:22 SINFSPNA03/e-series-davgiolat,Tray99_Slot12.rrd
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:22 SINFSPNA03/e-series-davgiolat,Tray0_Slot12.rrd
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:21 SINFSPNA01/e-series-davgiolat,Tray99_Slot8.rrd
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:21 SINFSPNA01/e-series-davgiolat,Tray99_Slot7.rrd
-rw-r--r--   1 xymon    xymon      19580 Mar  4 09:21 SINFSPNA01/e-series-davgiolat,Tray99_Slot6.rrd

"subversion" and "energise" are actually host names of completely
unrelated servers.
The subversionsize, subversionrss and similar data points are being
sent for another host, and are not related to the e-series graphs.
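
For anyone wanting to watch for these as they appear, here is a minimal
sketch that filters a list of RRD paths down to the stray ones. It assumes
every legitimate data set is named Tray<N>_Slot<N>, as in the listing above;
the function name list_errant_rrds and the RRD directory in the usage comment
are illustrative only and are not part of Xymon.

```shell
#!/bin/sh
# Read RRD file paths on stdin, one per line, and print the e-series latency
# files whose data-set name is NOT of the form Tray<N>_Slot<N>.
# Assumption: every legitimate data set follows the Tray<N>_Slot<N> naming.
list_errant_rrds() {
    while IFS= read -r path; do
        name=${path##*/}                  # strip the directory part
        case $name in
            e-series-davgiolat,*.rrd) ;;  # an e-series latency RRD: inspect it
            *) continue ;;                # any other file: ignore
        esac
        ds=${name#e-series-davgiolat,}    # drop the test-name prefix
        ds=${ds%.rrd}                     # drop the extension
        if ! printf '%s\n' "$ds" | grep -Eq '^Tray[0-9]+_Slot[0-9]+$'; then
            printf '%s\n' "$path"
        fi
    done
}

# Usage (the RRD directory is a placeholder; adjust for your install):
#   find /path/to/rrd -name 'e-series-davgiolat,*.rrd' | list_errant_rrds
```

Run from cron or a loop, this would give a timestamped record of when each
stray file first shows up, which can then be lined up against the capture.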

In the output file, /var/tmp/xymon.out, from
./xymoncmd xymond_channel --channel=data --filter=e-series cat > /var/tmp/xymon.out
there is no mention of the subversion or energise stuff either.
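
The check itself can be repeated with something like the sketch below. The
helper name count_mentions is made up for illustration; it only wraps
"grep -Ec" so that a count of zero is printed without a failing exit status.

```shell
#!/bin/sh
# Print the number of lines in a capture file that match an extended regex.
# $1 = capture file, $2 = pattern. grep exits non-zero when nothing matches,
# so "|| true" keeps the helper usable under "set -e".
count_mentions() {
    grep -Ec -- "$2" "$1" || true
}

# Usage:
#   count_mentions /var/tmp/xymon.out 'subversion|energise'
```

A result of 0 on a capture that spans the moment the stray files were created
would support the view that the bad names are not arriving on the data channel.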

Does this narrow the field at all?
Based on Jeremy's earlier email, it looks like the issue is in xymond_rrd,
unless that GOCLIENT error can tell us something more?

Regards
Vernon



On 25 February 2015 at 18:06, Jeremy Laidman <jlaidman at rebel-it.com.au>
wrote:

> On 25 February 2015 at 19:16, Vernon Everett <everett.vernon at gmail.com>
> wrote:
>
>> These hosts all have nothing at all to do with the storage arrays being
>> monitored, which makes me think the client data might be a red herring.
>
>
> Yup, makes sense.
>
> My best guess is memory corruption within xymond.  So let's see if the
> corruption is visible in the messages being passed between xymond and
> xymond_channel.  If we see corrupt messages in there, we can start to delve
> into the source code to see if there's a bug somewhere.  Are you able to
> run your own instance of xymond_channel?  Maybe something like this:
>
> sudo -u xymon xymoncmd xymond_channel --channel=data --filter=zmem cat
>
> Once you get an idea what it looks like, change "cat" for something like
> "egrep -A5 ^@" to get only the first 5 lines.  Also, redirect to a file
> until you notice a dodgy RRD file and then kill the process.
>
> Did you try running xymond with "--dbghost=HOSTNAME" ?  It might be too
> voluminous, but might be worth a try, if you can manage to snag the traffic
> at the right time.
>
> J
>
>


-- 
"Accept the challenges so that you can feel the exhilaration of victory"
- General George Patton