[Xymon] RRD Graph stopped graphing -- BUG FOUND?
Michael Beatty
Michael.Beatty at sherwin.com
Thu Jan 3 18:03:49 CET 2013
I figured it out, and I believe I found a bug in the process.
Xymon uses a default step of 300 seconds with a heartbeat of 600
seconds (10 minutes). If RRD doesn't get an update within 10 minutes,
it records the value as unknown. To override the defaults, you
need to set the values in xymonserver.cfg, which builds the data
source (DS) part of the rrd. From the documentation:
NCV_slab="inodecache:GAUGE,dentrycache:GAUGE"
I figured out early on that you can append the heartbeat, i.e.:
NCV_slab="inodecache:GAUGE:1800,dentrycache:GAUGE:1800"
This sets the heartbeat for those two data sources to 1800 seconds (30
minutes). What I failed to figure out early on is that Xymon doesn't
like this. For some reason (I would consider this a bug), if you supply
your own parameters to override the default, Xymon still appends its
default value to the rrdcreate string, which ultimately ended up as:
NCV_slab="inodecache:GAUGE:1800:600"
So, when the create tool executed, it took the default heartbeat value
(600) and added it to the end of the create string. In that position
RRD reads it as the MINIMUM value, which you can clearly see in the XML
I posted below. I failed to see this. This is where my problem lay. My
values were fractions of seconds... rrd needed a value of at least 600,
so it was throwing out my data.
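
To make the field positions concrete, here is a minimal Python sketch.
It assumes RRDtool's documented DS layout (name:type:heartbeat:min:max)
and that values outside min/max are stored as unknown. The helper names
parse_ds and accepted are made up for illustration, and the trailing U
(unbounded max) on the broken string is an assumption based on the
<max> NaN in the XML dump further down. It does not reproduce Xymon's
own code.

```python
# Sketch only: models how RRDtool reads the DS fields, not how Xymon builds them.

def parse_ds(spec):
    """Split 'name:TYPE:heartbeat:min:max' into a dict ('U' means unbounded)."""
    name, dstype, heartbeat, minimum, maximum = spec.split(":")

    def to_bound(text):
        return None if text == "U" else float(text)

    return {
        "name": name,
        "type": dstype,
        "heartbeat": int(heartbeat),
        "min": to_bound(minimum),
        "max": to_bound(maximum),
    }


def accepted(ds, value):
    """True if value lies within the data source's min/max bounds."""
    if ds["min"] is not None and value < ds["min"]:
        return False
    if ds["max"] is not None and value > ds["max"]:
        return False
    return True


# The stray default 600 lands in the min position (trailing U is assumed).
broken = parse_ds("inodecache:GAUGE:1800:600:U")
# All fields given explicitly, so nothing is left for a default to fill.
fixed = parse_ds("inodecache:GAUGE:1800:0:U")

sample = 1.080055  # last_ds value from the XML dump in this thread
print(accepted(broken, sample))  # False: below min, stored as unknown
print(accepted(fixed, sample))   # True
```

With min at 600, a reading near 1.08 can never be stored, which matches
the empty graphs.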
In order to fix the issue I had to explicitly provide all of the
rrdcreate parameters:
NCV_slab="inodecache:GAUGE:1800:0:U"
One more thing is needed to make it work: you also have to put an
entry into rrddefinitions for these records, because this is where the
step parameter is set:
[slab]
-s 900
RRA:AVERAGE:0.5:1:576
If this isn't a bug, it should at least be documented in the "Custom
Graphs" section.
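
As a quick sanity check on the timing values, the sketch below applies
the RRD rule that an update arriving more than one heartbeat after the
previous one is stored as unknown. The function name gap_is_unknown is
hypothetical, and the 900-second reporting interval for the slab script
is an assumption chosen to match the step above.

```python
# Sketch only: the heartbeat rule from the RRDtool documentation, in isolation.

def gap_is_unknown(gap_seconds, heartbeat_seconds):
    """True if the gap between two updates exceeds the heartbeat."""
    return gap_seconds > heartbeat_seconds


# Hourly script against the Xymon default heartbeat of 600 seconds.
print(gap_is_unknown(3600, 600))   # True: every sample becomes unknown

# Assumed 15-minute script against the 1800-second heartbeat set above.
print(gap_is_unknown(900, 1800))   # False: samples are kept
```

Keeping the heartbeat comfortably above the reporting interval leaves
room for a late run of the script.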
Michael Beatty
On 12/19/2012 10:25 PM, Gore, David W (David) wrote:
> I haven't done this myself but my coworker has used it to fix graphing on one of my scripts that went from every 5 minutes to every 15 minutes:
>
> rrdtool tune your.rrd --heartbeat lambda:1200 # I don't know why he used 1200 since it seems like you would use 900
>
> Regardless try man rrdtool to see the details.
>
>
> ~David
>
> From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf Of Michael Beatty
> Sent: Wednesday, December 19, 2012 13:38
> To: Larry Barber
> Cc: xymon at xymon.com
> Subject: Re: [Xymon] RRD Graph stopped graphing
>
> It seems that any frequency above 10 minutes doesn't work. From what I understand, this has to do with the step and heartbeat of the rrd. It looks like by default the step is 300 seconds and the heartbeat is 600 seconds (10 minutes). So, as long as it doesn't go above 10 minutes it will accept data, anything past 10 minutes will be tossed which would result in me not getting graphs.
>
> So, in reading the man pages, I figured out that I can override the 600 second default by appending it to my ENV tags in xymonserver.cfg
> NCV_MyAlert="MyAlert:GAUGE:900"
>
> At first glance, this worked perfectly. I deleted the rrd file, restarted xymon, did a rrd dump and the XML was now showing a heartbeat of 900 (15 minutes). So, I set my script to run every 15 minutes expecting a giant rainbow to arc over my screen, but, it still doesn't work.
>
> My XML looks like this:
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
> <!-- Round Robin Database Dump --><rrd> <version> 0003 </version>
> <step> 300 </step> <!-- Seconds -->
> <lastupdate> 1355940819 </lastupdate> <!-- 2012-12-19 13:13:39 EST -->
>
> <ds>
> <name> Shortterm </name>
> <type> GAUGE </type>
> <minimal_heartbeat> 900 </minimal_heartbeat>
> <min> 6.0000000000e+02 </min>
> <max> NaN </max>
>
> <!-- PDP Status -->
> <last_ds> 1.080055 </last_ds>
> <value> NaN </value>
> <unknown_sec> 219 </unknown_sec>
> </ds>
>
> My rrd info:
>
>
>
> Michael Beatty
>
> On 12/18/2012 06:11 PM, Larry Barber wrote:
> man rrdtool. Or you can just redo what you did to create the custom graphs, but change the time interval.
>
> Thanks,
> Larry Barber
> On Tue, Dec 18, 2012 at 2:56 PM, Michael Beatty <Michael.Beatty at sherwin.com> wrote:
> How do I do that?
>
>
> When you change the frequency of data collection you need to update your rrd definitions. The rrd databases are fixed length files, so you are leaving lots of blank entries between data points.
>
> Thanks,
> Larry Barber
> On Tue, Dec 18, 2012 at 11:42 AM, Michael Beatty <Michael.Beatty at sherwin.com> wrote:
> I had two custom RRD graphs setup and running fine on client systems. The script that collects the data was running every minute. After I finished testing, I changed the frequency of the scripts on the clients to 1 hour in the clientlaunch.cfg file. When I did, the RRD graph stopped displaying data. It ran like this for a couple days. I changed clientlaunch.cfg back to 1 minute... and the graphs immediately started graphing again. Is there a minimum time frame that is needed, or something else that must be done? What am I missing?
>