[Xymon] are bar well scaled through categories.
Randall Badilla
rbadillarx at gmail.com
Tue Sep 22 15:32:11 CEST 2015
Really appreciate your answers, Jeremy and Shawn.
I will try to adjust the rrd def to better fit our use of those graphs.
Thanks a lot!
On Sep 21, 2015 9:09 PM, "Jeremy Laidman" <jlaidman at rebel-it.com.au> wrote:
> On 22 September 2015 at 06:04, Randall Badilla <rbadillarx at gmail.com>
> wrote:
>
>> c) does the problem reside in how rrdtool plots, or in the internal
>> manipulation of the Solaris scripts?
>>
>
> Yes, this.
>
> RRD, by definition, is a round-robin database that "consolidates" in a
> lossy way. Combining this with the widespread use of counter stats
> produces the effect you see. Let me try to explain with a simple example.
> But first, a caveat that I'm not an expert in RRD, and my understanding is
> partly from my own guess at what's happening based on how I would implement
> things.
>
> Let's say that at 10:05am, the interface bit counter is 10,500,000 bits
> (meaning that the interface has transmitted 10.5 million bits since reboot,
> a "clear counters" command, or a counter roll-over). The router is polled
> every 5 minutes for its interface statistics. At 10:10am, 5 minutes later,
> the counter has incremented to 12,900,000 bits. The difference between the
> two samples is 2,400,000 bits. So rrdgraph (and hence Xymon) will show a
> 5-minute average value of 2,400,000/300=8kbps.
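The rate calculation in this example can be sketched in a few lines of Python. The function name `average_bps` is hypothetical, and this sketch simply rejects a counter that goes backwards; it does not model how RRD actually handles counter wraps or resets.

```python
def average_bps(counter_start: int, counter_end: int, interval_s: float) -> float:
    """Average bit rate over an interval, from two readings of a bit counter."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = counter_end - counter_start
    if delta < 0:
        # A decrease means a reset or wrap; this sketch refuses to guess.
        raise ValueError("counter decreased (reset or wrap)")
    return delta / interval_s


# The 10:05 and 10:10 readings from the example, 300 seconds apart.
rate = average_bps(10_500_000, 12_900_000, 300)
print(f"{rate:.0f} bps")  # 8000 bps, i.e. 8 kbps
```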
>
> Now, RRD doesn't store 8kbps. Nor does it store 2,400,000. Instead, it
> stores:
> 10:05=10,500,000
> 10:10=12,900,000
> 10:15=...
> (etc)
> In other words, only the absolute counter values get stored (along with
> the timestamps for each). These are the "primary data points".
>
> To store 5-minute counter values for years and years would require a huge
> database file that would take lots of CPU power to calculate and produce ad
> hoc long-term views of the data. Generally we only care about fine-grained
> (primary) data point samples when they're recent, and as the data points
> get older, we care more about hourly, daily or weekly trends instead. RRD
> solves this problem by reducing resolution for older samples.
>
> Back to our example. After 1 day, RRD "consolidates" the 5-minute values
> into longer intervals so that they don't take up as much space. The
> consolidation parameters are configurable, but for our example let's say it
> keeps 5-minute samples around for up to 24 hours, and after that it turns
> them into hourly samples. How does it do this? Well, all it needs to do is
> forget 11 out of 12 samples in an hour. So now RRD is storing:
> 10:05=10,500,000
> 11:05=27,320,000
> 12:05=34,150,000
> 13:05=...
>
> Note that it's still storing new 5-minute primary data points verbatim.
> The above list only shows the data points that are more than 24 hours old,
> starting from the time we began sampling.
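The "forget 11 out of 12 samples" consolidation described above amounts to keeping every 12th primary data point. This is a minimal sketch of that simplified model only, not of rrdtool's real update logic; the counter readings are made up (a constant 1,000,000 bits per 5 minutes), and `thin_samples` and `hhmm` are hypothetical helper names.

```python
from typing import List, Tuple

# (minutes since midnight, counter value in bits)
Sample = Tuple[int, int]


def thin_samples(samples: List[Sample], keep_every: int) -> List[Sample]:
    """Keep the first sample and every keep_every-th one after it; drop the rest.

    Follows the simplified "forget 11 out of 12 samples" model, not
    rrdtool's actual RRA update logic.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    return samples[::keep_every]


def hhmm(minutes: int) -> str:
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


start = 10 * 60 + 5  # 10:05
# Hypothetical readings: 25 samples, 5 minutes apart, rising by 1,000,000 bits each.
five_minute = [(start + 5 * i, 10_500_000 + 1_000_000 * i) for i in range(25)]
hourly = thin_samples(five_minute, 12)

for t, counter in hourly:
    print(hhmm(t), counter)
```

With these inputs the surviving samples are 10:05, 11:05 and 12:05, just as in the list above (with different, invented counter values).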
>
> The same consolidation process occurs when the hourly samples get older
> than (say) 12 days, and they might be turned into daily samples by
> forgetting all but one sample per day.
>
> Again, let me stress that the timeframes used above are tuneable per RRD
> file. In fact, I arbitrarily chose 5-minute, hourly, daily and 12-day
> time periods, for illustration purposes only, and typical deployments are
> usually not exactly as I have described. But the principle still applies.
> You can view the parameters of an RRD file with "rrdtool info
> <filename.rrd>".
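To make "rrdtool info" output easier to read, a small parser can turn the per-RRA lines into resolution and retention. This sketch assumes the `step = N` and `rra[i].cf` / `rra[i].rows` / `rra[i].pdp_per_row` lines that "rrdtool info" prints; the sample text is a hypothetical excerpt (real output has many more lines, which the parser ignores), and `summarize_rras` is a hypothetical name.

```python
import re
from typing import Dict, Tuple

STEP_RE = re.compile(r"^step = (\d+)$")
RRA_RE = re.compile(r'^rra\[(\d+)\]\.(cf|rows|pdp_per_row) = "?([^"]*)"?$')


def summarize_rras(info_text: str) -> Dict[int, Tuple[str, int, int]]:
    """Map RRA index -> (CF, seconds per row, seconds of history kept)."""
    step = None
    fields: Dict[int, Dict[str, str]] = {}
    for raw in info_text.splitlines():
        line = raw.strip()
        m = STEP_RE.match(line)
        if m:
            step = int(m.group(1))
            continue
        m = RRA_RE.match(line)
        if m:
            fields.setdefault(int(m.group(1)), {})[m.group(2)] = m.group(3)
    if step is None:
        raise ValueError("no 'step' line found")
    summary = {}
    for idx, f in sorted(fields.items()):
        per_row = step * int(f["pdp_per_row"])  # seconds covered by one row
        summary[idx] = (f["cf"], per_row, per_row * int(f["rows"]))
    return summary


# Hypothetical excerpt of `rrdtool info` output.
sample = """\
step = 300
rra[0].cf = "AVERAGE"
rra[0].rows = 576
rra[0].pdp_per_row = 1
rra[1].cf = "AVERAGE"
rra[1].rows = 576
rra[1].pdp_per_row = 12
"""

for idx, (cf, per_row, kept) in summarize_rras(sample).items():
    print(f"rra[{idx}]: {cf}, one row per {per_row}s, {kept / 86400:g} days kept")
```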
>
> Now, back to the phenomenon you're seeing, which is an apparent reduction
> in the magnitude of samples. The reason this happens is that the RRD
> database is always making averages when it queries an RRD *COUNTER* value.
> (Other sampling methods are available, such as GAUGE and DERIVE, but most
> routers provide interface statistics as counters.) Even when you ask RRD
> to graph the most recent, 5-minute samples, you should realise that those
> samples are averaged over 5 minutes. There was almost certainly a
> fluctuation during the 5-minute interval that went higher than the
> calculated value, but the best RRD can do is show the average over that
> time, by subtracting the two counter values and dividing by the time
> period, to get average bps.
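A small illustration of that averaging, using made-up per-minute readings: a poller running every minute would see a burst in the third minute, but the only two readings the 5-minute poll records (the same 10:05 and 10:10 values as in the example) yield just the 8 kbps average.

```python
# Hypothetical counter readings (bits) taken once a minute from 10:05 to 10:10.
counters = [10_500_000, 10_600_000, 10_700_000, 12_500_000, 12_700_000, 12_900_000]

# What a 1-minute poller would report for each minute.
per_minute_bps = [(b - a) / 60 for a, b in zip(counters, counters[1:])]
# What the 5-minute poll reports: only the first and last readings matter.
five_minute_bps = (counters[-1] - counters[0]) / 300

print([round(r) for r in per_minute_bps])      # [1667, 1667, 30000, 3333, 3333]
print(round(five_minute_bps), round(max(per_minute_bps)))  # 8000 30000
```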
>
> When RRD generates 12-day graphs, it uses (in our example) hourly samples
> because for most of the 12 days, the 5-minute samples are now gone. So to
> produce the numbers for the time from 10:05 to 11:05, it can't show the
> peaks and troughs that used to show in the 5-minute samples, and instead
> can only show the average for the hour, because now all that it has are the
> two counter values for time periods 1 hour apart. This averaging gets
> worse as the granularity reduces.
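The same effect at the next level can be checked numerically. Once only two readings an hour apart remain, the hourly rate equals the mean of the twelve 5-minute rates (the counter differences telescope), so a single busy 5-minute interval is diluted across the hour. The deltas below are invented for illustration.

```python
# Twelve hypothetical 5-minute counter deltas (bits) for one hour; the last is busy.
deltas = [1_000_000] * 11 + [12_000_000]

counters = [10_500_000]
for d in deltas:
    counters.append(counters[-1] + d)

five_minute_bps = [d / 300 for d in deltas]
# After consolidation only the readings at the start and end of the hour remain.
hourly_bps = (counters[-1] - counters[0]) / 3600

print(round(max(five_minute_bps)), round(hourly_bps))  # 40000 6389
```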
>
> For Xymon, this is pretty much it. However, in some cases, it's actually
> a little bit more complicated than this. RRD has specific "consolidation
> functions" that it uses when moving data from each sample rate to the next
> (eg from 5-minute to hourly samples). For example, a typical RRD file can
> store consolidated samples for MIN, MAX, AVERAGE and LAST, although RRD
> files created by Xymon only have AVERAGE (defined in rrddefinitions.cfg).
> I think for GAUGE sample types, RRD has to calculate the consolidated
> average of 5-minute primary data points when it consolidates them to hourly
> samples, rather than just forgetting the intermediate samples, because
> GAUGE is different. Similarly, even for COUNTER data, if the RRD is configured
> to use a MAX consolidation function, it calculates the hourly maximum as
> the maximum value of the 5-minute samples. When Xymon shows "max" and
> "min" values, but the data set only has AVERAGE samples, it has to
> calculate the max and min simply as the highest/lowest average over the
> time period. If the RRD file was created to use MAX and/or MIN
> consolidation functions, then my understanding is that the longer-term
> values for MAX and MIN will be the actual max and min of the 5-minute
> samples.
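The difference between a MAX consolidation function and "the highest average" can be sketched as follows, assuming the primary data points are already 5-minute rates in bps. The rates are made up, and `consolidate` is a hypothetical helper: it groups a fixed number of points per row and applies a function, dropping any incomplete trailing group.

```python
from statistics import mean
from typing import Callable, List, Sequence


def consolidate(pdps: Sequence[float], per_row: int,
                cf: Callable[[Sequence[float]], float]) -> List[float]:
    """Apply a consolidation function to each complete group of per_row points."""
    return [cf(pdps[i:i + per_row])
            for i in range(0, len(pdps) - per_row + 1, per_row)]


# Two hours of hypothetical 5-minute rates (bps): quiet, with one spike per hour.
pdps = [3_000.0] * 11 + [40_000.0] + [3_000.0] * 5 + [25_000.0] + [3_000.0] * 6

hourly_avg = consolidate(pdps, 12, mean)  # what an AVERAGE-only file keeps
hourly_max = consolidate(pdps, 12, max)   # what a MAX consolidation keeps

print("max of hourly AVERAGE rows:", round(max(hourly_avg)))  # 6083
print("max of hourly MAX rows:    ", round(max(hourly_max)))  # 40000
```

An AVERAGE-only file can report at best about 6 kbps as the "max" for these two hours, while a MAX row still holds the 40 kbps spike.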
>
> To solve your problem, you can simply explain that longer-term views are
> averaged from short-term views. But if you want more accurate maxima and
> minima on your longer-term views, then I think you can adjust
> rrddefinitions.cfg to include MAX and MIN consolidation functions, but this
> will only apply to newly created RRD files. Alternatively, you can use
> "rrdtool tune" to add new CFs to an existing file, but note that it will really
> only help with new samples. I've never done this, so I don't know how well
> it works, or how to do it. However, there used to be a TRACKMAX option for
> this purpose, and this post describes how the same effect can be achieved
> with an update to rrddefinitions.cfg:
> http://lists.xymon.com/archive/2010-November/029960.html.
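When adding MAX and MIN archives, the definitions use rrdtool's "RRA:CF:xff:steps:rows" syntax. The hypothetical helper below derives steps and rows from the step size, the desired row resolution and the retention period; the values match the hourly-for-12-days example above. How your own rrddefinitions.cfg lays out these lines is not shown here, so check the installed file before editing it.

```python
def rra_line(cf: str, step_s: int, resolution_s: int, retention_s: int,
             xff: float = 0.5) -> str:
    """Build an rrdtool 'RRA:CF:xff:steps:rows' definition string."""
    if cf not in {"AVERAGE", "MIN", "MAX", "LAST"}:
        raise ValueError(f"unknown consolidation function: {cf}")
    if resolution_s % step_s or retention_s % resolution_s:
        raise ValueError("resolution must be a multiple of step, "
                         "and retention a multiple of resolution")
    steps = resolution_s // step_s      # primary data points per consolidated row
    rows = retention_s // resolution_s  # consolidated rows kept in the archive
    return f"RRA:{cf}:{xff}:{steps}:{rows}"


DAY = 86_400
# Hourly rows kept for 12 days, with a 5-minute step, for each CF.
for cf in ("AVERAGE", "MAX", "MIN"):
    print(rra_line(cf, 300, 3600, 12 * DAY))  # e.g. RRA:MAX:0.5:12:288
```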
>
> Hope that helps.
>
> Cheers
> Jeremy
>
>