[Xymon] are bar well scaled through categories.

Tue Sep 22 15:32:11 CEST 2015

Really appreciate your answers Jeremy and Shawn.

I will try to adjust the RRD definitions to better fit how we use those graphs.

Thanks a lot!
On Sep 21, 2015 9:09 PM, "Jeremy Laidman" <jlaidman at rebel-it.com.au> wrote:

> On 22 September 2015 at 06:04, Randall Badilla <rbadillarx at gmail.com>
> wrote:
>
>> c) does the problem reside in how rrdtool plots, or in the internal
>> manipulation done by the Solaris scripts?
>>
>
> Yes, this.
>
> RRD, by definition, is a round-robin database that "consolidates" in a
> lossy way.  Combining this with the widespread use of counter stats
> produces the effect you see.  Let me try to explain by a simple example.
> But first, a caveat that I'm not an expert in RRD, and my understanding is
> partly from my own guess at what's happening based on how I would implement
> things.
>
> Let's say that at 10:05am, the interface bit counter is 10,500,000 bits
> (meaning that the interface has transmitted 10.5 million bits since reboot,
> a "clear counters" command, or a counter roll-over).  The router is polled
> every 5 minutes for its interface statistics.  At 10:10am, 5 minutes later,
> the counter has incremented to 12,900,000 bits.  The difference between the
> two samples is 2,400,000 bits.  So rrdgraph (and hence Xymon) will show a
> 5-minute average value of 2,400,000/300=8kbps.
>
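The arithmetic in the example above can be checked with a few lines of Python. This is only a sketch of the calculation described in the message; the function name is made up for illustration, and counter roll-over is deliberately not handled.

```python
def average_rate_bps(prev_count, curr_count, interval_seconds):
    """Average bits per second between two counter readings.

    Assumes the counter did not roll over or reset between the readings.
    """
    return (curr_count - prev_count) / interval_seconds


# Counter readings from the example: 10:05 and 10:10, polled 300 seconds apart.
rate = average_rate_bps(10_500_000, 12_900_000, 300)
print(f"{rate:.0f} bps = {rate / 1000:.0f} kbps")
```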
> Now, RRD doesn't store 8kbps.  Nor does it store the 2,400,000
> difference.  Instead, it stores:
>  10:05=10,500,000
>  10:10=12,900,000
>  10:15=...
>  (etc)
> In other words, only the absolute counter values get stored (along with
> the timestamps for each).  These are the "primary data points".
>
> To store 5-minute counter values for years and years would require a huge
> database file that would take lots of CPU power to calculate and produce ad
> hoc long-term views of the data.  Generally we only care about fine-grained
> (primary) data point samples when they're recent, and as the data points
> get older, we care more about hourly, daily or weekly trends instead.  RRD
> solves this problem by reducing resolution for older samples.
>
> Back to our example.  After 1 day, RRD "consolidates" the 5-minute values
> into longer intervals so that they don't take up as much space.  The
> consolidation parameters are configurable, but for our example let's say it
> keeps 5-minute samples around for up to 24 hours, and after that it turns
> them into hourly samples.  How does it do this?  Well all it needs to do is
> forget 11 out of 12 samples in an hour.  So now RRD is storing:
>   10:05=10,500,000
>   11:05=27,320,000
>   12:05=34,150,000
>   13:05=...
>
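The "forget 11 out of 12 samples" step can be sketched as follows. This mirrors the simplified model described above, which may not match what rrdtool does internally; the counter values are made up and grow by a constant 600,000 bits per sample.

```python
def keep_one_per_hour(samples, per_hour=12):
    """Keep the first sample of every group of `per_hour` samples."""
    return samples[::per_hour]


# (minutes since 10:05, counter value) pairs, one every 5 minutes for 3 hours.
# The counter values are invented for illustration.
samples = [(5 * i, 10_500_000 + 600_000 * i) for i in range(36)]

hourly = keep_one_per_hour(samples)
for minutes, counter in hourly:
    print(f"+{minutes:3d} min  counter={counter:,}")

# The only rate that can still be derived is the average between kept samples.
(t0, c0), (t1, c1) = hourly[0], hourly[1]
print("hourly average bps:", (c1 - c0) / ((t1 - t0) * 60))
```

Of the 36 five-minute samples only three remain, so any variation inside each hour can no longer be reconstructed.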
> Note that it's still storing new 5-minute primary data points verbatim.
> The above list only shows data points that are 24 hours old, counted from
> the time we started our sampling.
>
> The same consolidation process occurs when the hourly samples get older
> than (say) 12 days, and they might be turned into daily samples by
> forgetting all but one sample per day.
>
> Again, let me stress that the timeframes used above are tuneable per RRD
> file.  In fact, I arbitrarily chose 5-minute, hourly, daily and 12-day
> time periods, for illustration purposes only, and typical deployments are
> usually not exactly as I have described.  But the principle still applies.
> You can view the parameters of an RRD file with "rrdtool info
> <filename.rrd>".
>
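To read the archive layout out of "rrdtool info" output, a small parser along these lines may help. It is a sketch: the SAMPLE text is hand-written in the key = value shape that "rrdtool info" prints, not captured from a real Xymon file, and the function name is made up.

```python
import re


def summarize_rras(info_text):
    """Collect step plus cf, rows and pdp_per_row for each RRA."""
    step = None
    rras = {}
    for line in info_text.splitlines():
        key, sep, value = line.partition(" = ")
        if not sep:
            continue
        key = key.strip()
        value = value.strip().strip('"')
        if key == "step":
            step = int(value)
            continue
        match = re.fullmatch(r"rra\[(\d+)\]\.(cf|rows|pdp_per_row)", key)
        if match:
            rras.setdefault(int(match.group(1)), {})[match.group(2)] = value
    return step, rras


# Hand-written sample, for illustration only.
SAMPLE = """\
step = 300
rra[0].cf = "AVERAGE"
rra[0].rows = 576
rra[0].pdp_per_row = 1
rra[1].cf = "AVERAGE"
rra[1].rows = 576
rra[1].pdp_per_row = 6
"""

step, rras = summarize_rras(SAMPLE)
for index, rra in sorted(rras.items()):
    seconds_per_row = step * int(rra["pdp_per_row"])
    print(index, rra["cf"], f"{seconds_per_row}s per row,", rra["rows"], "rows")
```

If only AVERAGE shows up in the cf column, the file has no MAX or MIN archives to draw on.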
> Now, back to the phenomenon you're seeing, which is an apparent reduction
> in the magnitude of samples.  The reason this happens is that RRD always
> produces averages when an RRD *COUNTER* value is queried.  (Other
> sampling methods are available, such as GAUGE and DERIVE, but most
> routers provide interface statistics as counters.)  Even when you ask RRD
> to graph the most recent, 5-minute samples, you should realise that those
> samples are averaged over 5 minutes.  There was almost certainly a
> fluctuation during the 5-minute interval that went higher than the
> calculated value, but the best RRD can do is show the average over that
> time, by subtracting the two counter values and dividing by the time
> period, to get average bps.
>
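A quick way to see how much a 5-minute average can hide is to simulate one interval. The traffic pattern below is invented: a steady 2,000 bps with a 30-second burst at 50,000 bps.

```python
# Hypothetical per-second traffic during one 300-second polling interval.
per_second_bps = [2_000] * 270 + [50_000] * 30

# The poller only ever sees the counter at the start and end of the interval.
start_counter = 10_500_000
end_counter = start_counter + sum(per_second_bps)

average_bps = (end_counter - start_counter) / len(per_second_bps)
peak_bps = max(per_second_bps)

print(f"average over the interval: {average_bps:.0f} bps")
print(f"true peak inside the interval: {peak_bps} bps")
```

The two counter readings are all that survive, so the burst shows up only as a modestly higher average.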
> When RRD generates 12-day graphs, it uses (in our example) hourly samples
> because for most of the 12 days, the 5-minute samples are now gone.  So to
> produce the numbers for the time from 10:05 to 11:05, it can't show the
> peaks and troughs that used to show in the 5-minute samples, and instead
> can only show the average for the hour, because now all that it has are the
> two counter values for time periods 1 hour apart.  This averaging gets
> worse as the granularity reduces.
>
> For Xymon, this is pretty much it.  However, in some cases, it's actually
> a little bit more complicated than this.  RRD has specific "consolidation
> functions" that it uses when moving data from each sample rate to the next
> (eg from 5-minute to hourly samples).  For example, a typical RRD file can
> store consolidated samples for MIN, MAX, AVERAGE and LAST, although RRD
> files created by Xymon only have AVERAGE (defined in rrddefinitions.cfg).
> I think for GAUGE sample types, RRD has to calculate the consolidated
> average of 5-minute primary data points when it consolidates them to hourly
> samples, rather than just forgetting the intermediate samples, because
> GAUGE is different.  Similarly, even for COUNTER, if the RRD is configured
> to use a MAX consolidation function, it calculates the hourly maximum as
> the maximum value of the 5-minute samples.  When Xymon shows "max" and
> "min" values, but the data set only has AVERAGE samples, it has to
> calculate the max and min, as simply the highest/lowest average over the
> time period.  If the RRD file was created to use MAX and/or MIN
> consolidation functions, then my understanding is that the longer-term
> values for MAX and MIN will be the actual max and min of the 5-minute
> samples.
>
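The difference between the consolidation functions can be illustrated on one hour of data. The twelve 5-minute rates below are made up, with one busy interval; this sketches the idea only and is not rrdtool's actual code.

```python
from statistics import mean

# Twelve hypothetical 5-minute average rates (bps) within one hour.
five_minute_bps = [8_000, 7_500, 9_000, 8_200, 60_000, 8_100,
                   7_900, 8_300, 8_000, 7_700, 8_400, 8_900]

hourly_average = mean(five_minute_bps)  # what an AVERAGE archive keeps
hourly_max = max(five_minute_bps)       # what a MAX archive would keep
hourly_min = min(five_minute_bps)       # what a MIN archive would keep

print("AVERAGE:", hourly_average)
print("MAX:    ", hourly_max)
print("MIN:    ", hourly_min)
```

With only the AVERAGE archive available, a graph's "max" over this hour can be no higher than the hourly average, and the 60,000 bps interval is lost.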
> To solve your problem, you can simply explain that longer-term views are
> averaged from short-term views.  But if you want more accurate maxima and
> minima on your longer-term views, then I think you can adjust
> rrddefinitions.cfg to include MAX and MIN consolidation functions, but this
> will only apply to newly created RRD files.  Alternatively, you can use
> rrdtune to add new CFs to an existing file, but note that it will really
> only help with new samples.  I've never done this, so I don't know how well
> it works, or how to do it.  However, there used to be a TRACKMAX option for
> this purpose, and this post describes how the same effect can be achieved
> with an update to rrddefinitions.cfg:
> http://lists.xymon.com/archive/2010-November/029960.html
>
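As a starting point for such a change, the snippet below generates archive definitions in rrdtool's RRA:CF:xff:steps:rows form, adding MAX and MIN next to AVERAGE. The archive layout is a hypothetical example, and where the lines belong in rrddefinitions.cfg is not covered here; check the linked post and your own file before copying anything.

```python
def rra_line(cf, steps, rows, xff=0.5):
    """Format one archive definition as RRA:CF:xff:steps:rows."""
    return f"RRA:{cf}:{xff}:{steps}:{rows}"


# Hypothetical layout: (primary data points per row, number of rows kept).
archives = [(1, 576), (6, 576), (24, 576), (288, 576)]

# MAX and MIN over a single data point equal the AVERAGE, so they are
# only generated for the consolidated archives (steps > 1).
lines = [rra_line(cf, steps, rows)
         for cf in ("AVERAGE", "MAX", "MIN")
         for steps, rows in archives
         if cf == "AVERAGE" or steps > 1]

print("\n".join(lines))
```

Each extra consolidation function adds its own set of archives, so the RRD files grow accordingly.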
> Hope that helps.
>
> Cheers
> Jeremy
>
>