[Xymon] Rare shape in graphs.

Jeremy Laidman jlaidman at rebel-it.com.au
Mon Apr 29 06:55:57 CEST 2013


On 23 April 2013 21:33, Raul GN <ragonlan at gmail.com> wrote:

> Hi all,
>    Recently I've included 2 new hosts to monitor in xymon but from time to
> time response times are quite high (2 or 3 seconds). The graphs draw
> ups and downs in a very recognisable pattern:
>

I believe this is a result of quantization and interpolation when RRD
stores the results, and it shows up when RRD creates the graphs.  It
seems more pronounced when the metrics are whole numbers, such as clock
offset, and more obvious for really small numbers.  Here's a graph
of mine that shows an interesting repeating diamond shape.

[image: Inline images 1]
(But your multi-coloured "Connection Times" graph is far prettier.)

It took me a while to work out what's going on, and I still don't quite
get it.  But by way of an example, let's say that the clock offset (error)
of a host is 2.333 seconds relative to the Xymon server, and let's say that
quantization causes RRD to accept only whole numbers (I'm stretching the
truth here for demonstration purposes).  The numbers will come in as 2.333,
2.333, 2.333, but RRD will store them as 2, 2, 3, 2, 2, 3.  So you end up
with a repeating pattern (a spike every 3rd sample) that doesn't exist in
the data, and is an artefact of the way RRD stores, consolidates and
retrieves numbers.  Our visual system picks up such a pattern very easily.
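
To make the whole-number example concrete, here is a small Python sketch.
It is not how RRD works internally; it assumes a made-up store that keeps
only whole numbers and carries the fractional remainder forward to the next
sample.  The function name quantize_with_carry is invented for this
illustration, and 2.333 is written as the exact fraction 7/3 to keep
floating-point noise out of the picture.

```python
from fractions import Fraction
from math import floor


def quantize_with_carry(samples):
    """Store whole numbers only, carrying the fractional remainder forward.

    Hypothetical model for illustration; not RRD's actual storage logic.
    """
    stored = []
    total = Fraction(0)  # exact running sum of the incoming samples
    emitted = 0          # running sum of the whole numbers stored so far
    for sample in samples:
        total += sample
        value = floor(total) - emitted  # whole part not yet accounted for
        stored.append(value)
        emitted += value
    return stored


samples = [Fraction(7, 3)] * 6  # a constant 2.333... arriving six times
pattern = quantize_with_carry(samples)
print(pattern)  # [2, 2, 3, 2, 2, 3]
```

The input never changes, yet the stored series has a bump every third
sample, which is the kind of regular shape that stands out in a graph.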

Of course, RRD can indeed store floating point numbers, but when you have
really small variations in numbers (many decimal places), RRD has to round
the numbers at some point, and this gives the same effect.  Also, the
number of decimal places required to avoid such effects is increased when
you do averaging and interpolation.

Interpolation, the way I see it, works as follows.  When RRD gets a value of
7 at 3:05:00 and a value of 8 at 3:10:01, it doesn't store "8" for the second
value, because it didn't come in exactly 5 minutes after the first value.
Instead, RRD removes one second's worth from the value, because it has to
store the second value at 3:10:00.  If it took 301 seconds to go from 7 to
8, then RRD will calculate that the value after 300 seconds would have been
7 + 300/301, or around 7.997.  So if your sampling jumps around a little
earlier and later, in a regular manner, you may see an interpolation
artefact that matches that variation.
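
The arithmetic in that example can be checked with a few lines of Python.
This sketch follows the simplified straight-line picture described above,
not RRD's real consolidation code, and the function name value_at_boundary
is made up for the illustration.  Times are in seconds after 3:00:00.

```python
def value_at_boundary(prev_value, prev_time, new_value, new_time, boundary):
    """Estimate the value at `boundary` by linear interpolation.

    Simplified model of the explanation above; not RRD's actual algorithm.
    """
    fraction = (boundary - prev_time) / (new_time - prev_time)
    return prev_value + (new_value - prev_value) * fraction


# A sample of 7 at 3:05:00 (300 s), then 8 at 3:10:01 (601 s),
# with the slot boundary at 3:10:00 (600 s).
stored = value_at_boundary(7, 300, 8, 601, 600)
print(round(stored, 3))  # 7.997

# The same pair of values arriving 0, 1 or 2 seconds late.
for late in (0, 1, 2):
    print(late, round(value_at_boundary(7, 300, 8, 600 + late, 600), 4))
```

A sample that arrives exactly on the boundary is stored as 8, while one
that is a second or two late is stored as slightly less, so regular timing
jitter turns into a regular wobble in the stored values.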

Typically, xymonnet performs its checks in the same order every time, and
earlier checks might delay later checks very slightly, just enough to push
the sample timestamp into the next whole second, causing an interpolation
artefact.  Because the ordering is not randomised, the artefacts are
consistent and recognisable as patterns in the graphs.  It could be that,
by adding "--shuffle" to the call to xymonnet, the artefacts will no longer
form a recognisable pattern.

While I don't fully understand what's going on, I think it's safe to say
that it's not a problem with either Xymon or the servers being tested, but
is a result of how RRD does its thing.

J
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 32976 bytes
Desc: not available
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20130429/f41dbb08/attachment.png>

