[hobbit] TCP/IP stats (bits/s) limited to 100M

Sun Jul 9 22:02:32 CEST 2006

On Jul 9, 2006, at 12:18 PM, Henrik Stoerner wrote:
>
> OK, you got me on that one.

Not really, you inherited this ;)   He is trying to get me, and his  
point is valid, but the tool 'works as designed', read on . . .

>
> It seems that using COUNTER for the byte-counts in both the
> netstat- and ifstat-RRD's might be a good idea.

*might* being the operative word there

> The question then
> becomes "what's a suitable max" for these data ? Should I
> assume they are 32-bit counters ? I know some of them are not
> (e.g. Solaris has 64-bit counters for bytes in/out per interface).

exactly, and it is even more complicated than that . . . see below

>
> I'll change it to a counter now, with MAX set to "unknown". The
> overflow handling should still work correctly, if I understand the RRD
> docs right.

I would not recommend this.  Another major issue is that counter resets
(e.g. a reboot), as opposed to overflows, get mistaken for wraps if the
MAX is not correct.  From what I recall, if you use COUNTER and a reset
gets mistaken for a wrap, you get a massive spike in the RRD, making all
the data relatively useless because the y axis autoscales to the spike.
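
To make the failure mode concrete, here is a rough Python sketch of how
a COUNTER-style source turns two samples into a rate.  It is a
simplified model based on my reading of the rrdtool docs, not rrdtool's
actual code; the function name, the sample values and the 300 second
step are all made up for illustration.

```python
import math

WRAP_32 = 2**32
WRAP_64 = 2**64

def counter_rate(prev, curr, step, max_rate=None):
    """Per-second rate for a COUNTER-style source (simplified model).

    Any decrease is assumed to be a wrap, first at the 32-bit border
    and then at the 64-bit border.  A rate above max_rate is stored
    as unknown (NaN); with no max_rate nothing is rejected.
    """
    delta = curr - prev
    if delta < 0:
        delta += WRAP_32              # assume a 32-bit wrap
    if delta < 0:
        delta += WRAP_64 - WRAP_32    # otherwise assume a 64-bit wrap
    rate = delta / step
    if max_rate is not None and rate > max_rate:
        return math.nan
    return rate

# Normal interval: 3000 bytes in 300 seconds.
normal = counter_rate(1_000, 4_000, 300)

# Reboot: the counter restarts near zero, but the model sees a wrap
# and reports millions of bytes/s that never happened.
spike = counter_rate(3_000_000_000, 1_000, 300)

# Only a MAX that is tight enough for this interface rejects the spike.
rejected = counter_rate(3_000_000_000, 1_000, 300, max_rate=1_250_000)
```

Note that a MAX sized for a faster link would let the same bogus value
straight through, which is the whole problem.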

With DERIVE and a minimum of 0 (DERIVE=0 for short) you acknowledge you
won't handle counter wraps correctly (which are not that common anyway),
but the result for all wraps/resets is a benign NaN, which does *not*
cause a spike.  I am a firm believer that no data is better than bad data.
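
A matching sketch for DERIVE with a minimum of 0, again a simplified
model in Python with hypothetical names and sample values rather than
anything taken from rrdtool itself:

```python
import math

def derive_rate(prev, curr, step, min_rate=0.0):
    """Per-second rate for a DERIVE-style source (simplified model).

    No wrap correction is attempted.  A negative rate, whether it
    comes from a real wrap or from a reset, falls below min_rate and
    is stored as unknown (NaN).
    """
    rate = (curr - prev) / step
    if rate < min_rate:
        return math.nan
    return rate

# Normal interval: 3000 bytes in 300 seconds.
normal = derive_rate(1_000, 4_000, 300)

# Reboot: the counter restarts near zero, one sample is lost as NaN.
after_reboot = derive_rate(3_000_000_000, 1_000, 300)

# Genuine 32-bit wrap: also lost as NaN, which is the accepted cost.
after_wrap = derive_rate(4_294_967_000, 704, 300)
```

Either way the graph gets a gap for one interval instead of a spike.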

I am not opposing the idea that COUNTER with a correct MAX is the
'right way'.  The problem with software that runs on so many
platforms is that the correct MAX is impossible to know for certain.
Defining the MAX as just the 32- or 64-bit maximum is not adequate
because reboots will still cause spikes; you'd need to know the MAX
for the particular metric, and that is impossible to know with
certainty.  The inbytes MAX would need to be different for 10 Mb/s,
100 Mb/s, 1000 Mb/s, Token Ring 16 Mb/s, etc.
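
As a back-of-the-envelope illustration of how far apart those MAX
values sit, here is the arithmetic in Python.  It assumes the counter
counts bytes in one direction, that the nominal line rate is the
ceiling, and that 1 Mb is 1,000,000 bits; the names are hypothetical.

```python
# Nominal link speeds in Mb/s, taken from the examples above.
LINK_SPEEDS_MBPS = {
    "10 Mb/s": 10,
    "100 Mb/s": 100,
    "1000 Mb/s": 1000,
    "Token Ring 16 Mb/s": 16,
}

def max_bytes_per_second(link_mbps):
    """Largest plausible byte rate for a link of the given speed."""
    return link_mbps * 1_000_000 // 8

for name, mbps in LINK_SPEEDS_MBPS.items():
    print(f"{name}: MAX = {max_bytes_per_second(mbps)} bytes/s")
```

The collector would have to know which of these applies to every
interface on every platform, and it generally does not.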

DERIVE=0 and NaN is a much better compromise than the spikes.  And I  
would bet the farm reboots are a much more common event than counter  
wraps for the majority of environments.

And Henrik, the net result for you will be answering an endless stream
of emails asking why every COUNTER RRD has spikes . . . I've been
there, done that ;)  I am almost 100% positive there is not *one*
COUNTER RRD in the larrd stuff; it is all DERIVE.  It's not impossible
that rrdtool has changed to alleviate some of this, but from what I have
read of your email threads I haven't seen anything to support that.

scott