
Re: [hobbit] TCP/IP stats (bits/s) limited to 100M




On Jul 9, 2006, at 12:18 PM, Henrik Stoerner wrote:

OK, you got me on that one.

Not really, you inherited this ;) He is trying to get me, and his point is valid, but the tool 'works as designed', read on . . .



It seems that using COUNTER for the byte-counts in both the netstat- and ifstat-RRD's might be a good idea.

*might* being the operative word there

The question then
becomes "what's a suitable max" for these data? Should I
assume they are 32-bit counters? I know some of them are not
(e.g. Solaris has 64-bit counters for bytes in/out per interface).

exactly, and it is even more complicated than that . . . see below


I'll change it to a counter now, with MAX set to "unknown". The overflow
handling should still work correctly, if I understand the RRD
docs right.

I would not recommend this. Another major issue is that counter resets (e.g. a reboot), as opposed to overflows, get mistaken for wraps if the MAX is not correct. From what I recall, if you use COUNTER and anything gets mistaken, you get a massive spike in the RRD, making all the data relatively useless because the y axis autoscales to the spike.


With DERIVE and a MIN of 0 ("DERIVE=0") you acknowledge you won't handle counter wraps correctly (which are not that common anyway), but the result for every wrap/reset is benign: a NaN, which does *not* cause a spike. I am a firm believer that no data is better than bad data.
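To make the trade-off concrete, here is a rough Python sketch of the two rate calculations as I understand them from the rrdcreate docs. It is not rrdtool code: the function names and the sample counter values are made up for illustration, and the real update path does more (heartbeat, consolidation, etc.).

```python
import math

WRAP32 = 2**32
WRAP64 = 2**64

def counter_rate(prev, cur, dt, max_rate=None):
    """COUNTER-style rate: a decrease is always treated as a wrap."""
    delta = cur - prev
    if delta < 0:
        delta += WRAP32               # assume a 32-bit wrap first
        if delta < 0:
            delta += WRAP64 - WRAP32  # otherwise assume a 64-bit wrap
    rate = delta / dt
    if max_rate is not None and rate > max_rate:
        return math.nan               # above MAX: stored as unknown
    return rate

def derive_rate(prev, cur, dt, min_rate=0.0):
    """DERIVE-style rate: a plain difference, unknown if below MIN."""
    rate = (cur - prev) / dt
    if min_rate is not None and rate < min_rate:
        return math.nan               # negative rate: stored as unknown
    return rate

# Hypothetical reboot: the byte counter drops from 3e9 to a small value.
prev, cur, dt = 3_000_000_000, 50_000, 300
spike = counter_rate(prev, cur, dt)   # bogus rate, roughly 4.3 MB/s
benign = derive_rate(prev, cur, dt)   # NaN, a gap instead of a spike

# Hypothetical genuine 32-bit wrap: COUNTER recovers it, DERIVE drops it.
wrap_counter = counter_rate(WRAP32 - 1000, 2000, dt)  # 3000 bytes / 300 s
wrap_derive = derive_rate(WRAP32 - 1000, 2000, dt)    # NaN

print(spike, benign, wrap_counter, wrap_derive)
```

With no MAX set, the reboot sample goes into the RRD as a real rate, while DERIVE with a MIN of 0 only costs one missing sample per wrap or reset.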

I am not opposing the idea that COUNTER with a correct MAX is the 'right way'. The problem with software that runs on so many platforms is that the correct MAX is impossible to know for certain. Defining the MAX as just the 32- or 64-bit limit is not adequate, because reboots will still cause spikes. You'd need to know the MAX for the particular metric, and that is impossible to know for certain. The inbytes MAX would need to be different for 10Mb/s, 100Mb/s, 1000Mb/s, Token Ring 16Mb/s, etc, etc.
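A quick back-of-the-envelope sketch in Python of why the MAX has to track the link speed. The helper names and the counter values are hypothetical, and I am assuming MAX is expressed in bytes per second for a byte counter:

```python
def max_bytes_per_sec(link_mbit):
    """Highest plausible byte rate for a link of the given speed in Mb/s."""
    return link_mbit * 1_000_000 / 8

def false_wrap_rate(prev, cur, dt):
    """Rate produced when a reset is misread as a 32-bit wrap."""
    return (cur - prev + 2**32) / dt

# Hypothetical reboot: counter drops from 3e9 to 50000 within a 300 s step.
bogus = false_wrap_rate(3_000_000_000, 50_000, 300)

verdicts = {}
for name, mbit in [("10Mb/s", 10), ("Token Ring 16Mb/s", 16),
                   ("100Mb/s", 100), ("1000Mb/s", 1000)]:
    limit = max_bytes_per_sec(mbit)
    # A MAX only helps if the bogus rate happens to exceed it.
    verdicts[name] = "rejected" if bogus > limit else "accepted"
    print(f"{name}: MAX={limit:.0f} B/s, bogus sample {verdicts[name]}")
```

The same bogus sample is thrown away on the slower links but accepted as real data on the faster ones, so a single MAX shipped for every platform cannot be right.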

DERIVE=0 and NaN is a much better compromise than the spikes. And I would bet the farm reboots are a much more common event than counter wraps for the majority of environments.

And Henrik, the net result for you will be answering an endless stream of emails asking why every COUNTER RRD has spikes . . . I've been there, done that ;) I am almost 100% positive there is not *one* COUNTER RRD in the larrd stuff; it is all DERIVE. It's not impossible that rrdtool has changed to alleviate some of this, but from what I have read of your email streams I haven't seen anything to support that.


scott