[Xymon] ifstat numbers tracking 10Gb network performance incorrectly

Steve Groom sgroom at ipac.caltech.edu
Thu Apr 14 19:00:06 CEST 2016


Hi,

We have been using xymon to track various performance metrics across
our environment, and to drill into historical performance data as a
way to understand overall system performance.
One of the things we've had difficulty with is the performance stats
returned by the "ifstat" module, and specifically on 10Gbit ethernet
interfaces. On xymon clients with 10G interfaces, sometimes the recorded
metrics get clobbered and recorded incorrectly. At times when we know
the server network activity is relatively high (e.g. 3+ Gbits/sec)
the xymon ifstat charts might show only 110Mbit/sec. And further, while
we know the activity is not constant, the charts are often very "flat"
and appear capped at that level. It looks like an artificial cap, like
the stats are being clipped at that level.
It is as if xymon is mishandling the activity metrics that the
client is returning. Or maybe the RRD modules that xymon uses?

We are using xymon's default DERIVE types for the ifstat metrics,
bytesSent and bytesReceived. I have traced the obytes64 and rbytes64
values returned by the client and they look sane, and match
what other network tools on the client are telling us. But those
values are getting mangled somewhere between there and the
graphs. Dumping the rrd files I can see the "PDP" (latest update)
values there are correct (e.g. ~2e+09), but the "CDP" (5-min average) values are
not (e.g. ~1.4e+07). How can that be? Why would RRD do that?
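
For reference, the unit arithmetic behind those numbers can be sketched in
Python. This is only an illustration, assuming the RRD stores rates in
bytes per second and the charts display bits per second; the function
names and the counter samples are hypothetical, not taken from xymon or
rrdtool. Under that assumption the ~1.4e+07 CDP value works out to about
112 Mbit/sec, which lines up with the ~110Mbit/sec level seen on the charts.

```python
def derive_rate(prev_count, curr_count, interval_s):
    """Per-second rate from two samples of an increasing byte counter,
    in the spirit of how a DERIVE data source turns counters into rates."""
    return (curr_count - prev_count) / interval_s


def bytes_per_s_to_mbit_per_s(rate_bytes_per_s):
    """Convert bytes/sec to Mbit/sec (assumes 8 bits per byte, 1e6 bits per Mbit)."""
    return rate_bytes_per_s * 8 / 1e6


# CDP value quoted above, assumed to be in bytes/sec.
cdp_rate = 1.4e7
cdp_mbit = bytes_per_s_to_mbit_per_s(cdp_rate)

# Hypothetical counter samples 300 seconds apart on a link carrying a
# sustained 3 Gbit/sec (3.75e8 bytes/sec * 300 s = 1.125e11 bytes).
expected_rate = derive_rate(0, 112_500_000_000, 300)
expected_mbit = bytes_per_s_to_mbit_per_s(expected_rate)

print(f"CDP value:       {cdp_mbit:.0f} Mbit/sec")
print(f"Expected (3G):   {expected_mbit:.0f} Mbit/sec")
```

The gap between the two printed figures is the discrepancy described above:
the stored averages are more than an order of magnitude below what the
counters imply.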

Mysteriously, every so often the graphs seem to spring to life, and show
reasonable values up above 1Gbit. And sometimes it happens around
the time when we're tinkering with one network element or another.
But after a while (~minutes to hours), they just as mysteriously
revert to the bogus "capped" values. (The RRD's CDP values reflect this
as well.) Most aggravating, as it leaves us unable to believe these charts.
So the rrd is _sometimes_ showing good data, but often not.

We are working through the process of updating everything (xymon, rrdtool)
to the latest versions, but we are wondering whether this might be a problem
in our configuration that software updates alone won't fix.

Is there some well-worn advice out there about how to configure
xymon to properly gather/store/chart network performance stats
for 10Gbit networks, specifically the "ifstat" module?

Thanks in advance for any tips, pointers, ideas...

Steve Groom
sgroom at ipac.caltech.edu

