diskstat test bug.

Vernon Everett everett.vernon at gmail.com
Tue Sep 21 06:03:43 CEST 2010


Hi all

For all of you that are using the diskstat.ksh I posted on Xymonton 2 weeks
ago, heads-up!

Following in the grand tradition that every useful program will contain at
least one variable, one loop and one bug, my script must be useful.
It contained a rather severe bug.
Some of the more observant may have noticed that the graphs for "Percent of
Time Waiting" exceeded 100 from time to time. Odd, no?

OK, here's what happened.
Check the for loop that kicks off around line 32.
Its list has one fewer element than there are columns in the output of
iostat -xrn.
The missing column is wsvc_t (average service time in the wait queue, in
milliseconds).

Oddly enough, this column is omitted in the output of iostat -xr and only
makes its appearance when -n is used.
Which is the cause of my mistake. The first draft of the script used -xr;
I then changed it to use -n to get more useful device names (and then had
to do some creative awking to get the device names back to the first
column).
Compare iostat -x | head to iostat -xn | head
Methinks this is a Solaris bug. I can't think of any valid reason for the
column mismatch.
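
The mismatch is easy to see by counting header fields. A minimal sketch,
assuming the header lines that Solaris 10 typically prints for each mode
(they are pasted in as strings here, not captured from a live system, so
check them against your own iostat; count_fields is just a helper name made
up for this example):

```shell
# Header lines as typically printed on Solaris 10 (assumed, not captured live).
hdr_x='device r/s w/s kr/s kw/s wait actv svc_t %w %b'
hdr_xn='r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device'

# count_fields: print the number of whitespace-separated fields in a string.
count_fields() {
    printf '%s\n' "$1" | awk '{ print NF }'
}

echo "iostat -x : $(count_fields "$hdr_x") columns"
echo "iostat -xn: $(count_fields "$hdr_xn") columns"
```

With those headers, -xn has one more column than -x, and the extra one is
wsvc_t.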

So, if you are using the diskstat test, all values for the last 3 graphs
will be wrong.
They are "Average Response Time of Transaction", "Percent of Time Waiting"
and "Percent of Time Disk Busy"

How to fix this.
First of all, change line 32 of diskstat.ksh from
for subtest in reads writes kreads kwrites wait actv svct pw pb
to
for subtest in reads writes kreads kwrites wait actv wsvc svct pw pb
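
To see why one missing name shifts the last three graphs, here is a rough
sketch of positional labelling. It is not the code from diskstat.ksh:
label_fields is a hypothetical helper and the sample values are made up, in
iostat -xn column order with the device name already stripped off.

```shell
# label_fields NAMES VALUES: pair each name with the value in the same position.
label_fields() {
    names=$1
    set -- $2            # split the value list into positional parameters
    for name in $names; do
        echo "$name=$1"
        shift
    done
}

# Made-up sample values for the columns:
# r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b
sample='1.0 2.0 8.0 16.0 0.0 0.1 0.3 5.2 0 3'

echo "old list:"
label_fields 'reads writes kreads kwrites wait actv svct pw pb' "$sample"
echo "new list:"
label_fields 'reads writes kreads kwrites wait actv wsvc svct pw pb' "$sample"
```

With the old list, svct picks up the wsvc_t value, pw picks up asvc_t (a
time in milliseconds, which is how a "percent" can exceed 100), pb picks up
%w, and the real %b value is never stored.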

Update hobbitgraph.cfg with the new graph definition.
[diskstat-wsvc]
    FNPATTERN diskstat-wsvc,(.*).rrd
    TITLE Average Number of Transactions Waiting
    YAXIS Total
    -l 0
    DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
    LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
    GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
    GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
    GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

Add the new test to hobbitserver.cfg
TEST2RRD= gets the extra diskstat-wsvc=ncv
GRAPHS= gets diskstat-wsvc::7
And we need to add another SPLITNCV line
SPLITNCV_diskstat-wsvc="*:GAUGE"

I have updated Xymonton already, so you could just grab the new version
there.

For those concerned with hanging onto their old data, you could do the
following.
Rename all diskstat-svct*.rrd to diskstat-wsvc*.rrd
Rename all diskstat-pw*.rrd to diskstat-svct*.rrd
Rename all diskstat-pb*.rrd to diskstat-pw*.rrd
The exact command is left as an exercise for the reader.
Of course, all the data that should be in diskstat-pb*.rrd is lost. Gone
forever. (Never recorded, to be more correct.)
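
For anyone who would rather not do the exercise, one possible way to do the
renames is sketched below. rename_diskstat_rrds is a name made up for this
example, and it assumes you point it at a directory holding one host's RRD
files. The order of the three steps matters, since each target name has to
be vacated before it is reused. Try it on a copy of the data first.

```shell
# rename_diskstat_rrds DIR: shift the three mislabelled RRD sets in DIR.
# Order matters: svct -> wsvc first, then pw -> svct, then pb -> pw.
rename_diskstat_rrds() {
    dir=$1
    for pair in svct:wsvc pw:svct pb:pw; do
        old=${pair%%:*}
        new=${pair##*:}
        for f in "$dir"/diskstat-"$old"*.rrd; do
            [ -e "$f" ] || continue       # pattern matched nothing
            base=${f##*/}
            # Keep everything after the old prefix (e.g. ",c0t0d0.rrd").
            mv "$f" "$dir/diskstat-$new${base#diskstat-$old}"
        done
    done
}
```

Run it once per host directory, for example
rename_diskstat_rrds /path/to/rrd/myhost (the path is a placeholder), ideally
while nothing is writing to the RRD files.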

Or you could just trash diskstat-svct*.rrd, diskstat-pw*.rrd and
diskstat-pb*.rrd and start over. It's only been 12 days, at most.

Apologies for the inconvenience.

Regards
     Vernon