Adding a custom RRD to graphs and monitoring

Matthew Moldvan mmoldvan at csc.com
Thu Aug 19 17:26:08 CEST 2010


All,

This is stressing me out, so hopefully someone will take the time to go 
through my ramblings below and help me out.  There is a lot of information, 
so please bear with me.

For the past few days I've been trying to add a custom script (iostat 
information) and have the data graphed, but I'm not having any luck 
(mostly due to not understanding the RRD definitions in hobbitgraph.cfg).

I've read through a ton of the how-tos on the subject, but all of them 
seem to vary a bit on the details.  My resulting graphs look like this: 
http://imgur.com/4Nwrp.jpg

So far I've got a script running on two systems reporting data back to the 
main page.  This brings up my first question:  When sending information to 
be graphed, is the data passed in as a bb status message or a bb data 
message? 

I thought I had it working at one point by sending data like the sample 
below in a status message, but I'd prefer to send only the status text and 
HTML through the "bb status" command and keep the actual RRD data in the 
"bb data" command, if that works.  I also tried wrapping the data in HTML 
comments as shown below, but no luck.

"<!---
data like below (note newlines between HTML comment tags)
--->"

Sample data:
c1t50060E80104AAE50d1 : 0.82    
c1t50060E80104AAE50d2 : 0.07    
c1t50060E80104AAE50d3 : 1.71    
c1t50060E80104AAE50d4 : 0.46    
c3t50060E80104AAE52d0 : 1.31    
c3t50060E80104AAE52d1 : 1.53    
c3t50060E80104AAE52d2 : 0.09    
c3t50060E80104AAE52d3 : 3.14    
c3t50060E80104AAE52d4 : 0.61    
c3t50060E80104AAE52d12 : 0.06    
c0t0d0 : 11.70   
c1t50060E80104AAE50d0 : 0.87 

I've seen it both ways in the examples.  I tried sending both, but that 
doesn't seem to be working.  From what I understand, if I specify a test as 
NCV in the TEST2RRD section, one of the running processes (hobbitd or 
hobbitrrd) will read in the "name : value" pairs and pass them to an RRD 
update/create command?  Does that require integer values, or are 
floating-point values up to a certain precision acceptable?  Currently I'm 
printing values with a .2f format from the nawk script and getting a bunch 
of NaNs in the RRD output (which could be for various reasons, though).
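To make the "name : value" format concrete, here is a minimal Python sketch of how such a message body could be split into floating-point values. It is an illustration only: the function name parse_ncv is hypothetical, and this is not Xymon's actual parser, whose handling of whitespace, units and odd characters may differ.

```python
def parse_ncv(body):
    """Parse 'name : value' lines into a dict of floats.

    Lines without a colon, or whose value is not a number,
    are skipped. This mirrors the sample data format only.
    """
    values = {}
    for line in body.splitlines():
        name, sep, raw = line.partition(":")
        if not sep:
            continue  # no colon on this line
        try:
            values[name.strip()] = float(raw.strip())
        except ValueError:
            continue  # e.g. a status header containing a timestamp
    return values


sample = """green Thu Aug 19 10:47:28 EDT 2010
c1t50060E80104AAE50d1 : 0.82
c0t0d0 : 11.70
not a data line
"""

print(parse_ncv(sample))
```

Under this reading, a two-decimal value such as 0.82 is just a float; whether the server side accepts it the same way is exactly the open question above.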

Here go the details (NOTE: All host names and IP addresses have been 
scrubbed to protect the innocent):

Script output:

+ /opt/xymon/client/bin/bb <xymon.server.ip> 'data <client.fqdn>.trends
c1t50060E80104AAE50d1 : 0.82    
c1t50060E80104AAE50d2 : 0.07    
c1t50060E80104AAE50d3 : 1.71    
c1t50060E80104AAE50d4 : 0.46    
c3t50060E80104AAE52d0 : 1.31    
c3t50060E80104AAE52d1 : 1.53    
c3t50060E80104AAE52d2 : 0.09    
c3t50060E80104AAE52d3 : 3.14    
c3t50060E80104AAE52d4 : 0.61    
c3t50060E80104AAE52d12 : 0.06    
c0t0d0 : 11.70   
c1t50060E80104AAE50d0 : 0.87    
'     
+ /opt/xymon/client/bin/bb <xymon.server.ip> 'status <client.fqdn>.iostat green Thu Aug 19 10:47:28 EDT 2010
c1t50060E80104AAE50d1 : 0.82    
c1t50060E80104AAE50d2 : 0.07    
c1t50060E80104AAE50d3 : 1.71    
c1t50060E80104AAE50d4 : 0.46    
c3t50060E80104AAE52d0 : 1.31    
c3t50060E80104AAE52d1 : 1.53    
c3t50060E80104AAE52d2 : 0.09    
c3t50060E80104AAE52d3 : 3.14    
c3t50060E80104AAE52d4 : 0.61    
c3t50060E80104AAE52d12 : 0.06    
c0t0d0 : 11.70   
c1t50060E80104AAE50d0 : 0.87    
' 
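For reference, the two message bodies in the trace above could be assembled along these lines. This is a sketch under assumptions: the helper names (format_pairs, build_status, build_data) and the host name "myhost" are made up for illustration, and whether the data message should be addressed to ".trends" or to the test name is the question being asked here, so the test name is left as a parameter.

```python
def format_pairs(readings):
    """Render readings as 'name : value' lines with two decimals."""
    return "\n".join(f"{disk} : {value:.2f}" for disk, value in readings.items())


def build_status(host, test, color, when, readings):
    """Build a status message: header line, then the name/value lines."""
    return f"status {host}.{test} {color} {when}\n{format_pairs(readings)}\n"


def build_data(host, test, readings):
    """Build a data message: header line, then the name/value lines."""
    return f"data {host}.{test}\n{format_pairs(readings)}\n"


readings = {"c0t0d0": 11.7, "c1t50060E80104AAE50d0": 0.87}

# Each string would be passed as the final argument to the bb client,
# as in the trace above; nothing is sent by this sketch.
print(build_status("myhost", "iostat", "green",
                   "Thu Aug 19 10:47:28 EDT 2010", readings))
print(build_data("myhost", "iostat", readings))
```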

Another question: I've seen some examples sending as "bb data 
<client.fqdn>.trends", is that correct, or if I'm using the "bb data" 
command do I have to specify the test name as above?

RRD files are being created for every disk, like so:

-rw-r--r--    1 xymon   495  19648 Aug 19 11:06 iostat,c0t0d0.rrd
-rw-r--r--    1 xymon   495  19648 Aug 18 23:22 iostat,c0t1d0.rrd
-rw-r--r--    1 xymon   495  19648 Aug 19 11:06 iostat,c1t50060E80104AAE50d0.rrd
...snip...
-rw-r--r--    1 xymon   495  19648 Aug 18 23:22 iostat,c3t50060E80104AAE52d8.rrd
-rw-r--r--    1 xymon   495  19648 Aug 18 23:22 iostat,c3t50060E80104AAE52d9.rrd

An "rrdtool dump <whatever>.rrd" does suggest that some values are making 
it into the RRDs (judging by "last_ds" in the dump output below):

[root@<hostname> <fqdn.rrd.dir>]# rrdtool dump iostat,c3t50060E80104AAE52d9.rrd | more
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
<!-- Round Robin Database Dump -->
<rrd>
        <version>0003</version>
        <step>300</step> <!-- Seconds -->
        <lastupdate>1282188121</lastupdate> <!-- 2010-08-18 23:22:01 EDT -->

        <ds>
                <name> lambda </name>
                <type> GAUGE </type>
                <minimal_heartbeat>0</minimal_heartbeat>
                <min>6.0000000000e+02</min>
                <max>NaN</max>

                <!-- PDP Status -->
                <last_ds>2.05</last_ds>
                <value>NaN</value>
                <unknown_sec> 121 </unknown_sec>
        </ds>

        <!-- Round Robin Archives -->
        <rra>
                <cf>AVERAGE</cf>
                <pdp_per_row>1</pdp_per_row> <!-- 300 seconds -->

                <params>
                <xff>5.0000000000e-01</xff>
                </params>
                <cdp_prep>
                        <ds>
                        <primary_value>NaN</primary_value>
                        <secondary_value>0.0000000000e+00</secondary_value>
                        <value>NaN</value>
                        <unknown_datapoints>0</unknown_datapoints>
                        </ds>
                </cdp_prep>
                <database>
                        <!-- 2010-08-16 23:25:00 EDT / 1282015500 --> <row><v>NaN</v></row>
...snip, all others are NaN also...
                        <!-- 2010-08-18 23:20:00 EDT / 1282188000 --> <row><v>NaN</v></row>
                </database>
        </rra>
        <rra>
                <cf>AVERAGE</cf>
                <pdp_per_row>6</pdp_per_row> <!-- 1800 seconds -->

                <params>
                <xff>5.0000000000e-01</xff>
                </params>
                <cdp_prep>
                        <ds>
                        <primary_value>0.0000000000e+00</primary_value>
                        <secondary_value>0.0000000000e+00</secondary_value>
                        <value>NaN</value>
                        <unknown_datapoints>4</unknown_datapoints>
                        </ds>
                </cdp_prep>
                <database>
                        <!-- 2010-08-06 23:30:00 EDT / 1281151800 --> <row><v>NaN</v></row>
...snip, all NaNs til the end...
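
When a dump is this long, it can help to pull out just the data-source settings and count the NaN rows. The sketch below does that with Python's standard XML parser on a trimmed copy of the dump above; the summarize function and the SAMPLE_DUMP constant are hypothetical names, and the sample keeps only a few of the real fields. One thing worth comparing in the result: rrdtool stores any update that falls outside a data source's min/max range as unknown, and here last_ds (2.05) is below the reported min (600).

```python
import math
import xml.etree.ElementTree as ET

# Trimmed from the dump above; most fields and rows are omitted.
SAMPLE_DUMP = """<rrd>
  <step>300</step>
  <ds>
    <name> lambda </name>
    <type> GAUGE </type>
    <minimal_heartbeat>0</minimal_heartbeat>
    <min>6.0000000000e+02</min>
    <max>NaN</max>
    <last_ds>2.05</last_ds>
  </ds>
  <rra>
    <cf>AVERAGE</cf>
    <database>
      <row><v>NaN</v></row>
      <row><v>NaN</v></row>
    </database>
  </rra>
</rrd>"""


def summarize(dump_xml):
    """Return the first data source's settings and NaN row counts."""
    root = ET.fromstring(dump_xml)
    ds = root.find("ds")
    rows = [float(v.text) for v in root.iter("v")]
    return {
        "name": ds.findtext("name").strip(),
        "heartbeat": int(ds.findtext("minimal_heartbeat")),
        "min": float(ds.findtext("min")),
        "max": float(ds.findtext("max")),
        "last_ds": float(ds.findtext("last_ds")),
        "rows": len(rows),
        "nan_rows": sum(math.isnan(r) for r in rows),
    }


print(summarize(SAMPLE_DUMP))
```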

Relevant lines from /etc/xymon/hobbitserver.cfg:

[root@<hostname> ~]#  egrep 'TEST2RRD|GRAPHS' /etc/xymon/hobbitserver.cfg 
# TEST2RRD defines the status- and data-messages you want to collect RRD data
TEST2RRD="cpu=la,disk,inode,qtree,memory,$PINGCOLUMN=tcp,http=tcp,dns=tcp,dig=tcp,time=ntpstat,vmstat,vmio=ncv,iostat=ncv,netstat,temperature,apache,bind,sendmail,mailq,nmailq=mailq,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobbitd,files,procs=processes,ports,clock,lines,ops,stats,cifs,JVM,JMS,HitCache,Session,JDBCConn,ExecQueue,JTA,TblSpace,RollBack,MemReq,InvObj,snapmirr,snaplist,snapshot"
GRAPHS="la,disk,inode,qtree,files,processes,memory,users,vmstat:vmstat0|vmstat1|vmstat2|vmstat3|vmstat4|vmstat5|vmstat6|vmstat7|vmstat8|vmstat9,iostat,vmio,tcp.http,tcp,netstat,ifstat,mrtg::1,ports,temperature,ntpstat,apache,bind,sendmail,mailq,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobbitd,clock,lines,ops,stats,cifs,JVM,JMS,HitCache,Session,JDBCConn,ExecQueue,JTA,TblSpace,RollBack,MemReq,InvObj,snapmirr,snaplist,snapshot,devmon::1,if_load::1,temp,ncv"

(My additions are "iostat=ncv" in TEST2RRD, and "iostat" and "ncv" in 
GRAPHS.  A tip from the web said "ncv" had to be in the GRAPHS portion, and 
said "not sure why, just trust me" ...)

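The TEST2RRD value is a comma-separated list in which an entry is either a bare name or a "test=handler" pair. A small Python sketch of that reading follows; parse_test2rrd is a hypothetical helper, and the shortened setting string is an excerpt made up from the entries above, not the full configuration.

```python
def parse_test2rrd(setting):
    """Map each test name to the RRD handler that should process it.

    'iostat=ncv' maps test 'iostat' to handler 'ncv'; a bare entry
    such as 'disk' is treated as mapping to a handler of the same name.
    """
    mapping = {}
    for item in setting.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas or line breaks
        test, sep, handler = item.partition("=")
        mapping[test] = handler if sep else test
    return mapping


# Shortened excerpt of the TEST2RRD value, for illustration only.
setting = "cpu=la,disk,vmstat,vmio=ncv,iostat=ncv,netstat"

for test, handler in parse_test2rrd(setting).items():
    print(f"{test} -> {handler}")
```
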
Relevant lines from /etc/xymon/hobbitgraph.cfg:

[iostat]
        TITLE I/O Utilization - Overall
        FNPATTERN iostat(.*).rrd
        YAXIS Stats
        DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
        LINE1.5:p@RRDIDX@#@COLOR@:@RRDPARAM@
        GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

Does anyone know of a link that explains some of the terminology above?  I 
checked the rrdcreate man page, but didn't see anything about "@RRDIDX@", 
"@RRDFN@" and the other stuff.  p@RRDIDX@ seems to be in a lot of 
examples I've seen, and all my data is making it in with those variables 
(is that what they are?) without having multiple DEF statements.
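
One plausible reading of those placeholders, sketched in Python: FNPATTERN is a regular expression matched against the host's RRD file names, and the DEF/LINE lines are repeated once per matching file with the @...@ tokens filled in. Everything here is an assumption for illustration (the expand function, the color palette, the sort order, the index starting at 0), and the real graphing tool may post-process the captured text differently.

```python
import re

TEMPLATE = [
    "DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE",
    "LINE1.5:p@RRDIDX@#@COLOR@:@RRDPARAM@",
]
COLORS = ["0000FF", "FF0000", "00CC00"]  # hypothetical palette


def expand(fnpattern, filenames, template):
    """Repeat the template once per file name matching fnpattern."""
    pattern = re.compile(fnpattern)
    lines = []
    idx = 0
    for fn in sorted(filenames):
        match = pattern.match(fn)
        if not match:
            continue  # file belongs to some other graph
        param = match.group(1)  # text captured by the (.*) group
        color = COLORS[idx % len(COLORS)]
        for line in template:
            lines.append(
                line.replace("@RRDIDX@", str(idx))
                    .replace("@RRDFN@", fn)
                    .replace("@COLOR@", color)
                    .replace("@RRDPARAM@", param)
            )
        idx += 1
    return lines


files = ["iostat,c0t0d0.rrd", "iostat,c0t1d0.rrd", "vmstat.rrd"]
for line in expand(r"iostat(.*).rrd", files, TEMPLATE):
    print(line)
```

Read this way, a single DEF line in the config is enough because it is expanded into one DEF per disk file.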

The configuration above is generating the image I linked to earlier 
(http://imgur.com/4Nwrp.jpg).

Thanks again to anyone that can help out ... I've been pulling my hair out 
about this for a few days.

Regards,
Matt.

Unix System Administrator
Computer Science Corporation
