Re: [hobbit] urgent rrd help needed - im desperate!
- To: hobbit (at) hswn.dk
- Subject: Re: [hobbit] urgent rrd help needed - im desperate!
- From: "Jeff Newman" <jeffnewman75 (at) gmail.com>
- Date: Wed, 22 Mar 2006 01:35:55 -0600
- References: <6ED0E611018A7242BC600A0C9D3FAE9F062D7FD7@usplm234.amer.corp.eds.com>
I got things working, but now am stuck on a slightly different problem.
host A: has cpu0,1,2,3
host B: has cpu0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
Both have exactly the same RRDs, and both have the same service names
and everything. The problem is:
host A: graphs each CPU just fine (I have meter::1 defined, so I get
1 graph per CPU)
host B: graphs CPUs 0 and 1, then has broken boxes (i.e. no graphs) for
2 and 3, and doesn't even attempt to do anything with 4-15. All RRDs are
being updated correctly.
It's so odd because it's all set up EXACTLY the same. I don't see why
one would work and the other not. I'm guessing that maybe there is
something with RRDIDX, but I don't know. Anyone have any thoughts?
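
One thing that may be worth checking on host B (an assumption, not something confirmed in this thread): if the files matching FNPATTERN are ordered lexically, then sar,10.rrd sorts before sar,2.rrd, and an index counted from 0 no longer equals the CPU number in the DS names. The sketch below only illustrates that ordering for names of the form sar,N.rrd. check_rrdidx is a hypothetical helper, and how hobbitgraph really orders the files would need to be verified separately.

```shell
#!/bin/sh
# Hypothetical helper: read names like "sar,N.rrd" on stdin, sort them
# lexically (ASSUMED ordering, not confirmed for hobbitgraph), and print
# "index cpu status" for each one.
check_rrdidx() {
    LC_ALL=C sort | awk -F'[,.]' '{
        idx = NR - 1          # position in the sorted list, starting at 0
        cpu = $2              # the N in "sar,N.rrd"
        printf "%d %s %s\n", idx, cpu, (idx == cpu ? "ok" : "MISMATCH")
    }'
}

# Example: the file names a 16-CPU host would have.
i=0
while [ "$i" -le 15 ]; do
    echo "sar,$i.rrd"
    i=$((i + 1))
done | check_rrdidx
```

For a 4-CPU host every line comes out "ok", which would fit host A working while host B does not, but that is only a guess.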
On 3/20/06, Hubbard, Greg L <greg.hubbard (at) eds.com> wrote:
> I would study the code in hobbitgraph for graphing disk partition sizes,
> or for graphing usage on multi-CPU systems. This might help you with
> the "graph whatever you find" problem. I haven't worked with this
> myself, so I am no help...
>
> GLH
>
> -----Original Message-----
> From: Jeff Newman [mailto:jeffnewman75 (at) gmail.com]
> Sent: Monday, March 20, 2006 12:58 PM
> To: hobbit (at) hswn.dk
> Subject: Re: [hobbit] urgent rrd help needed - im desperate!
>
> Rather than focus on sar (I'm going to have this problem with several
> tests I have) let me approach this question slightly differently.
>
> I have host A and host B. They both have a program that looks at queues.
> There is a requirement on host A to look only at queue A, and on host B,
> they want to see queues A and B.
>
> Host A sends a status message (with a test called qmeter) that has:
>
> aqueue:#
>
> Host B sends a status message (with a test called qmeter) that has:
>
> aqueue:#
> bqueue:#
>
> Each host has an RRD that has its own dataset (i.e. a DS for just
> aqueue, or a DS for each of aqueue and bqueue). That all works great. The
> problem is the graphing. If the graph definition looks like this:
>
> [qmeter]
> TITLE 1 Second qmeter
> YAXIS Avg. Messages per second
> DEF:aqueue=qmeter.rrd:aqueue:AVERAGE
> DEF:bqueue=qmeter.rrd:bqueue:AVERAGE
> LINE1:aqueue#CC3333:a queue
> LINE1:bqueue#FF0000:b queue
> COMMENT:\n
> GPRINT:aqueue:LAST:a Queue \: %5.1lf%s (cur)
> GPRINT:aqueue:MAX: \: %5.1lf%s (max)
> GPRINT:aqueue:MIN: \: %5.1lf%s (min)
> GPRINT:aqueue:AVERAGE: \: %5.1lf%s (avg)\n
> GPRINT:bqueue:LAST:b Queue \: %5.1lf%s (cur)
> GPRINT:bqueue:MAX: \: %5.1lf%s (max)
> GPRINT:bqueue:MIN: \: %5.1lf%s (min)
> GPRINT:bqueue:AVERAGE: \: %5.1lf%s (avg)\n
>
> This obviously works great for host B, but on host A, where only the A
> queue is defined in the RRD, it doesn't work; it won't even draw the
> graph.
>
> Using RRDIDX won't work because it relies on numbers. So I am stuck on
> how to make this graph properly, and using a server-side script is all I
> can think of. Would this be the correct approach, or is there a trick in
> hobbitgraph.cfg that I don't know about?
>
> -Jeff
>
>
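
A minimal sketch of the "server-side script" idea, assuming the script knows which DS names a host's RRD contains: it prints a [qmeter] definition with lines only for those names, so a host with just aqueue gets no bqueue lines. gen_qmeter_def is a hypothetical name, the DS name is reused as the legend, and this covers text generation only. Whether and how Hobbit can be pointed at a per-host definition is not something this thread establishes.

```shell
#!/bin/sh
# Hypothetical generator (not part of Hobbit): print a [qmeter] graph
# definition covering only the DS names passed as arguments.
gen_qmeter_def() {
    printf '%s\n' "[qmeter]" \
        "        TITLE 1 Second qmeter" \
        "        YAXIS Avg. Messages per second"
    for ds in "$@"; do
        printf '%s\n' "        DEF:$ds=qmeter.rrd:$ds:AVERAGE"
    done
    n=0
    for ds in "$@"; do
        # Alternate between the two colors used in the quoted definition.
        if [ $((n % 2)) -eq 0 ]; then color=CC3333; else color=FF0000; fi
        printf '%s\n' "        LINE1:$ds#$color:$ds"
        n=$((n + 1))
    done
    printf '%s\n' '        COMMENT:\n'
    for ds in "$@"; do
        # printf '%s\n' keeps the \: and \n sequences literal for rrdtool.
        printf '%s\n' \
            "        GPRINT:$ds:LAST:$ds \: %5.1lf%s (cur)" \
            "        GPRINT:$ds:MAX: \: %5.1lf%s (max)" \
            "        GPRINT:$ds:MIN: \: %5.1lf%s (min)" \
            "        GPRINT:$ds:AVERAGE: \: %5.1lf%s (avg)\n"
    done
}

gen_qmeter_def aqueue            # host A: only the A queue
```

Calling it as `gen_qmeter_def aqueue bqueue` gives the two-queue form for host B.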
> On 3/20/06, Hubbard, Greg L <greg.hubbard (at) eds.com> wrote:
> > Is there one sar graph per host, or multiple?
> >
> > I agree with your assessment -- creating explicit custom graphs is
> > easier than trying to make ncv_whatever work.
> >
> > Beware of the many moving parts:
> >
> > A) the "pitcher" -- a custom script run by the client to send over a
> > status page which the server will associate with a column. Do
> > yourself a favor and send the data in a format that is uniquely
> > recognized at the server end. You can see what is sent by looking at
> > the Web page associated with the column for a node. Henrik also
> > suggests defining $BB as "echo" during the early process so you can
> > see what is being sent by looking through the client log.
> >
> > B) the "catcher" -- a custom script run by the server to process the
> > data in a status page for a custom test. You are only allowed ONE (1)
> > catcher script per Hobbit server, so it must be equipped to handle all
> > custom tests. Fortunately, the test/column name is a parameter for
> > this script so you can use a switch statement to branch to the right
> > code for the incoming data. Mine is handling about 7 custom tests
> > right now.
> >
> > C) the RRD format that you will use -- even Tobi's documentation is
> > hazy on whether it is better to use one file for several variables, or
> > one variable in each file, or what. Experimentation, trial, and error
> > cannot be avoided here unless you are already the RRD guru.
> >
> > D) the graph definitions in hobbitgraph.cfg -- more opportunity to
> > learn RRD!
> >
> > E) changing the right settings in hobbitserver.cfg file for TEST2RRD
> > and GRAPHS variables.
> >
> > F) Even so -- there are some limits. First, you can only have one
> > graph on the status page for each custom test. Other graphs can be
> > included on the trends page, but you will only get the one associated
> > with each custom test -- unless you set the TREND: flag in bb-hosts --
> > which I haven't fully explored.
> >
> > Hope all this helps. Even though the Hobbit documents provide a lot
> > of pointers, there are still many places where the innocent can go
> > wrong -- but debugging it will teach you a lot about the Hobbit
> > innards -- which helps you appreciate the hard work Henrik and others
> > have put into this tool...!
> >
> > GLH
> >
> >
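
Points A and B above can be sketched roughly as follows. The first part sets BB to echo so the status message is printed rather than sent. The second part branches on a test name with a case statement. The handler names and the handler_for function are invented for illustration, and passing the test name as the first argument is an assumption about the calling convention, not a statement of how Hobbit invokes the script.

```shell
#!/bin/sh
# Dry run of a "pitcher": with BB set to echo, the status message is
# printed instead of being sent. Address and host name are placeholders.
BB=echo
BBDISP=xxx.xxx.xxx.xxx
MACHINE=xxxxxx
$BB $BBDISP "status $MACHINE.qmeter green $(date) aqueue:5"

# Hypothetical "catcher" dispatch: pick the code path from the test name
# given as $1 (assumed calling convention).
handler_for() {
    case "$1" in
        qmeter) echo "handle_qmeter" ;;
        sar)    echo "handle_sar" ;;
        *)      echo "unhandled" ;;
    esac
}

handler_for sar
```

Keeping the dispatch in one function makes it easier to add another custom test later without touching the existing branches.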
> >
> > -----Original Message-----
> > From: Jeff Newman [mailto:jeffnewman75 (at) gmail.com]
> > Sent: Monday, March 20, 2006 11:45 AM
> > To: hobbit (at) hswn.dk
> > Subject: Re: [hobbit] urgent rrd help needed - im desperate!
> >
> > Ya, I tried for about 4-6 hours to get it to work, even tried
> > following the steps that someone previously sent in to the hobbit
> > list. I could get it to send data, but I couldn't get the graphing
> > portion working.
> >
> > As for the problem I posted, I think I figured out what I need to do.
> > I need to have a parsing script on the server end to parse the data
> > into separate RRDs. That way, I can send from the client side a test
> > with the name "sar" (then I will only have 1 sar column) and let the
> > server side create the sar#.rrd. Hopefully I don't have problems on
> > the hobbitgraph.cfg end.
> >
> > -Jeff
> >
> >
> > On 3/17/06, Galen Johnson <gjohnson (at) trantor.org> wrote:
> > > Jeff Newman wrote:
> > >
> > > > >Ok, I've been working on this for 6+ hours, and am totally stuck.
> > > >Here is the script on the client:
> > > >
> > > >
> > > >====================================
> > > >
> > > >#!/bin/sh
> > > >
> > > >BB=/usr/local/hobbit/client/bin/bb
> > > >BBDISP=xxx.xxx.xxx.xxx
> > > >MACHINE=xxxxxx
> > > >
> > > > >sar -P ALL 1 1 | grep -E "^[0-9]|^( *)" | grep -v \- | grep -v cpu |
> > > > >cut -c 9- > /tmp/hobbit_sar.$$ 2>&1
> > > > >mv /tmp/hobbit_sar.$$ /tmp/hobbit_sar.tmp </dev/null > /dev/null
> > > >
> > > >
> > > >while read aline; do
> > > > >CPUNUM=`echo $aline | awk '{print $1}'`
> > > > >PUSR=`echo $aline | awk '{print $2}'`
> > > > >PSYS=`echo $aline | awk '{print $3}'`
> > > > >PWIO=`echo $aline | awk '{print $4}'`
> > > > >PIDL=`echo $aline | awk '{print $5}'`
> > > >
> > > >echo "cpu"$CPUNUM"pcntusr : $PUSR" >> /tmp/hobbit_sar"$CPUNUM".msg
> > > >echo "cpu"$CPUNUM"pcntsys : $PSYS" >> /tmp/hobbit_sar"$CPUNUM".msg
> > > >echo "cpu"$CPUNUM"pcntwio : $PWIO" >> /tmp/hobbit_sar"$CPUNUM".msg
> > > >echo "cpu"$CPUNUM"pcntidl : $PIDL" >> /tmp/hobbit_sar"$CPUNUM".msg
> > > >
> > > >$BB $BBDISP "status $MACHINE.sar,"$CPUNUM" green `date` `cat
> > > >/tmp/hobbit_sar"$CPUNUM".msg` "
> > > >rm /tmp/hobbit_sar"$CPUNUM".msg
> > > >done < /tmp/hobbit_sar.tmp
> > > >rm /tmp/hobbit_sar.tmp
> > > >
> > > >========================================
> > > >
> > > >It sends the data just fine. (output from sh -x)
> > > >+ BB=/usr/local/hobbit/client/bin/bb
> > > >+ BBDISP=167.76.113.220
> > > >+ MACHINE=stlfan3
> > > >+ sar -P ALL 1 1
> > > >+ grep -E ^[0-9]|^( *)
> > > >+ grep -v -
> > > >+ grep -v cpu
> > > >+ cut -c 9-
> > > >+ 1> /tmp/hobbit_sar.20750 2>& 1
> > > >+ mv /tmp/hobbit_sar.20750 /tmp/hobbit_sar.tmp 0< /dev/null 1>
> > > >+ /dev/null 0< /tmp/hobbit_sar.tmp read aline
> > > >+ + awk {print $1}
> > > >+ echo 0 2 8 1 89
> > > >CPUNUM=0
> > > >+ + awk {print $2}
> > > >+ echo 0 2 8 1 89
> > > >PUSR=2
> > > >+ + awk {print $3}
> > > >+ echo 0 2 8 1 89
> > > >PSYS=8
> > > >+ + awk {print $4}
> > > >+ echo 0 2 8 1 89
> > > >PWIO=1
> > > >+ + awk {print $5}
> > > >+ echo 0 2 8 1 89
> > > >PIDL=89
> > > >+ echo cpu0pcntusr : 2
> > > >+ 1>> /tmp/hobbit_sar0.msg
> > > >+ echo cpu0pcntsys : 8
> > > >+ 1>> /tmp/hobbit_sar0.msg
> > > >+ echo cpu0pcntwio : 1
> > > >+ 1>> /tmp/hobbit_sar0.msg
> > > >+ echo cpu0pcntidl : 89
> > > >+ 1>> /tmp/hobbit_sar0.msg
> > > >+ date
> > > >+ cat /tmp/hobbit_sar0.msg
> > > >
> > > > >+ /usr/local/hobbit/client/bin/bb --debug 167.76.113.220 status stlfan3.sar,0 green Fri Mar 17 11:58:59 EST 2006
> > > >
> > > >cpu0pcntusr : 2
> > > >cpu0pcntsys : 8
> > > >cpu0pcntwio : 1
> > > >cpu0pcntidl : 89
> > > >
> > > >2006-03-17 11:58:59 Transport setup is:
> > > >2006-03-17 11:58:59 bbdportnumber = 1984
> > > >2006-03-17 11:58:59 bbdispproxyhost = NONE
> > > >2006-03-17 11:58:59 bbdispproxyport = 0
> > > >2006-03-17 11:58:59 Recipient listed as 'xxx.xx.xxx.xxx'
> > > >2006-03-17 11:58:59 Standard BB protocol on port 1984
> > > >2006-03-17 11:58:59 Will connect to address xxx.xx.xxx.xxx port
> > > >1984
> > > >2006-03-17 11:58:59 Connect status is 0
> > > >2006-03-17 11:58:59 Sent 121 bytes
> > > >2006-03-17 11:58:59 Closing connection
> > > >+ rm /tmp/hobbit_sar0.msg
> > > >+ read aline
> > > >
> > > ><and so on, incrementing cpu numbers as expected.>
> > > >
> > > > >On the hobbit server, I want this to work like "disk" where there
> > > > >are multiple file systems under one disk column. I manually created
> > > > >the RRDs (again for a custom time step)
> > > >
> > > >-rw-r--r-- 1 hobbit hobbit 22121176 Mar 17 10:31 sar,0.rrd
> > > >-rw-r--r-- 1 hobbit hobbit 22121176 Mar 17 10:31 sar,1.rrd
> > > >-rw-r--r-- 1 hobbit hobbit 22121176 Mar 17 10:31 sar,2.rrd
> > > >-rw-r--r-- 1 hobbit hobbit 22121176 Mar 17 10:31 sar,3.rrd
> > > >
> > > >The DS names in the rrd dump look fine:
> > > > <ds>
> > > > <name> cpu0pcntusr </name>
> > > > <type> GAUGE </type> for example.
> > > >
> > > > >Note this all doesn't work if the files are just "sar0.rrd, sar1.rrd
> > > > >etc..." without the ,'s
> > > >
> > > > >Unfortunately, on the web page, it gives me 3 columns: sar,0 sar,1
> > > > >and sar,2. That is nitpicky, but if I could just have a "sar" column
> > > > >with the others under it, that would be great (like the disk
> > > > >problem). Here is the REAL problem.
> > > > >
> > > > >Looking at the sar,0 button for example, I see the data update
> > > > >there, HOWEVER, NONE of the
> > > > >/usr/local/hobbit/data/rrd/xxxx/sar,#.rrd files ever get updated!!!
> > > > >In addition, there isn't even a link for a graph in the page.
> > > > >
> > > > >No errors in /var/log/hobbit/rrd*, or any others that I have looked
> > > > >at on the client or server. Here are the .cfg files. I have tried
> > > > >many variations, these are just how they are now:
> > > >
> > > >hobbitserver.cfg - here are the lines that I have "sar" in:
> > > >
> > > >
> > > > >TEST2RRD="cpu=la,disk,inode,qtree,memory,$PINGCOLUMN=tcp,http=tcp,dns=tcp,dig=tcp,time=ntpstat,vmstat,iostat,netstat,temperature,apache,bind,sendmail,mailq,nmailq=mailq,sar,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobbitd,HiFlowNet="ncv",sock="ncv",qmeter="ncv",rtt="ncv"
> > > >
> > > > >Note, I have tried sar="ncv", sar,sar0,sar1, etc.. maybe I haven't
> > > > >tried the right variation :-(
> > > >
> > > > >GRAPHS="la,disk,inode,qtree,memory,users,vmstat,iostat,tcp.http,tcp,netstat,mrtg::1,temperature,ntpstat,apache,bind,sendmail,mailq,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobbitd,ncv,HiFlowNet,sock,rtt,sar,sar0,sar1,sar2,sar3"
> > > >
> > > > >Again, tried many variations.. and also again, maybe I haven't tried
> > > > >the right one.
> > > >
> > > >bb-hosts:
> > > > >only tried putting "xxx.xxx.xxx.xxx xxx # conn sar";
> > > > >that didn't help
> > > >
> > > >
> > > >hobbitgraph.cfg:
> > > >
> > > >[sar]
> > > > FNPATTERN sar(.*).rrd
> > > > TITLE CPU sar
> > > > YAXIS %
> > > > > DEF:p@RRDIDX@=@RRDFN@:cpu@RRDIDX@pcntusr:AVERAGE
> > > > > DEF:p@RRDIDX@=@RRDFN@:cpu@RRDIDX@pcntsys:AVERAGE
> > > > > DEF:p@RRDIDX@=@RRDFN@:cpu@RRDIDX@pcntwio:AVERAGE
> > > > > DEF:p@RRDIDX@=@RRDFN@:cpu@RRDIDX@pcntidl:AVERAGE
> > > > > LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
> > > > > -u 100
> > > > > -l 0
> > > > > GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
> > > > > GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
> > > > > GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
> > > > > GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
> > > >
> > > >If anyone has anything else they need to see, let me know.
> > > >I need to get this working quickly, and am at the end of my rope!
> > > > >I've done other custom graphs with custom RRDs, and never had
> > > > >this problem before.
> > > > >
> > > > >By the way, the host is sending other custom data just fine with no
> > > > >problems.
> > > >
> > > >Thanks for any help!
> > > >
> > > >-Jeff
> > > >
> > > >To unsubscribe from the hobbit list, send an e-mail to
> > > >hobbit-unsubscribe (at) hswn.dk
> > > >
> > > >
> > > >
> > > >
> > > Have you looked at the sar script on deadcat?...it's really very
> > > nice...it has some minor issues but works great. It might need
> > > tweaking for hobbit but I don't think it will.
> > >
> > > =G=
> > >
> > > To unsubscribe from the hobbit list, send an e-mail to
> > > hobbit-unsubscribe (at) hswn.dk
> > >
> > >
> > >
> >
> > To unsubscribe from the hobbit list, send an e-mail to
> > hobbit-unsubscribe (at) hswn.dk
> >
> >
> >
> > To unsubscribe from the hobbit list, send an e-mail to
> > hobbit-unsubscribe (at) hswn.dk
> >
> >
> >
>
> To unsubscribe from the hobbit list, send an e-mail to
> hobbit-unsubscribe (at) hswn.dk
>
>
>
> To unsubscribe from the hobbit list, send an e-mail to
> hobbit-unsubscribe (at) hswn.dk
>
>
>