
Re: [hobbit] trying to get netapp filer data into larrd graphs



In <4213B272.7040306 (at) nandomedia.com> Tom Georgoulias <tgeorgoulias (at) nandomedia.com> writes:

>I'm using the filerstats2bb script from deadcat.net to get from my
>Netapp filers and displaying it in hobbit.  This is what is displayed on
>the status page:

>conn, cpu, disk, info, inode, qtree, trends, user_quota

>The data displayed is accurate, but the only graph that works is conn.
>The rest are severely broken.

That figures, since the "conn" test is run by Hobbit (bbtest-net) and
reports data in a form that Hobbit knows how to handle.

>I would like to fix this, starting with CPU.  I'm hoping that what I
>learn here can be used when I attempt to create custom graphs for
>user_quota & qtree with the custom RRD feature described in 
>hobbitd_larrd.  

You can use some of it, but there is a difference between fixing an
existing handler (hobbit already handles some "cpu" data) and adding
a new handler that hobbit does not know about: to fix the cpu handler,
you really have to change the current C code.


>The rest of this message concerns only the load 
>average/CPU graph problem, since I figure this ought to work without any 
>modification.


>For example, this is the contents of a status summary displayed on the
>CPU status page:
>==
>  Wed Feb 16 14:59:46 EST 2005 - CPU Utilization on
>filerA.nandomedia.com is OK. Uptime: 63 days, 06:57:54.29, load=1

The best way of working with the RRD data that Hobbit handles is to
snoop on the data that is sent from hobbitd to the hobbitd_larrd
program. You can do that by listening on the hobbit "status" channel:

    ~/server/bin/bbcmd sh
    hobbitd_channel --channel=status cat

When the "cpu" status arrives, you'll see something like this:

@@status#121308|1108589727.548324|172.16.10.2||voodoo.hswn.dk|cpu|1108591527|green||green|1106668421|0||0|
status voodoo,hswn,dk.cpu green Wed Feb 16 22:35:27 CET 2005 up: 23 days, 2 users, 171 procs, load=11

top - 22:35:27 up 23 days, 48 min,  2 users,  load average: 0.24, 0.11, 0.09
Tasks: 170 total,   1 running, 169 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.2% us,  1.5% sy,  0.1% ni, 91.2% id,  2.8% wa,  0.1% hi,  0.1% si
Mem:    646876k total,   635204k used,    11672k free,   194116k buffers
Swap:   787176k total,    23608k used,   763568k free,   123284k cached

[lots of lines from "top" snipped]

@@

The first line with "@@status..." is the beginning of a message - it
has some information that hobbitd picks out from all messages, like
the hostname, test-name, color etc. The important thing here is to see
that hobbitd does see that it is a "cpu" status - there's "|cpu|" in
the first line. That means hobbitd_larrd will send this message
through the "cpu" handler in hobbitd/larrd/do_la.c.
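
To make the header layout concrete, here is a minimal sketch that picks
one field out of the "@@status..." line. The function name header_field
is made up for illustration, and the field positions (4 = hostname,
5 = test name) are only read off the example above; this is not how
hobbitd itself parses the line.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy field number `idx` (0-based) of a '|'-delimited header line into
 * `out`. Empty fields are counted, which is why strtok() is not used.
 * Only the first line of `hdr` is examined.
 * Returns 0 on success, -1 if the field does not exist or does not fit.
 * NOTE: hypothetical helper, not part of Hobbit.
 */
int header_field(const char *hdr, int idx, char *out, size_t outsz)
{
    const char *start = hdr;
    size_t len;

    while (idx > 0) {
        start += strcspn(start, "|\n");  /* advance to next delimiter */
        if (*start != '|') return -1;    /* ran out of fields on this line */
        start++;                         /* step past the delimiter */
        idx--;
    }

    len = strcspn(start, "|\n");         /* field ends at '|', newline or NUL */
    if (len + 1 > outsz) return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

With the example header above, asking for field 5 yields "cpu" and
field 4 yields "voodoo.hswn.dk".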

So we need to look at what the do_la.c file does.

        eoln = strchr(msg, '\n'); if (eoln) *eoln = '\0';

This finds the first new-line character, and cuts off anything after
that. So essentially, it only looks at the first line of the status
message.


        p = strstr(msg, "up: ");
        if (p) {
              .... process the message ....

This searches the message (or rather, the first line of it) for the
string "up: ". I suspect this is where it breaks for your Netapp
reports, because they have "Uptime:", not "up: ".

>  Wed Feb 16 14:59:46 EST 2005 - CPU Utilization on
>filerA.nandomedia.com is OK. Uptime: 63 days, 06:57:54.29, load=1

Yes, computers are picky about such details ...

So the first fix is to change those lines above to handle a report
with the keyword "Uptime:" - e.g. like this:

        p = strstr(msg, "up: ");
        if (!p) p = strstr(msg, "Uptime:");
        if (p) {


Just one line added. But in this case, I think it makes all the
difference, because the rest of the report looks like it will be
handled just fine by the current code in do_la.c.
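
For anyone who wants to try the keyword lookup outside of Hobbit, here
is a small standalone sketch of the same idea. The function name
find_uptime is hypothetical, and unlike do_la.c it does not cut the
message at the first newline; it just rejects matches that lie beyond
the first line. It also assumes the Netapp summary arrives as a single
line (the line break in the quoted report is taken to be mail wrapping).

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the uptime keyword on the first line of `msg`,
 * or NULL if neither "up: " nor "Uptime:" appears on that line.
 * NOTE: illustrative helper only, not the actual do_la.c code.
 */
const char *find_uptime(const char *msg)
{
    const char *eoln = strchr(msg, '\n');   /* end of first line, if any */
    const char *p = strstr(msg, "up: ");

    if (p && eoln && p > eoln) p = NULL;    /* match was past the first line */
    if (!p) {
        /* Fall back to the keyword used in the Netapp report */
        p = strstr(msg, "Uptime:");
        if (p && eoln && p > eoln) p = NULL;
    }
    return p;
}
```

Feeding it the first line of the Linux report gives a pointer to
"up: 23 days, ...", and the Netapp line gives a pointer to
"Uptime: 63 days, ...".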

I've added this fix to my sources.


Not much info here about doing custom graphs, I'm afraid. But if you
look over the example in the hobbitd_larrd man-page, it should get you
started. If not, feel free to ask for more help.

Henrik


PS: If you want me to look at that Netapp disk-report that isn't being
graphed, just send me an example of what such a report looks like.

H.