[hobbit] trying to get netapp filer data into larrd graphs
Henrik Storner
henrik at hswn.dk
Wed Feb 16 22:56:36 CET 2005
In <4213B272.7040306 at nandomedia.com> Tom Georgoulias <tgeorgoulias at nandomedia.com> writes:
>I'm using the filerstats2bb script from deadcat.net to get data from my
>Netapp filers and display it in hobbit. This is what is displayed on
>the status page:
>conn, cpu, disk, info, inode, qtree, trends, user_quota
>The data displayed is accurate, but the only graph that works is conn.
>The rest are severely broken.
That figures, since the "conn" test is run by Hobbit (bbtest-net) and
reports data in a form that Hobbit knows how to handle.
>I would like to fix this, starting with CPU. I'm hoping that what I
>learn here can be used when I attempt to create custom graphs for
>user_quota & qtree with the custom RRD feature described in
>hobbitd_larrd.
You can use some of it, but there is a difference between fixing an
existing handler (hobbit already handles some "cpu" data) and adding
a new handler that hobbit does not know about: to fix the cpu handler,
you really have to change the current C code.
>The rest of this message concerns only the load
>average/CPU graph problem, since I figure this ought to work without any
>modification.
>For example, this is the contents of a status summary displayed on the
>CPU status page:
>==
> Wed Feb 16 14:59:46 EST 2005 - CPU Utilization on
>filerA.nandomedia.com is OK. Uptime: 63 days, 06:57:54.29, load=1
The best way of working with the RRD data that Hobbit handles is to
snoop on the data that is sent from hobbitd to the hobbitd_larrd
program. You can do that by listening on the hobbit "status" channel:
~/server/bin/bbcmd sh
hobbitd_channel --channel=status cat
When the "cpu" status arrives, you'll see something like this:
@@status#121308|1108589727.548324|172.16.10.2||voodoo.hswn.dk|cpu|1108591527|green||green|1106668421|0||0|
status voodoo,hswn,dk.cpu green Wed Feb 16 22:35:27 CET 2005 up: 23 days, 2 users, 171 procs, load=11
top - 22:35:27 up 23 days, 48 min, 2 users, load average: 0.24, 0.11, 0.09
Tasks: 170 total, 1 running, 169 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.2% us, 1.5% sy, 0.1% ni, 91.2% id, 2.8% wa, 0.1% hi, 0.1% si
Mem: 646876k total, 635204k used, 11672k free, 194116k buffers
Swap: 787176k total, 23608k used, 763568k free, 123284k cached
[lots of lines from "top" snipped]
@@
The first line with "@@status..." is the beginning of a message - it
has some information that hobbitd picks out from all messages, like
the hostname, test-name, color etc. The important thing here is to see
that hobbitd does see that it is a "cpu" status - there's "|cpu|" in
the first line. That means hobbitd_larrd will send this message
through the "cpu" handler in hobbitd/larrd/do_la.c.
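To make the header line concrete, here is a minimal sketch of splitting
it on "|" and picking out the test name. The function names
split_header() and header_testname() are hypothetical and are not taken
from the Hobbit sources. The field position (5, counting from 0) is
only read off the sample message above; it is an assumption, not a
documented format.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split a "@@status..." header line into '|'-separated fields, in place.
 * Empty fields are kept (strtok() would skip them), so field positions
 * stay stable. Returns the number of fields stored in fields[].
 */
size_t split_header(char *line, char *fields[], size_t maxfields)
{
    size_t n = 0;
    char *p = line;

    while (p && n < maxfields) {
        char *bar = strchr(p, '|');
        if (bar) *bar = '\0';          /* terminate the current field */
        fields[n++] = p;
        p = bar ? bar + 1 : NULL;      /* continue after the separator */
    }
    return n;
}

/*
 * Return the test name, or NULL if the line has too few fields.
 * ASSUMPTION: the test name is field 5, as in the sample message.
 */
const char *header_testname(char *line)
{
    char *fields[32];
    size_t n = split_header(line, fields, 32);

    return (n > 5) ? fields[5] : NULL;
}
```

With the sample header above, field 4 is "voodoo.hswn.dk" and field 5 is
"cpu". Note that the line is modified in place, so pass a writable copy.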
So we need to look at what the do_la.c file does.
eoln = strchr(msg, '\n'); if (eoln) *eoln = '\0';
This finds the first new-line character, and cuts off anything after
that. So essentially, it only looks at the first line of the status
message.
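One consequence of that truncation is that anything on later lines
(the "top" output, for instance) is invisible to the searches that
follow. A small sketch of the same idea, wrapped in a hypothetical
helper cut_first_line() that also hands back the hidden remainder:

```c
#include <string.h>

/*
 * Cut msg at its first newline, the same way the do_la.c line does.
 * Returns a pointer to the text after the newline (the part that is
 * now hidden from strstr() on msg), or NULL if msg had only one line.
 * cut_first_line() is a made-up name for illustration.
 */
char *cut_first_line(char *msg)
{
    char *eoln = strchr(msg, '\n');

    if (!eoln) return NULL;
    *eoln = '\0';
    return eoln + 1;
}
```

After the call, strstr(msg, ...) can only match inside the first line.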
p = strstr(msg, "up: ");
if (p) {
.... process the message ....
This searches the message (or rather, the first line of it) for the
string "up: ". I suspect this is where it breaks for your Netapp
reports, because they have "Uptime:", not "up: ":
> Wed Feb 16 14:59:46 EST 2005 - CPU Utilization on
>filerA.nandomedia.com is OK. Uptime: 63 days, 06:57:54.29, load=1
Yes, computers are picky about such details ...
So the first fix is to change those lines above to handle a report
with the keyword "Uptime:" - e.g. like this:
p = strstr(msg, "up: ");
if (!p) p = strstr(msg, "Uptime:");
if (p) {
Just one line added. But in this case, I think it makes all the
difference, because the rest of the report looks like it will be
handled just fine by the current code in do_la.c.
I've added this fix to my sources.
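If you want to try the fallback outside of Hobbit, here is a minimal
stand-alone sketch. find_uptime() mirrors the two strstr() calls above.
parse_load() is a hypothetical helper: it assumes the load is the
integer after "load=" on the first line, which is a guess based on the
sample reports and not a copy of what do_la.c does.

```c
#include <stdio.h>
#include <string.h>

/*
 * Locate the uptime marker on the first line of a cpu report:
 * try "up: " first, then fall back to the Netapp-style "Uptime:".
 * Returns NULL if neither marker is present.
 */
const char *find_uptime(const char *firstline)
{
    const char *p = strstr(firstline, "up: ");

    if (!p) p = strstr(firstline, "Uptime:");
    return p;
}

/*
 * Hypothetical helper: store the integer following "load=" in *load.
 * Returns 1 on success, 0 if there is no uptime marker or no load value.
 * ASSUMPTION: the load is a plain integer, as in the sample reports.
 */
int parse_load(const char *firstline, int *load)
{
    const char *p;

    if (!find_uptime(firstline)) return 0;
    p = strstr(firstline, "load=");
    return (p != NULL && sscanf(p, "load=%d", load) == 1);
}
```

Feeding it the first line of the Netapp report and the first line of
the Hobbit-generated report should give the load value in both cases,
while a line with neither marker is rejected.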
Not much info here about doing custom graphs, I'm afraid. But if you
look over the example in the hobbitd_larrd man-page, it should get you
started. If not, feel free to ask for more help.
Henrik
PS: If you want me to look at that Netapp disk-report that isn't being
graphed, just send me an example of what such a report looks like.
H.