[hobbit] Graphs are missing data, but it's there!
Ward, Martin
Martin.Ward at colt.net
Mon Jul 28 14:34:39 CEST 2008
In this instance no negative numbers are reported to Hobbit. The
negatives you see in the example below are displayed in the web page.
The data is collated using Hobbit's built-in NCV script, so I am not
using a manual script to sort the data out.
TEST2RRD is configured thus:
TEST2RRD="cpu=la,disk,inode,lines,postfixqueue=ncv,postfixdeliveries=ncv"
and NCV_postfixqueue is configured thus:
NCV_postfixqueue="ActiveQueue:GAUGE,BounceQueue:GAUGE,DeferQueue:GAUGE,CorruptQueue:GAUGE,IncomingQueue:GAUGE"
So the client script is expected to return five "name: value" pairs, one per line.
As you can see from the example below this data is returned quite
correctly:
ActiveQueue: 5494
BounceQueue: 219
DeferQueue: 145971
CorruptQueue: 0
IncomingQueue: 409494
yet this data is not being stored properly.
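For anyone who wants to check a saved status message by hand, here is a minimal sketch that pulls out the "Name: number" lines and reports which of the five configured names are missing. It is not Hobbit's NCV parser and makes no claim about how that code treats the other lines; the function names and the regular expression are my own assumptions.

```python
import re

# Dataset names taken from the NCV_postfixqueue setting quoted above.
EXPECTED = ["ActiveQueue", "BounceQueue", "DeferQueue",
            "CorruptQueue", "IncomingQueue"]

# Matches a whole line of the form "Name: number". Lines such as
# "ActiveTrend: tendency ..." do not match because the value is not numeric.
PAIR = re.compile(r"^\s*([A-Za-z][A-Za-z0-9]*)\s*:\s*(-?\d+(?:\.\d+)?)\s*$")


def numeric_pairs(message):
    """Return {name: value} for every line of the form 'Name: number'."""
    pairs = {}
    for line in message.splitlines():
        match = PAIR.match(line)
        if match:
            pairs[match.group(1)] = float(match.group(2))
    return pairs


def missing_datasets(message, expected=EXPECTED):
    """List the expected names that have no numeric value in the message."""
    found = numeric_pairs(message)
    return [name for name in expected if name not in found]


sample = """ActiveQueue: 5494
BounceQueue: 219
DeferQueue: 145971
CorruptQueue: 0
IncomingQueue: 409494"""

print(numeric_pairs(sample))
print(missing_datasets(sample))
```

An empty list from missing_datasets() only says the client output has the expected shape; it says nothing about what the server side does with it.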
Is there any way of getting more diagnostic data out of the generic NCV
code or the hobbitd_rrd module itself?
|\/|artin
-----Original Message-----
From: Hubbard, Greg L [mailto:greg.hubbard at eds.com]
Sent: 25 July 2008 15:28
To: hobbit at hswn.dk
Subject: RE: [hobbit] Graphs are missing data, but it's there!
I have not been able to get the Hobbit graph thing to work with
negative numbers. If you are using the "manual" script method for
parsing the return, you should be able to save the output from the
failing server in a file, then run the processing script by hand to see
what it spits out (should be commands for the Hobbit RRD support to
obey). I have spent many an hour debugging my own stuff this way...
GLH
________________________________
From: Ward, Martin [mailto:Martin.Ward at colt.net]
Sent: Friday, July 25, 2008 8:55 AM
To: hobbit at hswn.dk
Subject: RE: [hobbit] Graphs are missing data, but it's
there!
OK, with everyone's help I have made progress. After
trying all the different suggestions it came down to: Why can't I get an
"rrdtool dump" output? The reason was that sometime in the past someone
(probably me) managed to replace the rrdtool binary with an empty file
(stop sniggering at the back please).
Having done this before I know how it happens... you
type your command line:
/opt/rrdtool/bin/rrdtool dump postfixqueue.rrd
but when using bash command line editing you manage to
put a > at the start, making the command line:
>/opt/rrdtool/bin/rrdtool dump postfixqueue.rrd
The file still keeps its execute permission, but
executing an empty file returns nothing...
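A quick way to spot this kind of accident is to look for files that are executable but zero bytes long. The sketch below is only an illustration on a throwaway temporary file (the name "fake-tool" is made up, and a POSIX system is assumed); opening a file for writing in Python truncates it in the same way a shell ">" redirection does, and leaves the permission bits alone.

```python
import os
import stat
import tempfile


def is_empty_executable(path):
    """True if path is a regular file with an execute bit set and zero size."""
    info = os.stat(path)
    any_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    executable = bool(info.st_mode & any_exec)
    return stat.S_ISREG(info.st_mode) and executable and info.st_size == 0


# Demonstration on a throwaway file, not on a real rrdtool binary.
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "fake-tool")  # hypothetical stand-in name
    with open(fake, "w") as handle:
        handle.write("#!/bin/sh\necho hello\n")
    os.chmod(fake, 0o755)
    before = is_empty_executable(fake)

    # Opening for writing truncates the file to zero bytes; the execute
    # permission set above is not changed by the truncation.
    open(fake, "w").close()
    after = is_empty_executable(fake)

print(before, after)
```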
So, having got a real, working copy of the rrdtool
program and running it on the dodgy data file I can see that data is
indeed being stored there, and a vast number of lines look like this:
<!-- 2008-07-25 12:45:00 UTC / 1216989900 --> <row><v> NaN </v><v> NaN </v><v> NaN </v><v> NaN </v><v> NaN </v></row>
a copy from one of the working ones shows:
<!-- 2008-07-25 13:00:00 UTC / 1216990800 --> <row><v> 5.5427200000e+03 </v><v> 2.1861333333e+02 </v><v> 1.4601324333e+05 </v><v> 0.0000000000e+00 </v><v> 4.0939317667e+05 </v></row>
So it seems to be a problem with translating the output
from the client program into data that RRD can understand.
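To get a feel for how much of a dump is empty, the rows can be counted rather than eyeballed. This is a rough sketch that works on the text of an "rrdtool dump" as shown above, using regular expressions instead of a real XML parser; the function name is my own, and the sample is just the two rows quoted in this message.

```python
import re

# One <row>...</row> element, possibly wrapped over several lines.
ROW = re.compile(r"<row>(.*?)</row>", re.DOTALL)
# The text inside a single <v>...</v> cell, ignoring surrounding whitespace.
VALUE = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def summarize_rows(dump_text):
    """Return (total_rows, rows_where_every_value_is_NaN)."""
    total = 0
    all_nan = 0
    for row in ROW.findall(dump_text):
        values = VALUE.findall(row)
        total += 1
        if values and all(v.lower() == "nan" for v in values):
            all_nan += 1
    return total, all_nan


sample = """
<!-- 2008-07-25 12:45:00 UTC / 1216989900 --> <row><v>
NaN </v><v> NaN </v><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2008-07-25 13:00:00 UTC / 1216990800 --> <row><v>
5.5427200000e+03 </v><v> 2.1861333333e+02 </v><v> 1.4601324333e+05
</v><v> 0.0000000000e+00 </v><v> 4.0939317667e+05 </v></row>
"""

print(summarize_rows(sample))  # (2, 1)
```

Run against the full dump of a working and a non-working host, the two counts give a quick comparison without reading every row.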
Now, here are the contents of the hostlogs file of the
working server; this should tie up with the data entry above:
----
red Friday July 25 12:59:31 UTC 2008
<br><br>
<pre>
ActiveStatus: &red
ActiveQueue: 5494
ActiveTrend: tendency <b>rising</b> with <b>-81</b> mails.
BounceStatus: &green
BounceQueue: 219
BounceTrend: tendency <b>rising</b> <b>-2</b> mails.
DeferStatus: &green
DeferQueue: 145971
DeferTrend: amount equal to last measure.
CorruptStatus: &green
CorruptQueue: 0
CorruptTrend: amount equal to last measure.
IncomingStatus: &red
IncomingQueue: 409494
IncomingTrend: tendency <b>falling</b> with <b>858</b> mails.
</pre>
Status unchanged in 0.00 minutes
Message received from 10.44.107.107
Client data ID 1216990657
----
and here are the contents of the non-working one:
----
red Friday July 25 12:45:04 UTC 2008
<br><br>
<pre>
ActiveStatus: &green
ActiveQueue: 39
ActiveTrend: tendency <b>falling</b> with <b>973</b> mails.
BounceStatus: &green
BounceQueue: 58
BounceTrend: amount equal to last measure.
DeferStatus: &red
DeferQueue: 154348
DeferTrend: tendency <b>falling</b> with <b>865</b> mails.
CorruptStatus: &green
CorruptQueue: 0
CorruptTrend: amount equal to last measure.
IncomingStatus: &red
IncomingQueue: 206927
IncomingTrend: tendency <b>rising</b> with <b>-206926</b> mails.
Deferred Queue is too high but is decreasing already.<br>
</pre>
Status unchanged in 0.00 minutes
Message received from 10.44.107.105
Client data ID 1216989837
----
As mentioned previously all these servers use the same
scripts to send the data to the server and the same scripts to process
it once it arrives, indeed as you can see above the two different
entries look identical in format. I checked the scripts on the remote
servers to see if there were any differences between them and found a
few minor differences but nothing huge. Still, just to be sure I copied
the postfixqueue.sh script from a working server to the broken one and
waited for it to run. Alas, although the script transmits sensible data
back to the Hobbit server:
----
ActiveStatus: &green
ActiveQueue: 448
ActiveTrend: tendency falling with 9 mails.
BounceStatus: &green
BounceQueue: 59
BounceTrend: tendency rising -1 mails.
DeferStatus: &green
DeferQueue: 149697
DeferTrend: amount equal to last measure.
CorruptStatus: &green
CorruptQueue: 0
CorruptTrend: amount equal to last measure.
IncomingStatus: &red
IncomingQueue: 213848
IncomingTrend: amount equal to last measure.
----
The rrd file STILL contains:
<!-- 2008-07-25 13:45:00 UTC / 1216993500 --> <row><v>
NaN </v><v> NaN </v><v> NaN </v><v> NaN </v><v> NaN </v></row>
Any RRD experts got any ideas?
|\/|artin
-----Original Message-----
From: Phil Wild [mailto:philwild at gmail.com]
Sent: 24 July 2008 17:42
To: hobbit at hswn.dk
Subject: Re: [hobbit] Graphs are missing data,
but it's there!
The rrd version should be okay, after all it is
graphing data from other hosts with no problem.
It would appear that your NCV and graph
configurations are correct as you say they are working for other hosts.
This would indicate it is a problem with this host's configuration, so
where to look...
Just out of interest, can you take an rrd file
for this test from a host that works, and copy it into the
.../data/rrd/hostname directory of the host that does not?
I would expect after doing this that you will
have a graph for this host. Can you confirm this works? After doing this
and leaving it for 10 minutes, do you see any new data in the graph?
Can you dump the data from this rrd file?
2008/7/25 Ward, Martin <Martin.Ward at colt.net>:
> Are you saying that you run the same
tests on multiple hosts and only one host is not showing data?
Yes.
> Does this mean they all share the same
NCV configuration in hobbitserver.cfg and the same graph definition in
hobbitgraph.cfg?
Yes.
> What if you remove the rrd file and
let hobbit create a new one, does that help?
I did this and as you'd expect initially
the web page showed no graph although it did show data (stored from the
previous run I presume).
After an interval the file appeared
again but running "rrdtool dump" on it STILL failed to produce any data.
I'm starting to wonder about the
versions of RRD, but they ought to be data-compatible; I'm using rrdtool
v1.2.15.
The histlogs show no errors, the
hist/mc25,... data file contains valid data. I DO get a few RRD errors
like this:
rrd-status.log:2008-07-21 09:46:19 RRD
error updating
/opt/hobbit/data/rrd/mc25.lon.server.colt.net/tcp.smtp.rrd from
10.44.107.48: illegal attempt to update using time 1216633579 when last
update time is 1216633579 (minimum one second step)
which makes it look like Hobbit is
actually updating the RRD file... I just can't get any data out!
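When there are more than a handful of these in rrd-status.log it helps to pull the file name and the two timestamps out of each entry. The following is a sketch only: the regular expression is written against the single error quoted above (joined onto one line), so other RRD error messages will simply not match, and the function name is my own.

```python
import re

# Pattern modelled on the one error line quoted in this message.
ERROR = re.compile(
    r"RRD error updating (?P<rrd>\S+) from (?P<sender>\S+): "
    r"illegal attempt to update using time (?P<new>\d+) "
    r"when last update time is (?P<last>\d+)"
)


def parse_rrd_error(line):
    """Return the RRD path, sender and both timestamps, or None if no match."""
    match = ERROR.search(line)
    if not match:
        return None
    return {
        "rrd": match.group("rrd"),
        "sender": match.group("sender"),
        "new": int(match.group("new")),
        "last": int(match.group("last")),
    }


line = ("rrd-status.log:2008-07-21 09:46:19 RRD error updating "
        "/opt/hobbit/data/rrd/mc25.lon.server.colt.net/tcp.smtp.rrd from "
        "10.44.107.48: illegal attempt to update using time 1216633579 "
        "when last update time is 1216633579 (minimum one second step)")

info = parse_rrd_error(line)
print(info["rrd"], info["new"] - info["last"])
```

Note that the file named in the quoted error is tcp.smtp.rrd, not the postfixqueue one, and that the two timestamps are identical.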
|\/|
-----Original Message-----
From: Phil Wild
[mailto:philwild at gmail.com]
Sent: 24 July 2008 16:31
To: hobbit at hswn.dk
Subject: Re: [hobbit] Graphs are missing
data, but it's there!
Are you saying that you run the same
tests on multiple hosts and only one host is not showing data? Does this
mean they all share the same NCV configuration in hobbitserver.cfg and
the same graph definition in hobbitgraph.cfg?
If this is correct, then it really
points to something not getting into the rrd file. As previously
suggested, rrd dump is your best bet at finding the problem here. What
if you remove the rrd file and let hobbit create a new one, does that
help?
Cheers
Phil
2008/7/24 Hubbard, Greg L
<greg.hubbard at eds.com>:
You know the data exists because you
used the rrd dump tool to display it?
Is the graph simply not shown at all, or
is there a "hole" in the Web page where it normally would go? ("show
page source" might have a clue).
Some ideas/shots in the dark:
a) check the logs
b) meticulously compare a "working"
system to the non-working system, and make sure that they really are
identical.
c) look at the trends page for this host
to see if the graph is okay there...
Etc. I am sure you know the drill -- a
big pain to look under every rock, but it has to be done...
GLH
________________________________
From: Ward, Martin
[mailto:Martin.Ward at colt.net]
Sent: Thursday, July 24, 2008 8:21 AM
To: hobbit at hswn.dk
Subject: RE: [hobbit] Graphs are missing
data, but it's there!
Thanks for the suggestion but that
didn't work (I guess you meant rrd).
Any other ideas?
|\/|
-----Original Message-----
From: Roberts, James
[mailto:James.Roberts at hants.gov.uk]
Sent: 24 July 2008 12:47
To: hobbit at hswn.dk
Subject: RE: [hobbit] Graphs are missing
data, but it's there!
you need to touch all the rdd.
________________________________
From: Ward, Martin
[mailto:Martin.Ward at colt.net]
Sent: 24 July 2008 12:43
To: hobbit at hswn.dk
Subject: [hobbit] Graphs are missing
data, but it's there!
All,
I have a problem with one machine where
its data is not being shown in the graphs even though the data exists.
The machine in question's Hobbit client
sends five pieces of numeric data (email queues) and these are displayed
on the web page for this service:
====
Thursday July 24 11:29:11 UTC 2008
ActiveStatus: green
ActiveQueue: 106
ActiveTrend: tendency rising with -60 mails.
BounceStatus: green
BounceQueue: 58
BounceTrend: tendency falling with 3 mails.
DeferStatus: red
DeferQueue: 150464
DeferTrend: tendency falling with 95 mails.
CorruptStatus: green
CorruptQueue: 0
CorruptTrend: amount equal to last measure.
IncomingStatus: red
IncomingQueue: 247049
IncomingTrend: amount equal to last measure.
Deferred Queue is too high but is decreasing already.
====
These numbers change over time and the
values are accurate.
However, the graph that is displayed
below this data is blank. I have historic data, the files exist, and
what is more I have other machines that are configured identically to
this one where the data IS graphed correctly.
Hobbit graphs are a bit of a black hole
to me, can anyone suggest where I might look?
|\/|artin
************************************************************************
*************
The message is intended for the named
addressee only and may not be disclosed to or used by anyone else, nor
may it be copied in any way.
The contents of this message and its
attachments are confidential and may also be subject to legal privilege.
If you are not the named addressee and/or have received this message in
error, please advise us by e-mailing security at colt.net and delete the
message and any attachments without retaining any copies.
Internet communications are not secure
and COLT does not accept responsibility for this message, its contents
nor responsibility for any viruses.
No contracts can be created or varied on
behalf of COLT Telecommunications, its subsidiaries or affiliates
("COLT") and any other party by email Communications unless expressly
agreed in writing with such other party.
Please note that incoming emails will be
automatically scanned to eliminate potential viruses and unsolicited
promotional emails. For more information refer to www.colt.net or
contact us on +44(0)20 7390 3900.
--
Tel: 0400 466 952
Fax: 0433 123 226
email: philwild AT gmail.com