[hobbit] strange graph behavior - random machines & graphs
Gary Baluha
gumby3203 at gmail.com
Fri Nov 30 19:27:03 CET 2007
On Nov 30, 2007 12:18 PM, Ralph Mitchell <ralphmitchell at gmail.com> wrote:
> On Nov 30, 2007 10:55 AM, Gary Baluha <gumby3203 at gmail.com> wrote:
>
> > Hmm, this is getting curiouser and curiouser. Apparently at least
> > _some_ of the graphs that appear corrupted still have some valid data. If I
> > use the graph zoom feature (clicking on the magnifying glass) and select
> > certain portions of the graph, the graph data shows up as normal. It
> > appears that the problem is related to periodic data artifacts (the huge
> > numbers) that cause the scale of the graph to resize to show it within
> > bounds, and this causes the valid data to essentially disappear.
> >
> > I realized this when I looked at the graph, and saw that the (curr) and
> > (min) data points were showing normal values. It's just the (max) and (avg)
> > values that are way off, which causes the rest of the graph to be incorrect.
> >
> >
>
>
> Have you tried running hobbitd_rrd with the "--debug" option?? Add it to
> the various hobbitd_rrd entries in server/etc/hobbitlaunch.cfg. I haven't
> tried it myself, so I don't know how verbose it gets. I seem to recall
> Henrik saying it's OK to just kill hobbitd_rrd processes because they get
> respawned.
>
> I guess the debug output shows up in the rrd-status.log in your Hobbit
> logs directory. Is there anything interesting in that log already?? Or any
> other log??
There wasn't anything useful in any of the logs, besides the usual stuff. I
turned on the --debug option, and here is a sample of the data for one of
the affected machines:
2007-11-30 13:14:07 hobbitd_rrd: Got message 562165
@@status#562165|1196446447.724393|192.168.232.110||danno|disk|1196448247|yellow||yellow|1196053505|0||0||1196446447
2007-11-30 13:14:07 startpos 343968, fillpos 343968, endpos -1
2007-11-30 13:14:07 RRD update param 00: 'rrdupdate'
2007-11-30 13:14:07 RRD update param 01: '/var/hobbit/data/rrd/danno/disk,dev,odm.rrd'
2007-11-30 13:14:07 RRD update param 02: '-t'
2007-11-30 13:14:07 RRD update param 03: 'pct:used'
2007-11-30 13:14:07 RRD update param 04: '1196446447:0:0'
Unfortunately, I don't know how to interpret all of this. I gather that
"param 03" means the graph is showing "percentage [disk space] used", and
that "param 01" names the specific rrd file being updated. I remember that
"-t" in "param 02" is an rrdtool flag. But I don't know what the numbers in
"param 04" mean. I assume the first number is the number of seconds since
1970 and the second is the current value, but I don't know what the last
number means. I'm also not sure how to interpret the data in the "@@status"
line.
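For reference, rrdtool's update command takes "-t" followed by a
colon-separated list of data source names, and the value string is a
timestamp followed by one value per name, in the same order. Read that way,
"pct:used" names two data sources and "1196446447:0:0" carries one reading
for each. The Python sketch below only pairs the two debug parameters up; the
function name parse_rrd_update is made up for illustration and is not part of
Hobbit or rrdtool. It assumes a numeric timestamp (rrdtool also accepts "N"
for "now") and leaves the readings as strings, since rrdtool allows "U" for
unknown.

```python
from datetime import datetime, timezone


def parse_rrd_update(template: str, values: str) -> dict:
    """Pair the names from an rrdupdate '-t' template with the update values.

    Hypothetical helper for reading hobbitd_rrd debug output; it does not
    call rrdtool and does not touch any .rrd file.
    """
    names = template.split(":")
    # The first field is the timestamp; the remaining fields are the readings.
    timestamp, *readings = values.split(":")
    if len(names) != len(readings):
        raise ValueError(
            f"template names {len(names)} data sources, got {len(readings)} values"
        )
    # Assumes a numeric epoch timestamp, as in the log excerpt above.
    when = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    return {"time": when, "values": dict(zip(names, readings))}


# "param 03" and "param 04" from the debug excerpt
parsed = parse_rrd_update("pct:used", "1196446447:0:0")
print(parsed["time"].isoformat())  # 2007-11-30T18:14:07+00:00
print(parsed["values"])            # {'pct': '0', 'used': '0'}
```

The decoded time is 18:14:07 UTC, which lines up with the 13:14:07 local
timestamps in the log if the server is five hours behind UTC.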
By the way, this excerpt is from a machine that is having the graph display
problems. In this case, the data it is receiving is normal and correct.
I'm waiting for another update when the data is incorrect.