[hobbit] strange graph behavior - random machines & graphs

Gary Baluha gumby3203 at gmail.com
Fri Nov 30 19:27:03 CET 2007


On Nov 30, 2007 12:18 PM, Ralph Mitchell <ralphmitchell at gmail.com> wrote:

> On Nov 30, 2007 10:55 AM, Gary Baluha <gumby3203 at gmail.com> wrote:
>
> > Hmm, this is getting curiouser and curiouser.  Apparently at least
> > _some_ of the graphs that appear corrupted still have some valid data.  If I
> > use the graph zoom feature (clicking on the magnifying glass) and select
> > certain portions of the graph, the graph data shows up as normal.  It
> > appears that the problem is related to periodic data artifacts (the huge
> > numbers) that cause the scale of the graph to resize to show it within
> > bounds, and this causes the valid data to essentially disappear.
> >
> > I realized this when I looked at the graph, and saw that the (curr) and
> > (min) data points were showing normal values.  It's just the (max) and (avg)
> > values that are way off, which causes the rest of the graph to be incorrect.
> >
> >
>
>
> Have you tried running hobbitd_rrd with the "--debug" option??  Add it to
> the various hobbitd_rrd entries in server/etc/hobbitlaunch.cfg.  I haven't
> tried it myself, so I don't know how verbose it gets.  I seem to recall
> Henrik saying it's OK to just kill hobbitd_rrd processes because they get
> respawned.
>
> I guess the debug output shows up in the rrd-status.log in your Hobbit
> logs directory.  Is there anything interesting in that log already??  Or any
> other log??
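The effect described in the quoted message can be illustrated with a simplified model. This is a sketch only: the sample values are made up, and it treats the graph as a plain linear 0-to-max scale instead of reproducing rrdtool's actual autoscaling or consolidation. A single artifact value pushes (max) and (avg) to enormous numbers while (cur) and (min) stay normal, and the real data is squashed against the bottom of the plot.

```python
# Hypothetical samples for a disk "pct" data source: steady around 40%,
# plus one corrupted reading like the "huge numbers" described above.
samples = [41.0, 42.0, 40.0, 1.0e9, 41.0, 43.0, 42.0]


def legend_values(values):
    """Summaries comparable to the (cur), (min), (max) and (avg) legend."""
    return {
        "cur": values[-1],
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
    }


def plotted_height(value, values, pixels=100):
    """Height in pixels of `value` when the y-axis simply runs from 0 to max.

    Simplification: rrdtool's real autoscaling also rounds the axis
    limits, but one huge maximum flattens the normal data in a similar way.
    """
    return value / max(values) * pixels


stats = legend_values(samples)
print(stats)
# A normal reading of 42% ends up far below one pixel tall.
print(plotted_height(42.0, samples))
```

Zooming in on a time range that excludes the spike, as with the magnifying-glass feature, is roughly like computing the maximum over a slice without it: `plotted_height(42.0, samples[4:])` comes out close to the full 100 pixels, which is why the valid data reappears.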


There wasn't anything useful in any of the logs, besides the usual stuff.  I
turned on the --debug option, and here is a sample of the data for one of
the affected machines:

2007-11-30 13:14:07 hobbitd_rrd: Got message 562165
@@status#562165|1196446447.724393|192.168.232.110||danno|disk|1196448247|yellow||yellow|1196053505|0||0||1196446447
2007-11-30 13:14:07 startpos 343968, fillpos 343968, endpos -1
2007-11-30 13:14:07 RRD update param 00: 'rrdupdate'
2007-11-30 13:14:07 RRD update param 01: '/var/hobbit/data/rrd/danno/disk,dev,odm.rrd'
2007-11-30 13:14:07 RRD update param 02: '-t'
2007-11-30 13:14:07 RRD update param 03: 'pct:used'
2007-11-30 13:14:07 RRD update param 04: '1196446447:0:0'
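For reference, here is a sketch that splits the "@@status" header line into named fields. The field names and their order are an assumption, based on the channel message layout described in the hobbitd_channel documentation; check the man page for your Hobbit version before relying on them. The helper names parse_status_header and as_utc are hypothetical. With that layout, the expire time 1196448247 is exactly 1800 seconds (30 minutes) after the 1196446447 timestamp, which is at least consistent with the guess.

```python
from datetime import datetime, timezone

# ASSUMED field order for the "@@status" header; verify against the
# hobbitd_channel man page shipped with your version.
STATUS_FIELDS = [
    "timestamp", "sender", "origin", "hostname", "testname",
    "expiretime", "color", "testflags", "prevcolor", "changetime",
    "ackexpiretime", "ackmessage", "disableexpiretime",
    "disablemessage", "clienttstamp",
]


def parse_status_header(line):
    """Split an '@@status#seqno|...' header into a dict of named fields."""
    head, *rest = line.rstrip("\n").split("|")
    kind, seqno = head.lstrip("@").split("#", 1)
    fields = dict(zip(STATUS_FIELDS, rest))
    fields["kind"] = kind
    fields["seqno"] = int(seqno)
    return fields


def as_utc(epoch):
    """Render a Unix timestamp (seconds since 1970-01-01 UTC) readably."""
    return datetime.fromtimestamp(float(epoch), tz=timezone.utc).isoformat()


header = ("@@status#562165|1196446447.724393|192.168.232.110||danno|disk|"
          "1196448247|yellow||yellow|1196053505|0||0||1196446447")
msg = parse_status_header(header)
print(msg["hostname"], msg["testname"], msg["color"])
print("received:", as_utc(msg["timestamp"]))
print("valid for:", int(msg["expiretime"]) - int(float(msg["timestamp"])), "seconds")
```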

I'm afraid I don't know how to interpret all of this.  I gather that "param
03" means the graph shows "percentage [of disk space] used", and that "param
01" names the specific RRD file being updated.  I also remember that the "-t"
in "param 02" is an rrdtool flag.  But I don't know what the numbers in "param
04" mean.  I assume the first number is the number of seconds since 1970 and
the second number is the current value, but I don't know what the last number
is.  I'm also not sure how to interpret the fields in the "@@status" line.
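If I read the rrdtool update documentation correctly, "-t" is the --template option, and "pct:used" names two data sources in the RRD file, "pct" and "used", rather than a single label. The values after the timestamp in "param 04" are then assigned to those data sources in template order, so '1196446447:0:0' would mean pct=0 and used=0 at that time. A minimal sketch of that decoding follows; the function name is hypothetical, and it ignores rrdtool's "N" shorthand for the current time.

```python
from datetime import datetime, timezone


def decode_rrdupdate(args):
    """Pair 'rrdupdate FILE -t ds1:ds2 TS:v1:v2' arguments into named values.

    With -t/--template, the values after the timestamp go to the listed
    data sources in order. "U" is rrdtool's marker for an unknown value.
    This sketch does not handle the "N" (now) timestamp shorthand.
    """
    rrdfile, template, updates = args[1], None, []
    rest = iter(args[2:])
    for arg in rest:
        if arg in ("-t", "--template"):
            template = next(rest).split(":")  # consume the template argument
        else:
            updates.append(arg)

    decoded = []
    for update in updates:
        ts, *raw_values = update.split(":")
        # Without a template, rrdtool uses the file's DS order; "dsN" is
        # only a placeholder name for this sketch.
        names = template or [f"ds{n}" for n in range(len(raw_values))]
        values = [None if v == "U" else float(v) for v in raw_values]
        when = datetime.fromtimestamp(int(ts), tz=timezone.utc)
        decoded.append((when, dict(zip(names, values))))
    return rrdfile, decoded


args = ["rrdupdate", "/var/hobbit/data/rrd/danno/disk,dev,odm.rrd",
        "-t", "pct:used", "1196446447:0:0"]
rrdfile, rows = decode_rrdupdate(args)
for when, values in rows:
    print(rrdfile, when.isoformat(), values)
```

The timestamp 1196446447 works out to 18:14:07 UTC on 2007-11-30, which matches the 13:14:07 local time in the log if the server runs on US Eastern time.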

By the way, this excerpt is from a machine whose graphs show the display
problem.  For this particular update, though, the data it received is normal
and correct.  I'm waiting to capture an update where the data is wrong.
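While waiting for a bad update, a small filter over rrd-status.log could flag it automatically. This is a minimal sketch, assuming each debug entry sits on a single line in the log file itself and that the param lines look like the ones in the excerpt above. The function names and the 100% limit for "pct" are my own choices, not anything Hobbit provides.

```python
import re

# Matches debug lines such as:
#   2007-11-30 13:14:07 RRD update param 04: '1196446447:0:0'
PARAM_RE = re.compile(r"RRD update param (\d+): '([^']*)'")
# An update argument: a numeric timestamp followed by ":value" fields.
UPDATE_RE = re.compile(r"^\d+(?::[^:]*)+$")


def exceeds(raw, limit):
    """True if a raw rrdtool value is above `limit` or not a number at all."""
    if raw == "U":  # rrdtool's marker for an unknown value
        return False
    try:
        return float(raw) > limit
    except ValueError:
        return True  # garbage instead of a number is also worth a look


def suspicious_updates(lines, limits):
    """Yield (rrdfile, {ds: raw_value}) for updates that break a limit.

    `limits` maps data-source names to upper bounds, e.g. {"pct": 100.0};
    data sources not listed are not checked.
    """
    rrdfile, template, want_template = None, None, False
    for line in lines:
        m = PARAM_RE.search(line)
        if not m:
            continue
        index, value = int(m.group(1)), m.group(2)
        if index == 0:  # 'rrdupdate' starts a new argument list
            rrdfile, template, want_template = None, None, False
        elif index == 1:
            rrdfile = value
        elif value in ("-t", "--template"):
            want_template = True
        elif want_template:
            template, want_template = value.split(":"), False
        elif template and UPDATE_RE.match(value):
            named = dict(zip(template, value.split(":")[1:]))
            bad = {ds: v for ds, v in named.items()
                   if ds in limits and exceeds(v, limits[ds])}
            if bad:
                yield rrdfile, bad


def scan_file(path, limits):
    """Print suspicious updates found in a log file such as rrd-status.log."""
    with open(path) as log:
        for rrdfile, bad in suspicious_updates(log, limits):
            print(rrdfile, bad)
```

For example, call `scan_file` with the path to rrd-status.log in your Hobbit logs directory and `{"pct": 100.0}`. Limits are per data source because a value such as "used" may legitimately be much larger than 100.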


More information about the Xymon mailing list