[hobbit] strange graph behavior - random machines & graphs

Fri Nov 30 19:51:01 CET 2007

On Nov 30, 2007 1:45 PM, Ralph Mitchell <ralphmitchell at gmail.com> wrote:

> On Nov 30, 2007 12:27 PM, Gary Baluha <gumby3203 at gmail.com> wrote:
>
> >
> > There wasn't anything useful in any of the logs, besides the usual
> > stuff.  I turned on the --debug option, and here is a sample of the data for
> > one of the affected machines:
> >
> > 2007-11-30 13:14:07 hobbitd_rrd: Got message 562165
> > @@status#562165|1196446447.724393|192.168.232.110||danno|disk|1196448247|yellow||yellow|1196053505|0||0||1196446447
> > 2007-11-30 13:14:07 startpos 343968, fillpos 343968, endpos -1
> > 2007-11-30 13:14:07 RRD update param 00: 'rrdupdate'
> > 2007-11-30 13:14:07 RRD update param 01: '/var/hobbit/data/rrd/danno/disk,dev,odm.rrd'
> > 2007-11-30 13:14:07 RRD update param 02: '-t'
> > 2007-11-30 13:14:07 RRD update param 03: 'pct:used'
> > 2007-11-30 13:14:07 RRD update param 04: '1196446447:0:0'
> >
> > I'm afraid I don't know how to interpret all of this.  I get that
> > "param 03" means the graph is showing "percentage [disk space] used", and
> > that "param 01" means it is updating that specific rrd file.  And I
> > remember that "-t" in "param 02" is some rrdtool flag.  But I don't know
> > what the numbers in "param 04" mean.  I assume the first number is the
> > number of seconds since 1970 and the second number is the current value,
> > but I don't know what the last number means.  I'm also not sure how to
> > interpret all of the data in the "@@status" line.
> >
> > By the way, this excerpt is from a machine that is having the graph
> > display problems, but in this case the data it received was normal and
> > correct.  I'm waiting to catch an update where the data is wrong.
> >
>
> The "-t" option specifies the template, which is in param 03: "pct:used".
> The template lists the names and order of the data sources being updated.
> Param 04 is the actual data to insert, starting with the date/time in
> seconds since 1970 (1196446447), then zero for the "pct" value, then zero
> for the "used" value.
>
> This may not help much, but it might show where the data goes weird,
> i.e. whether hobbitd_rrd is being handed bad data or the data gets
> corrupted later on.
>
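To make the breakdown of params 03 and 04 concrete, here is a small Python sketch that pairs each name in an rrdtool "-t" template with the matching field of the update string. The function name parse_rrd_update is hypothetical (it is not part of Hobbit or rrdtool), and the sketch assumes a numeric timestamp rather than rrdtool's "N" shorthand for "now".

```python
from datetime import datetime, timezone


def parse_rrd_update(template: str, values: str) -> dict:
    """Pair an rrdtool "-t" template with its update string (sketch).

    template: the argument that follows -t, e.g. "pct:used"
    values:   the update argument, e.g. "1196446447:0:0"; the first field is
              the timestamp, then one value per data source in template order.
    Assumes a numeric timestamp (not rrdtool's "N" for "now").
    """
    names = template.split(":")
    timestamp, *data = values.split(":")
    if len(data) != len(names):
        raise ValueError(
            f"template names {len(names)} data sources but got {len(data)} values"
        )
    when = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    # Values are kept as strings; rrdtool also accepts "U" for unknown.
    return {"time": when, **dict(zip(names, data))}


# Params 03 and 04 from the hobbitd_rrd debug output quoted above.
update = parse_rrd_update("pct:used", "1196446447:0:0")
print(update["time"].isoformat())  # 2007-11-30T18:14:07+00:00
print(update["pct"], update["used"])  # 0 0
```

The timestamp decodes to 2007-11-30 18:14:07 UTC, which lines up with the 13:14:07 log timestamp if that machine's clock is set to UTC-5, and both "pct" and "used" are zero.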

That's what I'm hoping.  One other thing I noticed: on hosts that have some
bad graphs but also some good ones, the good graphs show a gap in the data
at exactly the moment the bad graphs show another data spike.
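The "@@status" line in the debug output quoted above is a set of pipe-separated fields. The Python sketch below splits it into named fields. The field names and their order are an assumption, based on the layout of status messages described in the hobbitd_channel(8) man page, so verify them against the documentation for your Hobbit version; the names STATUS_FIELDS and parse_status_header are hypothetical.

```python
# Hypothetical helper. The field order below is an ASSUMPTION based on the
# hobbitd_channel(8) description of status messages; check your version.
STATUS_FIELDS = [
    "timestamp", "sender", "origin", "hostname", "testname",
    "expiretime", "color", "testflags", "prevcolor", "changetime",
    "ackexpiretime", "ackmessage", "disableexpiretime", "disablemessage",
    "clienttstamp",
]


def parse_status_header(line: str) -> dict:
    """Split a "@@status#<seq>|..." channel header into named fields."""
    head, *rest = line.strip().split("|")
    msgtype, _, seq = head.lstrip("@").partition("#")
    fields = {"type": msgtype, "seq": int(seq)}
    # zip() stops at the shorter list: trailing fields beyond STATUS_FIELDS
    # are ignored, and missing trailing fields are simply absent.
    fields.update(zip(STATUS_FIELDS, rest))
    return fields


sample = ("@@status#562165|1196446447.724393|192.168.232.110||danno|disk|"
          "1196448247|yellow||yellow|1196053505|0||0||1196446447")
s = parse_status_header(sample)
print(s["hostname"], s["testname"], s["color"], s["prevcolor"])
# danno disk yellow yellow
print(int(s["expiretime"]) - int(float(s["timestamp"])))  # 1800
```

Under that assumed layout, the sample is host "danno", test "disk", currently yellow (previously yellow), and expiretime minus timestamp comes out at 1800 seconds, which would fit a status that stays valid for 30 minutes.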