Re: [hobbit] strange graph behavior - random machines & graphs
- To: hobbit (at) hswn.dk
- Subject: Re: [hobbit] strange graph behavior - random machines & graphs
- From: "Gary Baluha" <gumby3203 (at) gmail.com>
- Date: Fri, 30 Nov 2007 13:27:03 -0500
On Nov 30, 2007 12:18 PM, Ralph Mitchell <ralphmitchell (at) gmail.com> wrote:
> On Nov 30, 2007 10:55 AM, Gary Baluha <gumby3203 (at) gmail.com> wrote:
>
> > Hmm, this is getting curiouser and curiouser. Apparently at least
> > _some_ of the graphs that appear corrupted still have some valid data. If I
> > use the graph zoom feature (clicking on the magnifying glass) and select
> > certain portions of the graph, the graph data shows up as normal. It
> > appears that the problem is related to periodic data artifacts (the huge
> > numbers) that cause the scale of the graph to resize to show it within
> > bounds, and this causes the valid data to essentially disappear.
> >
> > I realized this when I looked at the graph, and saw that the (curr) and
> > (min) data points were showing normal values. It's just the (max) and (avg)
> > values that are way off, which causes the rest of the graph to be incorrect.
> >
> >
>
>
> Have you tried running hobbitd_rrd with the "--debug" option?? Add it to
> the various hobbitd_rrd entries in server/etc/hobbitlaunch.cfg. I haven't
> tried it myself, so I don't know how verbose it gets. I seem to recall
> Henrik saying it's OK to just kill hobbitd_rrd processes because they get
> respawned.
>
> I guess the debug output shows up in the rrd-status.log in your Hobbit
> logs directory. Is there anything interesting in that log already?? Or any
> other log??
There wasn't anything useful in any of the logs, besides the usual stuff. I
turned on the --debug option, and here is a sample of the data for one of
the affected machines:
2007-11-30 13:14:07 hobbitd_rrd: Got message 562165
@@status#562165|1196446447.724393|192.168.232.110||danno|disk|1196448247|yellow||yellow|1196053505|0||0||1196446447
2007-11-30 13:14:07 startpos 343968, fillpos 343968, endpos -1
2007-11-30 13:14:07 RRD update param 00: 'rrdupdate'
2007-11-30 13:14:07 RRD update param 01: '/var/hobbit/data/rrd/danno/disk,dev,odm.rrd'
2007-11-30 13:14:07 RRD update param 02: '-t'
2007-11-30 13:14:07 RRD update param 03: 'pct:used'
2007-11-30 13:14:07 RRD update param 04: '1196446447:0:0'
Unfortunately, I don't know how to interpret all of this. I gather that
"param 03" means the graph is showing "percentage [disk space] used", and
that "param 01" is the specific rrd file being updated. I also remember that
"-t" in "param 02" is an rrdtool flag. But I don't know what the numbers in
"param 04" mean. I assume the first number is the number of seconds since
1970 and the second is the current value, but I don't know what the last
number means. I'm also not sure how to interpret the data in the "@@status"
line.
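
One possible reading of params 03 and 04, based on rrdtool's documented update syntax: "-t" introduces a template, which is a colon-separated list of data-source names, and the update string is "timestamp:value:value..." with one value per name in the template. On that reading, 'pct:used' names two data sources (presumably percent full and space used), and '1196446447:0:0' supplies a timestamp followed by one value for each. The helper name below is made up for illustration, and this is only a sketch that assumes a numeric timestamp (rrdtool also accepts "N" for "now").

```python
from datetime import datetime, timezone


def decode_rrd_update(template, update):
    """Pair each data-source name in an rrdtool -t template with its value.

    Hypothetical helper, not part of Hobbit or rrdtool.
    """
    names = template.split(":")
    timestamp, *raw_values = update.split(":")
    if len(raw_values) != len(names):
        raise ValueError("template and update string disagree on field count")
    # rrdtool uses "U" for an unknown value; map that to None.
    values = {
        name: (None if raw == "U" else float(raw))
        for name, raw in zip(names, raw_values)
    }
    # Assumes a numeric epoch timestamp, as in the debug output.
    return int(timestamp), values


if __name__ == "__main__":
    ts, values = decode_rrd_update("pct:used", "1196446447:0:0")
    print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(), values)
```

For the excerpt, the timestamp 1196446447 works out to 2007-11-30 18:14:07 UTC, which matches the 13:14:07 local time in the log, and both "pct" and "used" are 0.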
By the way, this excerpt is from a machine that is having the graph display
problems. In this case, the data it is receiving is normal and correct.
I'm waiting for another update when the data is incorrect.
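
As for the "@@status" line, it is pipe-delimited, so it can at least be split into fields for easier reading. The field names in this sketch are an assumption, taken from recollection of the hobbitd channel documentation rather than from anything in this thread, and should be checked against the hobbitd man page for the installed version. The function name is likewise made up for illustration.

```python
# Assumed field order for a hobbitd "status" channel message header.
# These names are NOT confirmed by this thread; verify them against the docs.
STATUS_FIELDS = [
    "timestamp", "sender", "origin", "hostname", "testname",
    "expiretime", "color", "testflags", "prevcolor", "changetime",
    "ackexpiretime", "ackmessage", "disableexpiretime", "disablemessage",
    "clienttstamp",
]


def decode_status_header(line):
    """Split an '@@status#...' header line into a dict of named fields."""
    marker, *fields = line.strip().split("|")
    prefix = "@@status#"
    if not marker.startswith(prefix):
        raise ValueError("not a status channel header")
    # Keep only the digits before any "/" suffix after the sequence number.
    record = {"seq": int(marker[len(prefix):].split("/")[0])}
    # zip() stops at the shorter list, so extra or missing fields are tolerated.
    record.update(zip(STATUS_FIELDS, fields))
    return record


if __name__ == "__main__":
    sample = ("@@status#562165|1196446447.724393|192.168.232.110||danno|disk|"
              "1196448247|yellow||yellow|1196053505|0||0||1196446447")
    for name, value in decode_status_header(sample).items():
        print(f"{name:18} {value!r}")
```

If the assumed names are right, the sample describes a yellow "disk" status for host danno, received from 192.168.232.110, with an expiry time 1800 seconds after the message timestamp.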