[hobbit] strange graph behavior - random machines & graphs

Gary Baluha gumby3203 at gmail.com
Fri Nov 30 17:14:41 CET 2007


On Nov 30, 2007 10:53 AM, Hubbard, Greg L <greg.hubbard at eds.com> wrote:

>  Gary,
>
> This is pretty hard to decipher from "afar".
>
> I think I remember you saying that when you dump the data it is always
> okay?
>

Actually, it turns out this is not true.  The rrd file does indeed contain the
bad data.  I just didn't notice it before, but now that the problem appears to
be getting worse, the bad data is quite obvious.
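
For anyone wanting to check their own files the same way, here is a minimal
sketch in Python that scans the XML produced by `rrdtool dump` for values above
a threshold.  It assumes the usual dump layout (`<rra>` elements containing
`<row>` elements with one `<v>` per data source).  The function name, the
threshold, and the sample numbers are made up for illustration and are not taken
from the affected files.

```python
import math
import xml.etree.ElementTree as ET


def find_suspect_values(dump_xml, threshold):
    """Return (rra_index, row_index, ds_index, value) for every stored
    value whose magnitude exceeds threshold. NaN (unknown) rows are skipped."""
    root = ET.fromstring(dump_xml)
    suspects = []
    for rra_index, rra in enumerate(root.iter("rra")):
        for row_index, row in enumerate(rra.iter("row")):
            for ds_index, v in enumerate(row.findall("v")):
                value = float(v.text.strip())
                if math.isnan(value):
                    continue
                if abs(value) > threshold:
                    suspects.append((rra_index, row_index, ds_index, value))
    return suspects


# Illustrative stand-in for the output of `rrdtool dump some.rrd`.
SAMPLE = """<rrd>
  <rra>
    <database>
      <row><v> 1.5000000000e+00 </v></row>
      <row><v> NaN </v></row>
      <row><v> 4.2949672960e+09 </v></row>
    </database>
  </rra>
</rrd>"""

if __name__ == "__main__":
    for hit in find_suspect_values(SAMPLE, threshold=1e6):
        print(hit)
```

In practice the XML would come from running `rrdtool dump` on each .rrd file and
passing the text to the function; a sensible threshold depends on what the test
normally reports.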

> Some wild thoughts:
>
> a) could there be two different processes updating the same RRD files?
>

I don't believe so.  The strange thing is, all of the graphs that become
corrupted have the exact same large number being written into their rrd
data files.
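
One way to confirm that the same number really shows up across files is to
collect the suspicious values per file and group them.  This is only a sketch:
the file names and values below are hypothetical, and how the per-file values
are collected is left open.

```python
from collections import defaultdict


def files_sharing_value(values_by_file):
    """Map each value that occurs in more than one file to the sorted
    list of files containing it."""
    seen = defaultdict(set)
    for filename, values in values_by_file.items():
        for value in values:
            seen[value].add(filename)
    return {
        value: sorted(files)
        for value, files in seen.items()
        if len(files) > 1
    }


if __name__ == "__main__":
    # Hypothetical file names and values, for illustration only.
    example = {
        "hostA/la.rrd": [1.0, 4294967296.0],
        "hostB/tcp.http.rrd": [4294967296.0, 2.0],
    }
    print(files_sharing_value(example))
```

If a single value turns up in unrelated files, that points toward something on
the writing side rather than at the individual tests.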


> b) are all servers using the same version of rrdtool?
>

No.  One is running 1.2.23, the other 1.2.26.  Both have the problem.


> c) are the hobbitgraph files okay?  I have proven to my satisfaction that
> hobbitgraph definition errors can make the graphs act funny.
>

They haven't changed since before the graphs were having this problem.


> d) if this stuff is on a SAN, can it be moved to local storage?
>

It is on the SAN on one of the machines, and locally on the other.  I was
thinking of temporarily moving the data directory and having Hobbit regenerate
all the data from scratch.  I'm trying to avoid this, since that would mean
losing a year's worth of trend data that has proven itself very useful.
Still, if it helps me narrow down the problem, I'll consider this (and move
the data back once I get my answer).
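
A rough sketch of that set-aside-and-restore step, in Python.  It assumes Hobbit
is stopped while directories are renamed, that the set-aside copy lives next to
the original on the same filesystem, and that ownership and permissions on the
freshly created directory are fixed up separately.  The function names and
suffixes are made up for illustration.

```python
import os


def set_aside(data_dir, suffix=".aside"):
    """Rename data_dir out of the way and create an empty replacement.
    Returns the path of the set-aside copy."""
    backup = data_dir.rstrip(os.sep) + suffix
    if os.path.exists(backup):
        raise FileExistsError(backup)
    os.rename(data_dir, backup)
    os.mkdir(data_dir)
    return backup


def restore(data_dir, backup, trial_suffix=".trial"):
    """Keep the regenerated data under a new name and move the
    set-aside directory back into place. Returns the trial path."""
    trial = data_dir.rstrip(os.sep) + trial_suffix
    os.rename(data_dir, trial)
    os.rename(backup, data_dir)
    return trial
```

Renaming rather than copying keeps the year of trend data untouched, and the
regenerated files are kept under the trial name for comparison afterwards.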


> I am just "fishing."  Sometimes, when I am at my wit's end, I just change
> SOMETHING to see if it makes a difference. Even WORSE can help get me
> started.
>
> GLH
>
>  ------------------------------
> *From:* Gary Baluha [mailto:gumby3203 at gmail.com]
> *Sent:* Friday, November 30, 2007 9:25 AM
> *To:* hobbit at hswn.dk
> *Subject:* Re: [hobbit] strange graph behavior - random machines & graphs
>
> This now appears to be becoming a more serious problem.  It seems more and
> more graphs are starting to be affected, and I still have no explanation for
> what is going on here.  It also seems that almost any new graph that is
> created (such as when I delete/rename/move an existing .rrd file)
> immediately starts off corrupted. :-(
>
> On Nov 28, 2007 10:08 AM, Gary Baluha <gumby3203 at gmail.com> wrote:
>
> > I have recently noticed a strange thing happening with some of the rrd
> > graphs generated by Hobbit.  When you look at the graph, it looks as though
> > the rrd data is in one format (gauge), but the graph is generating it in a
> > different format (derive).  I can't seem to find any pattern to the hosts or
> > tests that are exhibiting this strange behavior, and it is only happening on
> > a handful of graphs.  I have attached a picture of one of these graphs,
> > since I'm not really sure how to describe it.  Note the huge numbers
> > displayed on the curr/min/avg/max line.
> >
> > Any idea what's going on here?  When I dump the RRD file manually,
> > everything looks okay.  I'm running Hobbit 4.2.0 with the 2007-02-09
> > allinone patch (I believe the latest).  This has only happened in the past
> > few weeks, though when exactly it started, I don't know.  Any ideas?
> >
>
>
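
To illustrate the gauge-versus-derive distinction mentioned in the original
report: a GAUGE data source stores each reading as-is, while a DERIVE data
source stores the rate of change between consecutive readings.  The sketch below
is a simplified Python model of that difference.  It deliberately ignores
rrdtool's step normalization and consolidation, and the sample readings are
invented.

```python
def as_gauge(samples):
    """GAUGE-like view: keep each reading unchanged."""
    return [value for _, value in samples]


def as_derive(samples):
    """DERIVE-like view: rate of change per second between
    consecutive (timestamp, value) readings."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates


if __name__ == "__main__":
    # Invented readings taken 300 seconds apart.
    readings = [(0, 100), (300, 400), (600, 400)]
    print(as_gauge(readings))   # [100, 400, 400]
    print(as_derive(readings))  # [1.0, 0.0]
```

The same readings give very different curves under the two interpretations,
which is why a mismatch between how data is stored and how it is graphed was the
first suspicion.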