[hobbit] strange graph behavior - random machines & graphs

Hubbard, Greg L greg.hubbard at eds.com
Fri Nov 30 19:15:51 CET 2007


It sounds like you are zeroing in on the problem.  Based on your other
post (and this) it seems that the data is getting logged okay in the
RRD, and that data is being faithfully reproduced by the graphs.  The
problem is that the data itself has unexpected values.  So whatever is
providing that data to the RRD is either faulty, or is in turn being
misled by something else further upstream.
 
I don't remember where you said that this data was coming from.  I know
there can be a problem with "rollovers" when a signed integer is used as
a counter and it grows to the point where the sign bit flips.  This can
cause a big jump in a reading if the software cannot handle the switch
from 2,147,483,647 (hex 7FFFFFFF) to the next value (hex 80000000) which
flips the sign bit for a signed 32 bit integer.  This has been a problem
in the SNMP world for YEARS.
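
To make the rollover concrete, here is a minimal Python sketch of the
arithmetic.  It is only an illustration: the helper names (to_int32,
naive_delta, wrap_aware_delta) are made up for this example and are not
part of Hobbit, rrdtool, or any SNMP library, and it assumes the
collector keeps its counter in a signed 32 bit integer.

```python
def to_int32(n):
    """Interpret the low 32 bits of n as a signed two's-complement integer."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n & 0x80000000 else n


def naive_delta(prev, curr):
    """Difference between two readings with no rollover handling."""
    return curr - prev


def wrap_aware_delta(prev, curr, bits=32):
    """Difference taken modulo 2**bits, so a single wrap is absorbed."""
    return (curr - prev) % (1 << bits)


# Hypothetical readings: the counter sits at the largest positive value,
# then advances by one and the sign bit flips.
prev = to_int32(0x7FFFFFFF)        # 2147483647
curr = to_int32(0x7FFFFFFF + 1)    # -2147483648 (hex 80000000)

naive = naive_delta(prev, curr)        # -4294967295, a huge bogus jump
fixed = wrap_aware_delta(prev, curr)   # 1, the real increment

print(prev, curr, naive, fixed)
```

Software that takes the naive difference sees a jump of about four
billion in one interval, which is the kind of value that would swamp
the curr/min/avg/max line on a graph.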
 
There, I knew some of the computer science 101 stuff I learned in the
70's might be useful some day...
 
GLH


________________________________

	From: Gary Baluha [mailto:gumby3203 at gmail.com] 
	Sent: Friday, November 30, 2007 10:15 AM
	To: hobbit at hswn.dk
	Subject: Re: [hobbit] strange graph behavior - random machines &
graphs
	
	
	On Nov 30, 2007 10:53 AM, Hubbard, Greg L <greg.hubbard at eds.com>
wrote:
	

		Gary,
		 
		This is pretty hard to decipher from "afar".
		 
		I think I remember you saying that when you dump the
data it is always okay?


	Actually, it turns out this is not true.  The rrd file does
indeed have the bad data.  I just didn't notice it before, but now that
it appears to be getting worse, the bad data is quite obvious.

	
	

		
		Some wild thoughts:
		 
		a) could there be two different processes updating the
same RRD files?


	I don't believe so.  The strange thing is, all of the graphs
that become corrupted have the exact same large number that is being
input into the rrd data files.
	 

		b) are all servers using the same version of rrdtool? 


	No.  One is running 1.2.23, the other 1.2.26.  Both have the
problem.
	 

		c) are the hobbitgraph files okay?  I have proven to my
satisfaction that hobbitgraph definition errors can make the graphs act
funny.


	They haven't changed since before the graphs were having this
problem.
	 

		d) if this stuff is on a SAN, can it be moved to local
storage?


	It is on the SAN on one of the machines, and locally on the
other.  I was thinking of temporarily moving the data directory and having
Hobbit regenerate all the data from scratch.  I'm trying to avoid this,
since that would mean losing a year's worth of trend data that has
proven itself very useful.  Still, if it helps me narrow down the
problem, I'll consider this (and move the data back once I get my
answer). 
	 

		I am just "fishing."  Sometimes, when I am at my wit's
end, I just change SOMETHING to see if it makes a difference. Even WORSE
can help get me started.
		 
		GLH


________________________________

			
			From: Gary Baluha [mailto:gumby3203 at gmail.com] 
			
			Sent: Friday, November 30, 2007 9:25 AM 

			To: hobbit at hswn.dk
			Subject: Re: [hobbit] strange graph behavior -
random machines & graphs
			

			This now appears to be becoming a more serious
problem.  It seems more and more graphs are starting to be affected, and
I still have no explanation for what is going on here.  It also seems
that almost any new graph that is created (such as if I
delete/rename/move an existing .rrd file) immediately starts off
corrupted. :-( 
			
			
			On Nov 28, 2007 10:08 AM, Gary Baluha
<gumby3203 at gmail.com> wrote:
			

				I have recently noticed a strange thing
happening with some of the rrd graphs generated by Hobbit.  When you
look at the graph, it looks as though the rrd data is in one format
(gauge), but the graph is generating it in a different format (derive).
I can't seem to find any pattern to the hosts or tests that are
exhibiting this strange behavior, and it is only happening on a handful
of graphs.  I have attached a picture of one of these graphs, since I'm
not really sure how to describe it.  Note the huge numbers displayed on
the curr/min/avg/max line. 
				 
				Any idea what's going on here?  When I
dump the RRD file manually, everything looks okay.  I'm running Hobbit
4.2.0 with the 2007-02-09 allinone patch (I believe the latest).  This
has only happened in the past few weeks, though when exactly it started,
I don't know.  Any ideas? 


