[hobbit] strange graph behavior - random machines & graphs

Hubbard, Greg L greg.hubbard at eds.com
Fri Nov 30 20:09:18 CET 2007

You know what -- it almost looks like you are getting a timestamp where
another data value is expected.  Could it be that the client is not
sending data reliably, so the field positions are off by one?
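
One rough way to test that hunch is to scan the values in a report for
anything that falls in the range of a plausible Unix timestamp.  The sketch
below is only an illustration: the function names and the list-of-strings
input are made up here, and it is not how Hobbit itself parses client data.

```python
# Hypothetical sanity check: flag fields whose numeric value looks like a
# Unix timestamp rather than a load average or a disk usage figure.
EPOCH_2000 = 946_684_800          # 2000-01-01 00:00:00 UTC
EPOCH_INT32_MAX = 2_147_483_647   # largest value of a signed 32-bit time_t


def looks_like_epoch(value: float) -> bool:
    """True if value falls between 2000-01-01 and the signed 32-bit limit."""
    return EPOCH_2000 <= value <= EPOCH_INT32_MAX


def suspicious_fields(fields):
    """Return the indexes of fields that parse as timestamp-sized numbers."""
    hits = []
    for index, text in enumerate(fields):
        try:
            number = float(text)
        except ValueError:
            continue  # not numeric, so it cannot be a misplaced timestamp
        if looks_like_epoch(number):
            hits.append(index)
    return hits
```

If a field that should hold a load average keeps turning up in the list,
that would point toward shifted field positions rather than a bad reading.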


	From: Gary Baluha [mailto:gumby3203 at gmail.com] 
	Sent: Friday, November 30, 2007 12:31 PM
	To: hobbit at hswn.dk
	Subject: Re: [hobbit] strange graph behavior - random machines & graphs

	On Nov 30, 2007 1:15 PM, Hubbard, Greg L <greg.hubbard at eds.com> wrote:

		It sounds like you are zeroing in on the problem.  Based on
		your other post (and this) it seems that the data is getting
		logged okay in the RRD, and that data is being faithfully
		reproduced by the graphs.  The problem is that the data itself
		has unexpected values.  So whatever is providing that data to
		the RRD is either faulty, or is in turn being misled by
		something else further upstream.

	Yeah, I'm fairly confident now that it is the initial data being fed
	into the RRD file that is faulty.  I'm still not sure what the initial
	"entry point" of this bad data is, though, nor why it is happening.  I
	have a feeling that once I determine where the entry point is, that
	will lead me to the "why".

		I don't remember where you said that this data was coming
		from.  I know there can be a problem with "rollovers" when a
		signed integer is used as a counter and it grows to the point
		where the sign bit flips.  This can cause a big jump in a
		reading if the software cannot handle the switch from
		2,147,483,647 (hex 7FFFFFFF) to the next value (hex 80000000),
		which flips the sign bit for a signed 32-bit integer.  This has
		been a problem in the SNMP world for YEARS.
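
The rollover described above can be reproduced in a few lines.  This is a
minimal sketch, not code from Hobbit or any SNMP tool; the helper names
are invented for illustration, and the second helper assumes the counter
is really an unsigned 32-bit value that wrapped at most once between
readings.

```python
INT32_MAX = 0x7FFFFFFF  # 2,147,483,647, the largest signed 32-bit value


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


def counter_delta(prev: int, curr: int, bits: int = 32) -> int:
    """Increase between two unsigned counter readings, allowing one wrap."""
    return (curr - prev) % (1 << bits)


if __name__ == "__main__":
    print(as_int32(INT32_MAX))           # 2147483647
    print(as_int32(INT32_MAX + 1))       # -2147483648
    print(counter_delta(4294967290, 5))  # 11
```

Software that stores hex 80000000 in a signed field sees the reading drop
from the maximum to a large negative number, which a graph would show as
a sudden spike or dip.  Taking the difference modulo 2^32, as in
counter_delta, is one common way to ride through the wrap.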

	Hrm, that has been something vaguely on my mind.  But I haven't really
	thought of that as _the_ reason why, since I don't know why there
	would be some sort of data rollover.  We're talking about load average
	and disk space usage graphs that are showing invalid data.  I'm also
	curious why it would have started all of a sudden, on two separate
	machines.  But it does look more and more like an integer rollover, or
	something similar.
