
Resolved: [hobbit] Client reports (only) giving intermittent RRD graphing after 4.2.0 -> 4.3.0-beta2 upgrade



Just to follow up on this. After scouring the archives for a bit longer,
I found this message: http://www.hswn.dk/hobbiton/2008/09/msg00313.html
which suggested that RRD caching specifically might be failing somehow.
Perhaps there are separate queues for each type of test, and
it was our high-volume client reports rather than the custom tests that
were causing problems? Regardless, the hidden option given at
http://www.hswn.dk/hobbiton/2009/02/msg00188.html seems to have fixed the
problem: adding --no-cache to the hobbitd_rrd stanzas in hobbitlaunch.cfg.
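
For anyone finding this in the archives later, the change looks roughly
like the fragment below. This is only a sketch based on the stock
hobbitlaunch.cfg layout: the ENVFILE path, log file names and rrd
directory are assumptions and will differ per install. The only real
change is the extra option (spelled with two dashes, --no-cache) on each
hobbitd_rrd command line.

```
[rrdstatus]
        # ENVFILE path below is an assumption; keep whatever your install uses
        ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
        NEEDS hobbitd
        CMD hobbitd_channel --channel=status --log=$BBSERVERLOGS/rrd-status.log hobbitd_rrd --rrddir=$BBVAR/rrd --no-cache

[rrddata]
        ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
        NEEDS hobbitd
        CMD hobbitd_channel --channel=data --log=$BBSERVERLOGS/rrd-data.log hobbitd_rrd --rrddir=$BBVAR/rrd --no-cache
```

With caching disabled, hobbitd_rrd writes each update to disk immediately
instead of batching them, so expect more disk I/O on a busy server.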

 

Regards,

J.C.

 

 

________________________________

From: Cleaver, Japheth [mailto:jcleaver (at) soe.sony.com] 
Sent: Monday, July 06, 2009 1:09 PM
To: hobbit (at) hswn.dk
Subject: [hobbit] Client reports (only) giving intermittent RRD graphing
after 4.2.0 -> 4.3.0-beta2 upgrade

 

Hello all,

 

I've got a large testing install of Xymon going on with a couple hundred
hosts reporting back client status via a bbmessage.cgi->bbproxy->hobbitd
proxy over port 80.

 

Things were working great with a stable 4.2.0 running on the hobbit
server and Xymon-4.3.0.b2 on the proxy server. However, when I upgraded
the hobbitd server to 4.3.0.b2, I lost reliable RRD graphing. Client
reports are still coming in and being updated properly from a status
perspective, but it seems the RRD graphers are only getting usable data
intermittently. See attached image for an example.

 

I've got a few manual custom graphs being generated with the NCV
directive, and that stuff is being graphed just fine - it's only the
built-in client reporting that suddenly stopped.

 

I tried clearing out all the RRD files after the upgrade, thinking some
sort of compatibility issue had occurred, but I'm still getting the same
problem. USR2 logging on the hobbitd_rrd processes shows nothing obvious,
other than the ubiquitous "illegal attempt to update using time ... when
last update time is ... (minimum one second step)" errors. There seem to
be lots of client reports being logged by RRD, but I'm not certain I'm
reading it properly. "rrdtool dump" shows lots of NaN entries in the
RRD files. Thinking there was a change in memory usage that was
truncating client reports combo-ized by bbproxy, I increased all the
MAXMSG_* variables to 2048, but that didn't seem to have any effect.
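
To put a number on the NaN entries rather than eyeballing the dump, a
small script along these lines may help. It is only a sketch: it assumes
the XML layout that "rrdtool dump" prints (<row> elements containing <v>
values), and the function name and the SAMPLE data are made up for
illustration.

```python
import sys
import xml.etree.ElementTree as ET


def nan_summary(dump_xml):
    """Return (total_rows, rows_with_nan) for XML text from `rrdtool dump`."""
    root = ET.fromstring(dump_xml)
    total = 0
    with_nan = 0
    for row in root.iter("row"):
        total += 1
        # Values may be padded with spaces depending on the rrdtool version.
        values = [(v.text or "").strip().lower() for v in row.iter("v")]
        if any(value == "nan" for value in values):
            with_nan += 1
    return total, with_nan


# Hypothetical, heavily trimmed sample; a real dump has many more elements.
SAMPLE = """<rrd><rra><database>
<row><v>NaN</v></row>
<row><v>1.5000000000e+00</v></row>
<row><v> NaN </v></row>
</database></rra></rrd>"""


if __name__ == "__main__":
    # Reads the dump from standard input.
    total, with_nan = nan_summary(sys.stdin.read())
    print("%d of %d rows contain NaN" % (with_nan, total))
```

Saved under a name of your choice (nan_summary.py here), it can be fed
with something like "rrdtool dump somefile.rrd | python nan_summary.py".
Comparing the count for a client-report RRD against one of the NCV graphs
should show whether only the client data has gaps.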

 

If anyone has run into this before or has a suggestion on where to
start, I'd appreciate it. 

 

 

Regards,

Japheth Cleaver