[hobbit] Graphs stop update 24 hours after client reboot; start again 24 hours later - quick fix

Brand, Thomas R. TRBrand at cvs.com
Thu Oct 1 17:39:54 CEST 2009



> -----Original Message-----
> From: Patrik Nilsson [mailto:patrik at jalbum.net]
> Sent: Wednesday, September 30, 2009 9:51 AM
> To: hobbit at hswn.dk
> Subject: Re: [hobbit] Graphs stop update 24 hours after client reboot; start again 24 hours later.
> 
> Returning to this old thread as I ran into this issue today.
> 
> Wed, 28 Jan 2009 12:23:17 +0000 (UTC), Henrik wrote:
> >"Brand, Thomas R." <TRBrand (at) cvs.com> writes:
> >>I need some help/suggestions to figure out why my "cpu load" and "users
> >>& processes" graphs stop updating about 24 hours after the systems
> >>reboot. The updates stop for anywhere from 12 to 24 hours, then simply
> >>start back up again.
> >>Only the "CPU load" and the "Users and Processes" graphs are having the
> >>problem; disk, memory, cpu utilization, network traffic don't miss a
> >>beat.
> 
> >The only explanation I can come up with is that the format of
> >some of the "cpu" status message is different for the first 24 hours
> >after a reboot.
> >Could you send me an example of the cpu status shortly after a reboot,
> >and one when the graphs are working?
> >What OS are these boxes?
> 
> Running openSUSE 11.1 (x86_64).
> 
> Client output that does not update the rrd:
> 
> [top]
> top - 14:39:44 up 1 day,  4:40,  3 users,  load average: 2.42, 2.88, 2.89
> 
> Client output that does update the rrd:
> 
> [top]
> top - 14:42:51 up 40 days,  2:41,  3 users,  load average: 4.19, 3.61, 3.10
> 
> The only difference I can see is "day" instead of "days".
> 
> Regards,
> 
> Patrik
> 

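Based on Patrik's observation, I tried a few more things and found that
'top' does not appear to be the problem. On SuSE Linux 10.x, however,
'uptime' also prints 'day' instead of 'days' (the singular "1 day" appears
only while the system has been up between one and two days, which would
explain why the graphs stop roughly 24 hours after a reboot and recover
later), and it is this value that stops the la.rrd graphs from updating.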
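To illustrate the failure mode, here is a minimal sketch that assumes, hypothetically, a parser which only recognizes the plural form "<N> days,". The function name strict_days and the sample lines (modeled on the top output Patrik posted) are illustrative only; this is not the actual Hobbit parsing code.

```sh
#!/bin/sh
# Hypothetical parser that insists on the plural "days,".
# It prints the day count only when the line contains "<N> days,".
strict_days() {
    printf '%s\n' "$1" | sed -n 's/.* up \([0-9][0-9]*\) days,.*/\1/p'
}

one_day=' 14:39:44 up 1 day,  4:40,  3 users,  load average: 2.42, 2.88, 2.89'
many_days=' 14:42:51 up 40 days,  2:41,  3 users,  load average: 4.19, 3.61, 3.10'

strict_days "$one_day"    # prints nothing: "1 day," never matches "days,"
strict_days "$many_days"  # prints 40
```

Any parser built this way silently skips the whole second day after a reboot, then starts matching again once the uptime reaches "2 days,".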

As a quick fix, I have modified hobbitclient-linux.sh on my SuSE Linux
10.x systems as follows:

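echo "[uptime]"
# uptime prints "1 day," (singular) during the second day after boot;
# rewrite it to the plural form that the server-side parser expects.
uptime | perl -pe 's/\b1 day\b/1 days/'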

The graphs updated on the next polling interval and started displaying
the missing information.
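The same rewrite can be done with sed alone, which may be handy on hosts without perl, and checked against a sample line before deploying it. A minimal sketch; the function name normalize_uptime is hypothetical and not part of hobbitclient-linux.sh.

```sh
#!/bin/sh
# Sed-only sketch of the client-side workaround.
# uptime prints "1 day," (singular) only while the host has been up
# between one and two days; rewrite it to "1 days," and pass every
# other line through unchanged.
normalize_uptime() {
    sed 's/ up 1 day,/ up 1 days,/'
}

# Usage on a sample line; in the client script this would become
# "uptime | normalize_uptime".
printf '%s\n' ' 14:39:44 up 1 day,  4:40,  3 users,  load average: 2.42, 2.88, 2.89' |
    normalize_uptime
# prints " 14:39:44 up 1 days,  4:40,  3 users,  load average: 2.42, 2.88, 2.89"
```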

Thanks for pointing the way, Patrik :)

Now if someone can figure out where and what needs to be updated in the
source code -- that's a bit beyond my skills...
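For whoever picks this up, a source-level fix would need the server to accept both "1 day," and "N days,". The sketch below shows such a tolerant parse in shell/awk, converting the uptime to minutes; the function name uptime_minutes is hypothetical, and the real fix belongs in the Hobbit server's own parsing code, which this does not attempt to reproduce.

```sh
#!/bin/sh
# Hypothetical tolerant uptime parser: accepts "1 day," and "40 days,",
# followed by either "H:MM" or "M min", and prints the uptime in
# minutes (or -1 if the remainder is not recognized).
uptime_minutes() {
    printf '%s\n' "$1" | awk '{
        line = $0
        sub(/.* up /, "", line)               # drop clock time and "up"
        sub(/, *[0-9]+ users?,.*/, "", line)  # drop user count and load averages
        sub(/^ +/, "", line)
        days = 0
        if (match(line, /^[0-9]+ days?,/)) {  # singular or plural
            days = substr(line, 1, RLENGTH) + 0
            line = substr(line, RLENGTH + 1)
            sub(/^ +/, "", line)
        }
        if (line ~ /^[0-9]+:[0-9]+$/) {       # "H:MM" form
            split(line, hm, ":")
            print days * 1440 + hm[1] * 60 + hm[2]
        } else if (line ~ /^[0-9]+ min$/) {   # "M min" form
            print days * 1440 + (line + 0)
        } else {
            print -1
        }
    }'
}

uptime_minutes ' 14:39:44 up 1 day,  4:40,  3 users,  load average: 2.42, 2.88, 2.89'
# prints 1720
```

The key design point is the optional "s" in `days?`: the singular and plural forms are treated identically, so no client-side rewrite would be needed.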

Cheers
Tom Brand





More information about the Xymon mailing list