[hobbit] Re: hobbit_rrd stops working after about 1 hour

Olivier Beau olivier at qalpit.com
Mon Aug 22 22:30:52 CEST 2005


Hi Naeem,

I have over 18000 rrd files being updated every 5 minutes, and haven't
seen any problems with them.
I'm running hobbit on a 2x3GHz Compaq server with Red Hat 3.0.


But I do have heavy I/O due to hobbitd_rrd, and it is becoming a problem
for me. I'm planning to add an array card with 256MB of cache in a day or
two to lower the I/O wait.
I have the feeling that hobbitd_rrd could cause performance issues for
large sites and may not be fully optimized... Henrik?
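For a sense of scale, here is a back-of-envelope sketch. It assumes each of the 18000 RRD files gets exactly one update per 5-minute cycle; that is a simplifying assumption, since the real pattern depends on how hobbitd_rrd batches its updates:

```python
def rrd_updates_per_second(num_files: int, interval_s: int) -> float:
    """Average file updates per second, assuming one update per file per interval."""
    return num_files / interval_s


# 18000 files updated once every 5 minutes (300 s)
print(rrd_updates_per_second(18000, 300))  # 60.0
```

Each of those updates lands in a different file, so on a single disk they tend to behave like small scattered writes rather than one sequential stream. A controller write cache can help absorb that kind of load.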



Concerning your problem, I posted this earlier this month:
"hobbitd just slows down dramatically, causing bbtest's results
transmission to take over 250s instead of 20s;
the rrd files aren't being updated anymore and some requests to the CGIs
report that the event is not available.
Notifications are still being sent, though, and external scripts don't
seem to be affected.

Doing a stop/start of hobbit solved the problem for now."

This happened twice for me; bbtest went yellow, I got called and
restarted hobbit.


Is everything nice and green for your Big Brother server itself
(bbtest, bbgen, hobbitd)?
Have their execution times really changed between before and after the
problem?
Do you have any interesting logs?
Do the graphs for the Big Brother server itself have "holes"? (Or those
of the first server in your bb-hosts file.)
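On Linux, a process blocked waiting for disk I/O is reported in state `D` ("uninterruptible sleep"). That is the state Naeem describes below for hobbitd_rrd. Below is a minimal Python sketch for spotting such processes. It assumes a Linux host with `/proc` mounted. `parse_stat` and `processes_in_state` are hypothetical helper names, not part of hobbit:

```python
from __future__ import annotations

import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from the contents of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]  # first field after "comm)"
    return pid, comm, state


def processes_in_state(state: str = "D", proc: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) for every process currently in the given state."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, st = parse_stat(f.read())
        except OSError:
            continue  # process exited between listdir() and read()
        if st == state:
            found.append((pid, comm))
    return found


# Example (Linux only): sample a few times and see whether hobbitd_rrd keeps appearing.
# for pid, comm in processes_in_state("D"):
#     print(pid, comm)
```

A single sample proves little, because any process can briefly be in `D` state. A process that shows up in most samples, together with high iowait, is spending much of its time blocked on I/O.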



--
Olivier Beau


On 22 Aug 05 at 21:28, Naeem.Maqsud at sybase.com wrote:

> Well, as nobody has suggested anything for my problem, I guess that I'm
> the only one having this issue. I have managed to find the root cause.
> The hobbitd_rrd process was showing as being in "uninterruptible sleep"
> state most of the time, with high iowait on the CPU it was running on.
> I suspected that the problem might be due to disk I/O while updating the
> rrds for the 2000 hosts.
> I created a tmpfs filesystem and copied the rrd directory into it. Since
> then (48 hours ago) my rrd graphs have been updating continuously. I do,
> however, need to write back to disk periodically to avoid loss of data
> after a reboot.
>
> This is OK as a temporary fix, but I would like to have a permanent
> solution. I would like to hear from other hobbit users who have more
> than 1000 hosts monitored. What type of servers and disk subsystems are
> they using? Perhaps my problem has to do with the RedHat and Dell server
> combination. Perhaps I need to stripe over multiple spindles.
>
> -Naeem
>
>
> From:    Naeem Maqsud/SYBASE
> To:      hobbit at hswn.dk
> Date:    08/18/2005 05:02 PM
> Subject: hobbit_rrd stops working after about 1 hour
>
> Hi,
>
> I'm testing out hobbit 4.1.1 for possible migration from Big Brother
> (with bbgen). I suspected scalability issues with BB, as my rrd graphs
> were updated only intermittently. However, hobbit is exhibiting similar
> problems. About 1 hour after restarting hobbit, the rrd graphs stop
> updating, except for the CPU utilization of the hobbit server itself.
>
> The hobbit server is running RedHat Linux AS 3.0. It has 2 x 2.4 GHz
> Xeon processors and 1GB of memory. About 800 servers are sending updates
> to the hobbit server. Another 1200 servers are getting remote tests.
>
> Load average has stayed below 1 most of the time. CPU usage has been
> low, with 75% idle. 4 CPUs show up due to hyperthreading, and I've
> noticed that after the restart of the hobbit server, the hobbitd_rrd
> process stays on CPU3 with 100% utilization for the one hour that it is
> busy.
>
> I hope someone can shed some light on this.
>
> Thanks,
> Naeem

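Naeem's tmpfs workaround needs the rrd files copied back to disk periodically. Below is a minimal Python sketch of that write-back step. It assumes hypothetical paths, for example a tmpfs copy under `/dev/shm/rrd` and the on-disk directory it mirrors, and `sync_tree` is a hypothetical helper. It copies only files that are missing on disk or newer in tmpfs. Each file is written to a temporary file and renamed into place, so an interrupted copy does not leave a truncated RRD on disk:

```python
import os
import shutil
import tempfile


def sync_tree(src: str, dst: str) -> int:
    """Copy files under src to dst when missing or newer there; return how many were copied."""
    copied = 0
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            try:
                if os.stat(d).st_mtime_ns >= os.stat(s).st_mtime_ns:
                    continue  # disk copy is already up to date
            except FileNotFoundError:
                pass  # not on disk yet, so copy it
            # Copy to a temp file in the same directory, then rename over the target.
            fd, tmp = tempfile.mkstemp(dir=target_dir, prefix=".sync-")
            os.close(fd)
            try:
                shutil.copy2(s, tmp)  # copy2 preserves mtime for the next comparison
                os.replace(tmp, d)
            except BaseException:
                os.unlink(tmp)
                raise
            copied += 1
    return copied


# Hypothetical usage: sync_tree("/dev/shm/rrd", "/path/to/on-disk/rrd")
```

This sketch does not coordinate with hobbitd_rrd, so a file that is being updated while it is copied could be saved mid-update. It could be run periodically, e.g. from cron, with that risk in mind.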



More information about the Xymon mailing list