[hobbit] Monitoring disk space problems (was: RE: [hobbit] Highlights of the 4.3.0 version)

Buchan Milne bgmilne at staff.telkomsa.net
Wed Aug 8 18:28:30 CEST 2007


On Monday 06 August 2007 21:25:46 Haertig, David F (Dave) wrote:
> I try to identify filesystem "space hogs" via custom scripts I wrote a
> long time ago when using BB.  99% of my custom stuff is done in PERL.
>
> I use 'du -k' to get the size of all directories in the filesystem.  I
> then cut those results down to only the first and second level
> directories (but you could go as deep as you want).  I store the size of
> each subdirectory in a small "database".  I did this ages ago and my
> code uses PERL's "Storable" module to store the accumulated data into a
> file (called my "database").  These days I'd just use Hobbit's easily
> accessed RRD files.  I then use PERL's
> Statistics::Descriptive::least_squares_fit() to calculate the slope and
> linear correlation coefficient of the "best fit line".

This would be really useful for directories monitored with the dir option
in client-local.cfg plus the DIR option in hobbit-clients.cfg, e.g. to be
able to trigger alerts at a specified "time before disk is full".
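
To make the "time before disk is full" idea concrete, here is a minimal Python sketch (not the Perl code described above). It fits a least-squares line to (day, used KB) samples, reports the slope and linear correlation coefficient, and extrapolates the line to the filesystem capacity. The function names, the sample readings and the capacity figure are all made up for illustration.

```python
import math

def least_squares_fit(xs, ys):
    """Return (slope, intercept, r) of the best-fit line through the points."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r is undefined for a perfectly flat series; report 0.0 in that case.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r

def days_until_full(days, used_kb, capacity_kb):
    """Extrapolate the fitted line to capacity; None if usage is not growing."""
    slope, intercept, _ = least_squares_fit(days, used_kb)
    if slope <= 0:
        return None
    full_day = (capacity_kb - intercept) / slope
    return max(0.0, full_day - days[-1])

# Hypothetical samples: one reading per day, growing 500 KB/day.
days = [0, 1, 2, 3, 4]
used_kb = [10000, 10500, 11000, 11500, 12000]
slope, intercept, r = least_squares_fit(days, used_kb)
remaining = days_until_full(days, used_kb, capacity_kb=20000)
```

A test could then go yellow or red when `remaining` drops below some number of days, and use `r` to decide how much to trust the prediction (a value near 1 means the growth is close to linear).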

> This allows me 
> to see how fast each subdirectory is growing/shrinking, and how linear
> that growth/reduction is.  I trigger yellow/red conditions based on rate
> of growth and predicted fill time at current growth rate, in addition to
> the standard "95% full = red" test.
>
> The above makes it fairly easy to identify which subdirectory is your
> problem, which is often good enough to identify the file/process
> that is killing you.  When that's not, I have a separate test that tries
> to identify problem files a different way.  BB/Hobbit uses 'top' to
> identify cpu-hogging processes.  Many times, files hogging space
> are directly tied to processes hogging cpu (runaway process = runaway
> file in many cases).  'top' identifies the process(es), then "lsof -p
> <pid>" is used to identify the files that the suspect process has open.
> Finding a cpu-hogger that has a filespace-hogger open is usually the
> holy grail you seek.

The "CPU usage by process" graph is the utopian one ...
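
As an illustration of the 'top' then "lsof -p <pid>" step, here is a minimal Python sketch that ranks a process's open files by size. It assumes lsof's field output mode (-F), where each output line starts with a one-letter field id (p = pid, f = file descriptor, s = size, n = name). The helper names and the sample text are hypothetical, and this is not the script described above.

```python
import subprocess

def parse_lsof_fields(text):
    """Parse `lsof -Fsn` style output into a list of (size_bytes, name)."""
    files = []
    current = None
    for line in text.splitlines():
        if not line:
            continue
        field, value = line[0], line[1:]
        if field == "f":
            # A file descriptor line starts a new per-file record.
            current = {"size": 0, "name": ""}
            files.append(current)
        elif current is not None and field == "s" and value.isdigit():
            current["size"] = int(value)
        elif current is not None and field == "n":
            current["name"] = value
    return [(f["size"], f["name"]) for f in files]

def largest_open_files(pid, top=5):
    """Run lsof for one pid (lsof must be installed); return the biggest files."""
    out = subprocess.run(["lsof", "-p", str(pid), "-Fsn"],
                         capture_output=True, text=True, check=False).stdout
    return sorted(parse_lsof_fields(out), reverse=True)[:top]

# Hypothetical lsof output, for illustration only.
sample = (
    "p1234\n"
    "fcwd\nn/var/tmp\n"
    "f3\ns1048576000\nn/var/tmp/runaway.log\n"
    "f4\ns2048\nn/var/tmp/small.dat\n"
)
ranked = sorted(parse_lsof_fields(sample), reverse=True)
```

In practice the pid would come from whatever 'top' reports as the busiest process; entries without a size field (sockets, some directories) are simply treated as size 0 here.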

> As a "repair" action for Hobbit, I squirreled away 2GB of diskspace in
> 100MB chunks for critical filesystems.  "dd if=/dev/zero
> of=/filesystem/DiskSpaceReserve/reserve01 bs=1024 count=102400", then
> "cp reserve01 reserve02", etc. to build up the reserve.

lvextend may be another useful command here ...
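
For reference, the reserve-file trick quoted above could be sketched in Python roughly as follows. The function and directory names are hypothetical, and the sizes are parameters so it can be tried on a small scale first. The files are filled with real zero blocks (as dd from /dev/zero does) rather than created sparse, since a sparse file would not actually hold the space.

```python
import os
import tempfile

def create_reserve(directory, count, chunk_bytes, block_bytes=1024 * 1024):
    """Write `count` zero-filled files of `chunk_bytes` each; return their paths."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i in range(1, count + 1):
        path = os.path.join(directory, "reserve%02d" % i)
        remaining = chunk_bytes
        with open(path, "wb") as fh:
            while remaining > 0:
                n = min(block_bytes, remaining)
                fh.write(b"\0" * n)
                remaining -= n
            # Make sure the blocks are really on disk before counting on them.
            fh.flush()
            os.fsync(fh.fileno())
        paths.append(path)
    return paths

def release_one(paths):
    """Free space by deleting the last reserve file; return bytes released."""
    if not paths:
        return 0
    path = paths.pop()
    size = os.path.getsize(path)
    os.remove(path)
    return size

# Small-scale demo in a temporary directory (3 files of 64 KB each).
demo_dir = tempfile.mkdtemp()
reserve = create_reserve(demo_dir, count=3, chunk_bytes=64 * 1024)
freed = release_one(reserve)
```

A "repair" script would call something like `release_one` when the disk test goes red, buying time to find the real culprit.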


Regards,
Buchan


