solaris work around for df hang on nfs?

John Glowacki johng at idttechnology.com
Wed Nov 22 18:21:05 CET 2006


I want to add monitoring of NFS file systems back into the 4.2 SunOS
client, but I don't want the whole client to hang if NFS hangs. I changed
the df test scripting to kill a hanging df and display a "&red -
df $1 check failed" message if the df does not complete within 3 seconds.
That part seems to work OK. The only thing I think I still need is to get
the Hobbit server to change the "disk" status to red or yellow.

1) What do I need to change to get the "disk" status to change color?
2) Is there any downside to what I am trying to do?

I changed the df test in hobbitclient-sunos.sh to the following:

echo "[df]"
# Get df header
FSTYPESH=`/bin/df -n -l|awk '{print $3}'|egrep -v "^proc|^fd|^mntfs|^ctfs|^devfs|^objfs|^nfs"|sort|uniq`
set $FSTYPESH
/bin/df -F $1 -k | head -1
# All of this because Solaris df cannot show multiple fs-types, or exclude certain fs types.
#FSTYPES=`/bin/df -n -l|awk '{print $3}'|egrep -v "^proc|^fd|^mntfs|^ctfs|^devfs|^objfs|^nfs"|sort|uniq`
FSTYPES=`/bin/df -n|awk '{print $3}'|egrep -v "^proc|^fd|^mntfs|^ctfs|^devfs|^objfs"|sort|uniq`
if test "$FSTYPES" = ""; then FSTYPES="ufs"; fi
set $FSTYPES
while test "$1" != ""; do
  ( /bin/df -F $1 -k | grep -v " /var/run" | tail +2 ) & cmdpid=$!
  timeout=3
  (sleep $timeout; echo "&red - df $1 check failed"; kill -9 $cmdpid >/dev/null 2>&1) &
  watchdogpid=$!
  wait $cmdpid                    # wait for command
  kill $watchdogpid >/dev/null 2>&1
  shift
done
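For reference, the kill-after-timeout idea in the loop above can be reduced to a small standalone sketch (the function name run_with_timeout is my own; "sleep 10" is a hypothetical stand-in for a df hanging on NFS). One caveat worth noting: in the loop above the df pipeline runs inside ( ... ), so kill -9 $cmdpid signals the subshell, and wait does return, but the hung df process itself may linger; running the command directly, as below, lets the kill reach the command.

```shell
#!/bin/sh
# Minimal sketch of the watchdog-timeout pattern, assuming a POSIX sh.

run_with_timeout() {
  timeout=$1; shift
  "$@" & cmdpid=$!
  # Watchdog: kill the command if it is still running after $timeout seconds.
  (sleep "$timeout"; kill -9 "$cmdpid" >/dev/null 2>&1) & watchdogpid=$!
  wait "$cmdpid"; status=$?
  # Command finished (or was killed); the watchdog is no longer needed.
  kill "$watchdogpid" >/dev/null 2>&1
  return "$status"
}

# "sleep 10" stands in for a hanging df; the 2-second watchdog kills it,
# so the else branch emits the &red line for the Hobbit server to pick up.
if run_with_timeout 2 sleep 10; then
  echo "df completed"
else
  echo "&red - df check failed"
fi
```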

During a df hang the web page source shows the following:

Filesystem            kbytes    used   avail capacity  Mounted on
<IMG SRC="/hobbit/gifs/red.gif" ALT="red" HEIGHT="16" WIDTH="16" BORDER=0> - df nfs check failed
/dev/dsk/c0t0d0s0    2052750 1407222  583946    71%    /
/dev/dsk/c0t0d0s3    2052750  184667 1806501    10%    /var
/dev/dsk/c0t1d0s0    5038454 3349374 1638696    68%    /opt2
/dev/dsk/c0t0d0s4    2052750 1757484  233684    89%    /opt
/dev/dsk/c0t0d0s5    1015542  724217  230393    76%    /tmp

The client data [df] section shows the following raw data:
[df]
Filesystem            kbytes    used   avail capacity  Mounted on
&red - df nfs check failed
/dev/dsk/c0t0d0s0    2052750 1407222  583946    71%    /
/dev/dsk/c0t0d0s3    2052750  184667 1806501    10%    /var
/dev/dsk/c0t1d0s0    5038454 3349374 1638696    68%    /opt2
/dev/dsk/c0t0d0s4    2052750 1757484  233684    89%    /opt
/dev/dsk/c0t0d0s5    1015542  724217  230393    76%    /tmp

Thanks,
John



More information about the Xymon mailing list