[Xymon] Configuring Devmon for the first time

Buchan Milne bgmilne at staff.telkomsa.net
Wed Jun 1 16:39:10 CEST 2011


On Tuesday, 31 May 2011 03:24:05 kconnell at ryerson.ca wrote:
> I've had issues with devmon not updating the bb-display and everything
> going purple.

Firstly, I don't think this is Josh's problem, as he didn't have a devmon 
process, whereas this behaviour is typically that devmon hangs (but the 
process is still running).

If you see different behaviour from what I discuss below, please log a new 
tracker item.

The 'hang' issue is covered in this tracker item:

http://sourceforge.net/tracker/?func=detail&aid=2897345&group_id=160720&atid=816977

(Unfortunately, it was logged anonymously, and I have had no feedback on 
improvements in devmon svn for this issue, either via the tracker, or the 
mails on the mailing list)

Discussion of the issue also occurred on the devmon-support mailing list:

http://sourceforge.net/mailarchive/forum.php?thread_name=201102021424.30555.bgmilne%40staff.telkomsa.net&forum_name=devmon-support

The status has not changed; my failure logs still die at:

[11-05-05 at 15:54:02] DEBUG: Printing single combo message size 13390
[11-05-05 at 15:54:02] DEBUG: Finished printing single combo message
[11-05-05 at 15:55:42] Fork 3 timed out waiting for data from parent: Timeout at /usr/share/devmon/modules/dm_snmp.pm line 516, <$__ANONIO__> line 30203.

The printing code is wrapped in an eval'd alarm subroutine, which should return 
within 10 seconds and log either that the printing had completed or that it had 
timed out. Instead, some 40s later, the fork notices that it has not seen 
anything from the 'master' process within the poll period.
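
The eval/alarm wrapper is Perl code inside devmon and is not reproduced here. 
As a rough shell analogue of the same idea (run something with a time limit 
and always log which way it ended), here is a minimal sketch. It assumes GNU 
coreutils 'timeout' is installed; the function name run_limited is made up 
for illustration and is not part of devmon.

```shell
#!/bin/bash
# Sketch only: a shell analogue of "wrap the work in a timeout and log the
# outcome". run_limited is a hypothetical helper, not devmon code.
# Usage: run_limited SECONDS COMMAND [ARGS...]
run_limited() {
    local limit=$1 rc=0
    shift

    # GNU timeout sends SIGTERM to the command once the limit expires.
    timeout "$limit" "$@" || rc=$?

    if [ "$rc" -eq 124 ]; then
        # 124 is the exit status GNU timeout uses for "command timed out".
        echo "timed out after ${limit}s: $*" >&2
    else
        echo "finished (exit $rc): $*" >&2
    fi
    return "$rc"
}
```

For example, 'run_limited 10 some-command' logs one of the two messages on 
stderr, so a log that stops without either of them points at the wrapper 
itself rather than at the wrapped command.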

The question is, what should be done in this case? Should the forks attempt to 
kill the master devmon process?

Anyway, I would be grateful if someone could reproduce this on a different 
platform. I currently see this on RHEL5 x86_64 with perl-5.8.8-27.el5. Other 
environments have been green since 25 Jan (since they were upgraded to rev 214: 
http://devmon.svn.sourceforge.net/viewvc/devmon?view=revision&revision=214).

> I created a "devmon watchdog" script that runs every 5 min using lynx
> (text-based HTML browser), which checks the status of devmon (shows as dm
> test) on bb-monitor. If it's purple then I kill the devmon process and
> start it up again... band-aid solution, but it does the trick.
> 
> I'm no script expert, but can share the bash script if you want/need.

Here is mine, but I am *not* going to add it to svn and the next release 
unless I have had some feedback on the changes to prevent this occurring at 
all, preferably with the failure logs the script keeps.

I run mine from hobbitlaunch.cfg (the problematic box is still running 4.2.2 
for now):

[devmon]
        ENVFILE /usr/lib64/hobbit/server/etc/hobbitserver.cfg
        CMD /usr/local/bin/restart-devmon-if-purple
        INTERVAL 1m
        LOGFILE /var/log/hobbit/devmon-restart.log

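I have a sudo rule in place to allow the hobbit user to call 'sudo 
/etc/init.d/devmon stop'. Note that, when not run as root, the script below 
also calls '/etc/init.d/devmon start' and 'pkill' through sudo.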


#!/bin/bash
if [ "$BB" == "" ]
then
        echo "This script must be run under a Hobbit or Xymon environment" >&2
        echo "e.g. by: bbcmd $0" >&2
        exit 1
fi
if [ "$BBDISPLAYS" != "" ]
then
        # Use the first entry of a comma-separated BBDISPLAYS list
        BBDISP=${BBDISPLAYS%%,*}
fi
COLOR=$($BB $BBDISP "hobbitdboard host=$HOSTNAME test=dm" | cut -d'|' -f3)

if [ "`id -u`" -eq 0 ]
then
        DEVMON="/etc/init.d/devmon"
        PKILL="pkill"
else
        DEVMON="sudo /etc/init.d/devmon"
        PKILL="sudo pkill"
fi

if [ "$COLOR" == "purple" ]
then
        LOGSAVE=/var/log/devmon/failures/devmon-failure-`date +%Y-%m-%d-%H:%M:%S`.log
        echo "Devmon is purple, saving last 200 lines of log to $LOGSAVE"
        tail -n200 /var/log/devmon/devmon.log > "$LOGSAVE"
        $DEVMON stop
        NUM=$(pgrep -u devmon|wc -l)
        if [ "$NUM" -ne 0 ]
        then 
                echo "Devmon failed to stop cleanly, terminating manually"
                $PKILL -u devmon
                sleep 5
        fi
        NUM=$(pgrep -u devmon|wc -l)
        if [ "$NUM" -ne 0 ]
        then 
                echo "Devmon failed to terminate cleanly, killing manually"
                $PKILL -9 -u devmon
        fi
        $DEVMON start
else
        [ "$DEBUG" == 1 ] && echo "Devmon isn't purple, it is $COLOR"
fi
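
The stop / TERM / KILL escalation in the script relies on a fixed 'sleep 5'. 
A possible variation, sketched here for a single PID supplied by the caller 
(the script above selects processes by user with pgrep/pkill -u instead), 
polls once per second and only sends SIGKILL if the grace period runs out. 
The function name stop_escalating and its arguments are made up for 
illustration.

```shell
#!/bin/bash
# Sketch only: ask one process to exit, wait up to GRACE seconds, then
# force it. stop_escalating is a hypothetical helper.
# Usage: stop_escalating PID [GRACE_SECONDS]
stop_escalating() {
    local pid=$1 grace=${2:-5} waited=0

    # Polite request first; nothing more to do if the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    # Poll once per second rather than sleeping for a fixed period.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            echo "PID $pid still alive after ${grace}s, sending SIGKILL" >&2
            kill -KILL "$pid" 2>/dev/null || true
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A process that exits promptly on SIGTERM returns control as soon as it is 
gone (or, if it is a child the calling shell has not yet reaped, after the 
grace period at the latest), and the SIGKILL message in the restart log 
records every time the polite stop was not enough.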



Regards,
Buchan


