[Xymon] Configuring Devmon for the first time
Buchan Milne
bgmilne at staff.telkomsa.net
Wed Jun 1 16:39:10 CEST 2011
On Tuesday, 31 May 2011 03:24:05 kconnell at ryerson.ca wrote:
> I've had issues with devmon not updating the bb-display and everything
> going purple.
Firstly, I don't think this is Josh's problem, as he didn't have a devmon
process, whereas this behaviour is typically that devmon hangs (but the
process is still running).
If you see different behaviour from the one I discuss below, please log a new
tracker item.
The 'hang' issue is covered in this tracker item:
http://sourceforge.net/tracker/?func=detail&aid=2897345&group_id=160720&atid=816977
(Unfortunately, it was logged anonymously, and I have had no feedback on
improvements in devmon svn for this issue, either via the tracker, or the
mails on the mailing list)
Discussion of the issue also occurred on the devmon-support mailing list:
http://sourceforge.net/mailarchive/forum.php?thread_name=201102021424.30555.bgmilne%40staff.telkomsa.net&forum_name=devmon-support
The status has not changed; my failure logs still end at:
[11-05-05 at 15:54:02] DEBUG: Printing single combo message size 13390
[11-05-05 at 15:54:02] DEBUG: Finished printing single combo message
[11-05-05 at 15:55:42] Fork 3 timed out waiting for data from parent: Timeout at /usr/share/devmon/modules/dm_snmp.pm line 516, <$__ANONIO__> line 30203.
The printing code is wrapped in an eval'd alarm subroutine, which should return
within 10 seconds and log either that the printing had completed or that it had
timed out. Instead, some 40s later, the fork notices that it hasn't seen
anything from the 'master' process within the poll period.
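For anyone more at home in shell than Perl, the "return within N seconds and
log either way" idea can be sketched roughly as below. This is only a shell
analogue, not devmon's Perl code: it assumes GNU coreutils `timeout` is
installed (older distributions may not ship it), and run_with_limit is a
made-up name.

```shell
#!/bin/bash
# Sketch only: run a command under a time limit and log how it ended.
# Assumes GNU coreutils `timeout`, which exits with status 124 when the
# limit is reached. run_with_limit is a hypothetical helper name.
run_with_limit() {
    local limit=$1
    shift
    timeout "$limit" "$@"
    local status=$?
    if [ "$status" -eq 124 ]
    then
        echo "timed out after ${limit}s: $*" >&2
    else
        echo "finished with status $status: $*" >&2
    fi
    return "$status"
}

# Example: a command that completes well inside the limit.
run_with_limit 10 true
```

The point is that one of the two messages should always be logged; in the
failure above, the logs suggest that neither path was reached.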
The question is, what should be done in this case? Should the forks attempt to
kill the master devmon process?
Anyway, I would be grateful if someone could reproduce this on a different
platform. I currently see this on RHEL5 x86_64 with perl-5.8.8-27.el5. Other
environments have been green since 25 Jan (since they were upgraded to rev
214:
http://devmon.svn.sourceforge.net/viewvc/devmon?view=revision&revision=214).
> I created a "devmon watchdog" script that's runs every 5 min using lynx
> (txt base html browser) which checks if the status of devmon (shows as dm
> test) on bb-monitor. If its purple then I kill the devmon process and
> start it up again....band-aid solution, but it does the trick.
>
> I no script expert, but can share the bash script if you want/need.
Here is mine, but I am *not* going to add it to svn and the next release
unless I have had some feedback on the changes to prevent this occurring at
all, preferably with the failure logs the script keeps.
I run mine from hobbitlaunch.cfg (the problematic box is still running 4.2.2
for now):
[devmon]
	ENVFILE /usr/lib64/hobbit/server/etc/hobbitserver.cfg
	CMD /usr/local/bin/restart-devmon-if-purple
	INTERVAL 1m
	LOGFILE /var/log/hobbit/devmon-restart.log
I have a sudo rule in place to allow the hobbit user to call 'sudo
/etc/init.d/devmon stop'
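
Since the script below also calls 'sudo /etc/init.d/devmon start' and
'sudo pkill' when it is not running as root, a fuller rule might look
something like the following sketch. The pkill path and the requiretty line
are assumptions about the local setup, not a copy of my actual sudoers file;
check them with 'which pkill' and against your own defaults, and edit with
visudo.

```
# Sketch of a sudoers fragment (assumed paths; adjust to your system).
# hobbitlaunch runs tasks without a terminal, so requiretty must be off
# for this user if your distribution enables it by default.
Defaults:hobbit !requiretty

hobbit ALL=(root) NOPASSWD: /etc/init.d/devmon stop, \
                            /etc/init.d/devmon start, \
                            /usr/bin/pkill -u devmon, \
                            /usr/bin/pkill -9 -u devmon
```

Listing the exact arguments keeps the hobbit user from running pkill against
anything other than the devmon user's processes.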
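#!/bin/bash
# Restart devmon when its own 'dm' status has gone purple on the display server.

if [ "$BB" == "" ]
then
    echo "This script must be run under a Hobbit or Xymon environment" >&2
    echo "e.g. by: bbcmd $0" >&2
    exit 1
fi

# With several display servers configured, query only the first entry of the
# comma-separated list (%%,* strips everything from the first comma onwards).
if [ "$BBDISPLAYS" != "" ]
then
    BBDISP=${BBDISPLAYS%%,*}
fi

# The third '|'-separated field of the hobbitdboard output is the colour.
COLOR=$($BB $BBDISP "hobbitdboard host=$HOSTNAME test=dm" | cut -d'|' -f3)

# Use sudo for the privileged commands unless already running as root.
if [ "$(id -u)" -eq 0 ]
then
    DEVMON="/etc/init.d/devmon"
    PKILL="pkill"
else
    DEVMON="sudo /etc/init.d/devmon"
    PKILL="sudo pkill"
fi

if [ "$COLOR" == "purple" ]
then
    LOGSAVE=/var/log/devmon/failures/devmon-failure-$(date +%Y-%m-%d-%H:%M:%S).log
    echo "Devmon is purple, saving last 200 lines of log to $LOGSAVE"
    tail -n200 /var/log/devmon/devmon.log > "$LOGSAVE"
    $DEVMON stop
    # Anything still running as the devmon user gets SIGTERM first ...
    NUM=$(pgrep -u devmon | wc -l)
    if [ "$NUM" -ne 0 ]
    then
        echo "Devmon failed to stop cleanly, terminating manually"
        $PKILL -u devmon
        sleep 5
    fi
    # ... and SIGKILL if it is still there after that.
    NUM=$(pgrep -u devmon | wc -l)
    if [ "$NUM" -ne 0 ]
    then
        echo "Devmon failed to terminate cleanly, killing manually"
        $PKILL -9 -u devmon
    fi
    $DEVMON start
else
    [ "$DEBUG" == 1 ] && echo "Devmon isn't purple, it is $COLOR"
fi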
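
To show what the COLOR line in the script is doing, here is a small sketch
that pulls the colour out of a single board line without needing a running
server. The sample line is made up, and board_color is a hypothetical helper
name, not part of Hobbit or devmon.

```shell
#!/bin/bash
# Sketch only: extract the colour (third '|'-separated field) from one line
# of hobbitdboard-style output. board_color is a hypothetical helper.
board_color() {
    local line=$1 host test color rest
    # Split on '|': host, test, colour, and whatever follows.
    IFS='|' read -r host test color rest <<< "$line"
    printf '%s\n' "$color"
}

# Made-up sample line in the host|test|color|... layout.
sample='myhost.example.com|dm|purple|remaining|fields'
board_color "$sample"    # prints: purple
```

Note that if the query returns nothing (for example, the display server
cannot be reached), the colour comes back empty, and the script above treats
that the same as "not purple" and does nothing.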
Regards,
Buchan