[Xymon] Configuring Devmon for the first time

Josh Luthman josh at imaginenetworksllc.com
Wed Jun 1 16:41:29 CEST 2011


Definitely good to see your effort on Devmon; I thought it had become a
forgotten project.  The effort is much appreciated!

Hopefully someone who sees this problem can come forward and help everyone
by testing the SVN version!

Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373


On Wed, Jun 1, 2011 at 10:39 AM, Buchan Milne <bgmilne at staff.telkomsa.net> wrote:

> On Tuesday, 31 May 2011 03:24:05 kconnell at ryerson.ca wrote:
> > I've had issues with devmon not updating the bb-display and everything
> > going purple.
>
> Firstly, I don't think this is Josh's problem, as he didn't have a devmon
> process, whereas this behaviour typically means that devmon has hung (but
> the process is still running).
>
> If you see behaviour different from what I discuss below, please log a new
> tracker item.
>
> The 'hang' issue is covered in this tracker item:
>
>
> http://sourceforge.net/tracker/?func=detail&aid=2897345&group_id=160720&atid=816977
>
> (Unfortunately, it was logged anonymously, and I have had no feedback on
> improvements in devmon svn for this issue, either via the tracker, or the
> mails on the mailing list)
>
> Discussion of the issue also occurred on the devmon-support mailing list:
>
>
> http://sourceforge.net/mailarchive/forum.php?thread_name=201102021424.30555.bgmilne%40staff.telkomsa.net&forum_name=devmon-support
>
> The status has not changed; my failure logs still die at:
>
> [11-05-05 at 15:54:02] DEBUG: Printing single combo message size 13390
> [11-05-05 at 15:54:02] DEBUG: Finished printing single combo message
> [11-05-05 at 15:55:42] Fork 3 timed out waiting for data from parent: Timeout at /usr/share/devmon/modules/dm_snmp.pm line 516, <$__ANONIO__> line 30203.
>
> The printing code is wrapped in an eval'd alarm subroutine which should return
> within 10 seconds, and log either that the printing had completed or that it
> had timed out. Instead, the fork noticed, some time (40s) later, that it
> hadn't seen anything from the 'master' process within the poll period.
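
For anyone who wants to experiment with the "return within N seconds and log
the outcome" idea outside of Devmon's Perl code, here is a rough bash analogue.
It is only a sketch: run_with_deadline is a made-up helper name, not anything
in Devmon, and it assumes the GNU coreutils timeout command is installed (it
may be missing on older distributions).

```shell
#!/bin/bash
# Hypothetical helper, not part of Devmon: run a command with a deadline
# and log whether it completed or timed out.
# Assumes GNU coreutils 'timeout' is available on the PATH.
run_with_deadline() {
    local seconds=$1
    shift
    # GNU timeout exits with status 124 when the deadline expires
    timeout "$seconds" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${seconds}s: $*"
    else
        echo "completed with status $status: $*"
    fi
    return "$status"
}
```

Unlike an in-process alarm, the deadline here is enforced by a separate
process, so it does not rely on the code being timed handing control back.
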
>
> The question is, what should be done in this case? Should the forks attempt to
> kill the master devmon process?
>
> Anyway, I would be grateful if someone could reproduce this on a different
> platform. I currently see this on RHEL5 x86_64 with perl-5.8.8-27.el5. Other
> environments have been green since 25 Jan (since they were upgraded to rev 214:
> http://devmon.svn.sourceforge.net/viewvc/devmon?view=revision&revision=214 ).
>
> > I created a "devmon watchdog" script that runs every 5 min using lynx
> > (a text-based HTML browser) and checks the status of devmon (shown as the dm
> > test) on bb-monitor. If it's purple then I kill the devmon process and
> > start it up again... a band-aid solution, but it does the trick.
> >
> > I'm no script expert, but can share the bash script if you want/need.
>
> Here is mine, but I am *not* going to add it to svn and the next release
> unless I have had some feedback on the changes to prevent this occurring at
> all, preferably with the failure logs the script keeps.
>
> I run mine from hobbitlaunch.cfg (the problematic box is still running 4.2.2
> for now):
>
> [devmon]
>        ENVFILE /usr/lib64/hobbit/server/etc/hobbitserver.cfg
>        CMD /usr/local/bin/restart-devmon-if-purple
>        INTERVAL 1m
>        LOGFILE /var/log/hobbit/devmon-restart.log
>
> I have a sudo rule in place to allow the hobbit user to call
> 'sudo /etc/init.d/devmon stop'
>
>
> #!/bin/bash
> if [ "$BB" == "" ]
> then
>        echo "This script must be run under a Hobbit or Xymon environment" >&2
>        echo "e.g. by: bbcmd $0" >&2
>        exit 1
> fi
> if [ "$BBDISPLAYS" != "" ]
> then
>        BBDISP=${BBDISPLAYS#,*}
> fi
> COLOR=$($BB $BBDISP "hobbitdboard host=$HOSTNAME test=dm" | cut -d'|' -f3)
>
> if [ "`id -u`" -eq 0 ]
> then
>        DEVMON="/etc/init.d/devmon"
>        PKILL="pkill"
> else
>        DEVMON="sudo /etc/init.d/devmon"
>        PKILL="sudo pkill"
> fi
>
> if [ "$COLOR" == "purple" ]
> then
>        LOGSAVE=/var/log/devmon/failures/devmon-failure-`date +%Y-%m-%d-%H:%M:%S`.log
>        echo "Devmon is purple, saving last 200 lines of log to $LOGSAVE"
>        tail -n200 /var/log/devmon/devmon.log > $LOGSAVE
>        $DEVMON stop
>        NUM=$(pgrep -u devmon|wc -l)
>        if [ "$NUM" -ne 0 ]
>        then
>                echo "Devmon failed to stop cleanly, terminating manually"
>                $PKILL -u devmon
>                sleep 5
>        fi
>        NUM=$(pgrep -u devmon|wc -l)
>        if [ "$NUM" -ne 0 ]
>        then
>                echo "Devmon failed to terminate cleanly, killing manually"
>                $PKILL -9 -u devmon
>        fi
>        $DEVMON start
> else
>        [ "$DEBUG" == 1 ] && echo "Devmon isn't purple, it is $COLOR"
> fi
>
>
>
> Regards,
> Buchan
>
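
In case it helps anyone testing, the color check at the heart of the script
above can be tried offline, without a running Hobbit/Xymon server. The sketch
below takes the third '|'-separated field, just as the cut call in the script
does. The function names and the sample lines are made up for illustration;
real hobbitdboard output carries more fields than shown here.

```shell
#!/bin/bash
# Hypothetical helper: pull the color (third '|'-separated field)
# out of one board line, as the watchdog script does with cut.
board_color() {
    printf '%s\n' "$1" | cut -d'|' -f3
}

# Succeeds (exit status 0) only when the board line reports purple.
needs_restart() {
    [ "$(board_color "$1")" = "purple" ]
}

# Made-up sample lines, not captured from a real server.
sample_purple='router1|dm|purple|extra'
sample_green='router1|dm|green|extra'

if needs_restart "$sample_purple"; then echo "restart"; else echo "leave alone"; fi
if needs_restart "$sample_green"; then echo "restart"; else echo "leave alone"; fi
```

With those two sample lines, the first check prints "restart" and the second
prints "leave alone".
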