
Re: [hobbit] DEVMON stops working every now and then



On Wednesday, 11 November 2009 22:37:56 j.sansford (at) ntlworld.com wrote:
> We have the same problem - I've even got devmon configured under SMF in
> Solaris, however it doesn't pick up the fact it's crashed as the process is
> still there.

It doesn't crash. As far as I can tell, eventually all the child processes 
lose communication with the master process, but they are all still running, 
just waiting for someone to tell them to do something.
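
Since the processes stay alive, a check that only looks for a running process
(as SMF does) cannot see this state; something has to look at how old the last
update is. Below is a minimal sketch of such a staleness check, not anything
that ships with devmon. The function names, the 30-minute threshold, and the
idea of watching a file's modification time are all assumptions made for
illustration.

```python
import os
import time

# Hypothetical threshold: treat devmon as hung if nothing has been
# updated for this many seconds (an assumption; tune to your poll cycle).
MAX_AGE_SECONDS = 30 * 60


def is_stale(last_update, now, max_age=MAX_AGE_SECONDS):
    """Return True when the last update is more than max_age seconds old."""
    return (now - last_update) > max_age


def file_is_stale(path, now=None, max_age=MAX_AGE_SECONDS):
    """Check a file's modification time; a missing file counts as stale."""
    if now is None:
        now = time.time()
    try:
        last_update = os.path.getmtime(path)
    except OSError:
        # No file at all means there is nothing recent to go on.
        return True
    return is_stale(last_update, now, max_age)
```

A wrapper run from cron could call file_is_stale() on whatever file your
installation refreshes each poll cycle and restart devmon when it returns
True. Which file that is depends on your setup.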

> A quick and dirty workaround we have is to send an alert on the "dm"
> monitor going purple - this allows the on-call engineer to be alerted to
> the fact we are no longer effectively monitoring the network devices and so
> to restart the process!
>
> There must be a better way though...

Devmon has had "goes purple" problems since 0.2.2 beta. I fixed the more 
frequent one before the 0.3.0 release.

Anyway, I've done some work on this. However, the only production instance of 
devmon that I currently look at regularly last went purple 9 days ago ...

If you are reproducing it more frequently, please have a look at the 
devmon-devel mailing list (or the archives[1] once they have updated). I just 
sent a mail with an attached patch (against svn; it may apply to 0.3.1-beta1, 
but I haven't tried) that may fix the problem, allow us to narrow it down 
further, or at least eliminate one aspect as the cause.

1. http://sourceforge.net/mailarchive/forum.php?forum_name=devmon-devel

Regards,
Buchan