
RE: [hobbit] DEVMON stops working every now and then



I'm assuming that when DEVMON quits working everything goes purple. If
that is the case, someone posted this a while back as a workaround. It
works PERFECTLY and I haven't had to touch DEVMON since.

 

>> hobbit-alerts.cfg
## !-- RESTART DEVMON on PURPLE
HOST=NOC COLOR=purple SERVICE=dm
  SCRIPT /usr/local/hobbit/server/ext/restart-devmon.sh 1234567890
<<

 

>> restart-devmon.sh

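#!/bin/sh
# Custom script to restart DEVMON on purple
#
# Kill any running devmon (perl) processes.
ps -ax | grep devm | grep perl | awk '{print $1}' | xargs kill

# Give the processes time to exit.
sleep 60

# Kill anything that survived the first pass.
ps -ax | grep devm | grep perl | awk '{print $1}' | xargs kill

sleep 10

# Start devmon again.
/usr/local/hobbit-devmon/devmon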

<< 
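The fixed "sleep 60" in the script could, as a rough sketch, be replaced by polling until the old process has really gone. wait_for_exit is a hypothetical helper name made up for this example, not part of devmon or Hobbit, and the PID to watch has to come from somewhere else (for instance the ps pipeline above).

```shell
#!/bin/sh
# wait_for_exit PID SECONDS (hypothetical helper)
# Poll about once per second until PID no longer exists.
# Returns 0 once the process is gone, 1 if it is still there after SECONDS.
wait_for_exit() {
    pid=$1
    remaining=$2
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

Something like "wait_for_exit $pid 60 || kill -9 $pid" would then only wait as long as needed and escalate if the process refuses to go away.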

 

 

-Clint

 

From: Gregory Thomas [mailto:GThomas (at) fairdinkum.com] 
Sent: 11 November 2009 10:24
To: 'hobbit (at) hswn.dk'
Subject: RE: [hobbit] DEVMON stops working every now and then

 

I've got the same problem. Just had to restart after having it working
for about 48 hours.

 

I added devmon (0.3.1-beta1) to the mix only a few weeks ago and am
running it on Ubuntu (desktop 8.10) along with xymon 4.2.3 (which has
been running for about 6 months). On a side note, the rrd graphing works
quite well for connects, cpu, if_load, and memory.

 

To kill it I run "sudo killall devmon" and it goes from purple to green
again without my running anything else.

 

To get devmon running in the first place I've added the following to
hobbitlaunch.cfg. (I'm not sure this is the "proper" way to handle it,
and it almost seems too easy, but it starts when I start xymon.)

 

hobbitlaunch.cfg

...

[devmon]
 CMD $BBHOME/ext/devmon/devmon

 

[devmonreload]
 CMD $BBHOME/ext/devmon/devmon --readbbhosts
 INTERVAL 5m

...

I've seen others post that they have cron jobs daily or even more often
to restart devmon but I wish that wasn't required.


Greg

 

________________________________

From: thorsten.erdmann (at) daimler.com [mailto:thorsten.erdmann (at) daimler.com]

Sent: Wednesday, November 11, 2009 8:58 AM
To: hobbit (at) hswn.dk
Subject: [hobbit] DEVMON stops working every now and then


Hello

Some time ago I reported that devmon stops working when a monitored
device is not responding. I have now seen that it has nothing to do with
unresponsive devices.
Devmon stops working at irregular intervals. I set Devmon to verbose and
looked at the devmon log. I saw that there are simply no more messages
when it stops working (see below). No error messages, nothing. None in
the devmon log nor in the syslog.
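Since the only visible symptom is that the log goes quiet, one way to notice a hang is to check how long ago the log file was last written. This is only a sketch: log_is_stale is a hypothetical helper name, the log path in any real use is site-specific, and it assumes a find that supports -mmin (GNU and BSD find do).

```shell
#!/bin/sh
# log_is_stale FILE MINUTES (hypothetical helper)
# Succeeds (returns 0) if FILE does not exist or was last modified
# more than MINUTES minutes ago; fails (returns 1) otherwise.
log_is_stale() {
    # A missing log counts as stale.
    [ -f "$1" ] || return 0
    # find prints the file name only if it is older than the limit.
    [ -n "$(find "$1" -mmin +"$2" 2>/dev/null)" ]
}
```

With devmon cycling roughly once a minute as in the log below, a check such as "log_is_stale /path/to/devmon.log 5" (path assumed) should fail while devmon is healthy and succeed a few minutes after it stops logging.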

If I do a "ps -ef" I see all devmon processes running:

[root@s068a300 devmon]# ps -ef |grep devmon
hobbit   10211     1  0 Nov09 ?        00:10:07 devmon[master]
hobbit   10214 10211  0 Nov09 ?        00:00:22 devmon
hobbit   10215 10211  0 Nov09 ?        00:00:21 devmon
hobbit   10217 10211  0 Nov09 ?        00:00:22 devmon
hobbit   10218 10211  0 Nov09 ?        00:01:52 devmon
hobbit   10219 10211  0 Nov09 ?        00:00:21 devmon
hobbit   10220 10211  0 Nov09 ?        00:01:51 devmon
hobbit   10221 10211  0 Nov09 ?        00:01:52 devmon
hobbit   10222 10211  0 Nov09 ?        00:00:00 devmon
hobbit   10223 10211  0 Nov09 ?        00:00:00 devmon
root     20447  3611  0 14:47 pts/1    00:00:00 grep devmon

Any idea how I can find out why devmon stops working and what the
processes are doing when they are stuck? If I send a SIGTERM to the
devmon master process, it stops all the other processes, so it looks
like it is responding to signals as it should.
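For looking at the stuck processes one by one, the PIDs can be pulled out of "ps -ef" output without also catching the grep itself. A small sketch, assuming the command column (8th field) starts with "devmon" as in the listing above; devmon_pids is a hypothetical helper name.

```shell
#!/bin/sh
# devmon_pids (hypothetical helper)
# Reads `ps -ef` output on stdin and prints the PID (2nd field) of every
# line whose command (8th field) begins with "devmon". A line such as
# "grep devmon" is skipped because its 8th field is "grep".
devmon_pids() {
    awk '$8 ~ /^devmon/ { print $2 }'
}
```

Used as "ps -ef | devmon_pids". Each PID can then be inspected with whatever tracing tool the system offers, for example "strace -p PID" on Linux, to see which system call the process is sitting in.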

BTW: does anyone have a devmon startup/shutdown script that works on
SuSE EL?

Thorsten Erdmann

Attachment:
Here are the last few lines of the devmon log

[09-11-10@10:52:21] Performing test logic
[09-11-10@10:52:21] Done with test logic
[09-11-10@10:52:21] Sending messages to display server
[09-11-10@10:52:21] Done sending messages
[09-11-10@10:52:21] Sleeping for 59 seconds.
[09-11-10@10:53:20] Starting snmp queries
[09-11-10@10:53:20] Getting device status from hobbit at localhost:1984
[09-11-10@10:53:20] Querying u068usv020a1 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10@10:53:20] Querying u068usv020a2 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10@10:53:20] Querying u068usv020b1 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10@10:53:20] Querying u068usv020b2 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10@10:53:20] Querying u068usv110111 for tests power,temperature
[09-11-10@10:53:20] Querying u068usvnw1111 for tests power,temperature
[09-11-10@10:53:20] Querying u068usvnw1112 for tests power,temperature
[09-11-10@10:53:20] Querying u068usvnw1211 for tests power,temperature
[09-11-10@10:53:21] Performing test logic
[09-11-10@10:53:21] Done with test logic
[09-11-10@10:53:21] Sending messages to display server
[09-11-10@10:53:21] Done sending messages
[09-11-10@10:53:21] Sleeping for 59 seconds.
[09-11-10@10:54:20] Starting snmp queries
[09-11-10@10:54:20] Getting device status from hobbit at localhost:1984
[09-11-10@10:54:20] Querying u068usv020a1 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10@10:54:21] Querying u068usv020a2 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10@10:54:21] Querying u068usv020b1 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10@10:54:21] Querying u068usv020b2 for tests battery,powerin,power,diag,temperature,msgs
[09-11-10@10:54:21] Querying u068usv110111 for tests power,temperature
[09-11-10@10:54:21] Querying u068usvnw1111 for tests power,temperature
[09-11-10@10:54:21] Querying u068usvnw1112 for tests power,temperature
[09-11-10@10:54:21] Querying u068usvnw1211 for tests power,temperature