[Xymon] PROCS not monitored correctly - what is wrong with this picture? (Find the error!) (LONG)

Tue Apr 9 20:19:34 CEST 2013

My apologies to anyone who gets this on a digest.
I'm having some odd issues with procs on a database cluster and I would be
very grateful for any clues!

We have multiple rule sets, both to cut down on duplicate alerts for shared
disk and because some servers have more capacity.

db0-db6 are Solaris 10 servers (there is no db8) running Xymon 4.3.10.
db11-db62 are RHEL Linux servers running a mix of Xymon 4.3.10 and 4.3.7.
The Xymon server is a RHEL VM running Xymon 4.3.10.

Procs have been screwy for a while, but we only noticed when we added rngd
testing to the Linux boxes.

db1, db2, db3 and db6 are alerting for missing rngd. db0, db4 and db5 are NOT.
(This is particularly puzzling since db5 and db6 share a rule set, as do db0
and db1. I'd expect them to go as pairs.)

db1, db2, db3, and db6 all show:

[green]  cron (found 1, req. 1 or more)
[green]  nscd (found 1, req. 1 or more)
[green]  xntpd (found 1, req. 1 or more)
[green]  cron (found 1, req. 1 or more)
[green]  nscd (found 1, req. 1 or more)
[green]  ntpd (found 1, req. 1 or more)
[yellow] rngd (found 0, req. 1 or more)
[green]  cron (found 1, req. 1 or more)

db0, db4 and db5 all show:

[green]  cron (found 1, req. 1 or more)
[green]  nscd (found 1, req. 1 or more)
[green]  xntpd (found 1, req. 1 or more)
[green]  cron (found 1, req. 1 or more)

That last cron comes from a HOST=* rule at the end.

Can you find what is wrong with the lines below?
#-------------------
#Database Servers
HOST=%^db[0|1].example.com
        MEMPHYS 100 101
        MEMSWAP 85 95
        PROC cron 1 -1 yellow
        PROC nscd 1 -1 yellow
        PROC xntpd 1 -1 yellow
#        PROC sar -1 4 yellow
        LOAD 80.0 120.0
        DISK /oracle/dba_msc_nfs2 101 101
        DISK /oracle/data09 99 99
        DISK /oracle/data10 97 98
        DISK /oracle/data17 96 97
        DISK /oracle/data22 98 99
        DISK %.*archivelogs.* 90 95
        DISK %.*redologs.* 90 95
        DISK %.*data.* 95 96
        LOG /export/home/xymon/client/tmp/powermt.out %degraded COLOR=yellow

HOST=%^db[2|3|4|8].example.com
        UP 30m 9999d
        MEMPHYS 100 101
        MEMSWAP 85 95
#       LOAD    48.0 64.0
        LOAD 80.0 120.0
        PROC cron 1 -1 yellow
        PROC nscd 1 -1 yellow
        PROC xntpd 1 -1 yellow
#        PROC sar -1 4 yellow
        DISK %.*data.* IGNORE
        DISK %.*oracle.* IGNORE
        DISK %.*redologs.* IGNORE
        DISK %.* 80 90
        LOG /export/home/xymon/client/tmp/powermt.out %degraded COLOR=yellow

HOST=%^db[5|6].example.com
        UP 30m 9999d
        MEMPHYS 100 101
        MEMSWAP 85 95
        LOAD    160.0 240.0
        DISK /oracle/export02 90 95
        DISK %.*oracle.* IGNORE
        DISK %.* 80 90
        PROC cron 1 -1 yellow
        PROC nscd 1 -1 yellow
        PROC xntpd 1 -1 yellow
#        PROC sar -1 4 yellow
        LOG /export/home/xymon/client/tmp/powermt.out %degraded COLOR=yellow

HOST=%^db[11|12|13|21|22|23|31|32|33|61|62].bo3.*
        UP 30m 9999d
        MEMPHYS 100 101
        MEMSWAP 85 95
        LOAD    64.0 128.0
        PROC cron 1 -1 yellow
        PROC nscd 1 -1 yellow
        PROC ntpd 1 -1 yellow
        PROC rngd 1 -1 yellow

(There are usually more LOG lines, but I removed them for clarity and
verified that the errors persist.)
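
One thing that may be worth testing in isolation is the bracketed host lists in the HOST= lines. In Perl-compatible regular expressions (which Xymon uses for patterns starting with %), square brackets denote a character class, meaning any single one of the listed characters with `|` included, rather than alternation. Below is a minimal sketch in Python, whose `re` module treats these constructs the same way. The hostnames with the `.bo3.example.com` suffix are assumed purely for illustration and are not the real cluster names, and the grouped pattern is a hypothetical alternative, not something from the config above.

```python
import re

# HOST= pattern of the Linux rule set, copied from above without the leading '%'.
bracket = re.compile(r"^db[11|12|13|21|22|23|31|32|33|61|62].bo3.*")

# Hypothetical alternative: a parenthesised group is how alternation is written,
# and the dots are escaped so they only match a literal '.'.
group = re.compile(r"^db(11|12|13|21|22|23|31|32|33|61|62)\.bo3\.")

# Assumed hostnames: the ".bo3.example.com" suffix is illustrative only.
hosts = [f"db{n}.bo3.example.com" for n in (0, 1, 2, 3, 4, 5, 6, 11, 62)]

# Collect the short names of the hosts each pattern selects.
bracket_hits = [h.split(".")[0] for h in hosts if bracket.search(h)]
group_hits = [h.split(".")[0] for h in hosts if group.search(h)]

print("bracket form matches:", bracket_hits)
print("group form matches:  ", group_hits)
```

With these assumed names, the bracket form selects db1, db2, db3 and db6, while the group form selects db11 and db62. The same kind of check can be run against the other HOST= lines with the real hostnames substituted in.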