[Xymon] PROCS not monitored correctly -what is wrong with this picture?? (Find the error!) (LONG)
Betsy Schwartz
betsy.schwartz at gmail.com
Tue Apr 9 20:19:34 CEST 2013
My apologies to anyone who gets this on a digest.
I'm having some odd issues with procs on a database cluster, and I would be
very grateful for any clues!
We have multiple rule sets to cut down on duplicate alerts for shared disk,
and because some servers have more capacity.
db0-db6 are Solaris 10 servers (there is no db8) running Xymon 4.3.10.
db11-db62 are RHEL Linux running a mix of Xymon 4.3.10 and 4.3.7.
The server is a RHEL VM running Xymon 4.3.10.
Procs have been screwy for a while, but we only noticed when we added rngd
testing to the Linux boxes.
db1, db2, db3 and db6 are alerting for missing rngd. db0, db4 and db5 are NOT.
(This is particularly puzzling since db5 and db6 share a rule set, as do db0
and db1; I'd expect them to go as pairs.)
db1, db2, db3, and db6 all show:
[green] cron (found 1, req. 1 or more)
[green] nscd (found 1, req. 1 or more)
[green] xntpd (found 1, req. 1 or more)
[green] cron (found 1, req. 1 or more)
[green] nscd (found 1, req. 1 or more)
[green] ntpd (found 1, req. 1 or more)
[yellow] rngd (found 0, req. 1 or more)
[green] cron (found 1, req. 1 or more)
db0, db4 and db5 all show:
[green] cron (found 1, req. 1 or more)
[green] nscd (found 1, req. 1 or more)
[green] xntpd (found 1, req. 1 or more)
[green] cron (found 1, req. 1 or more)
That last cron comes from a HOST=* rule at the end.
Can you find what is wrong with the lines below?
#-------------------
#Database Servers
HOST=%^db[0|1].example.com
MEMPHYS 100 101
MEMSWAP 85 95
PROC cron 1 -1 yellow
PROC nscd 1 -1 yellow
PROC xntpd 1 -1 yellow
# PROC sar -1 4 yellow
LOAD 80.0 120.0
DISK /oracle/dba_msc_nfs2 101 101
DISK /oracle/data09 99 99
DISK /oracle/data10 97 98
DISK /oracle/data17 96 97
DISK /oracle/data22 98 99
DISK %.*archivelogs.* 90 95
DISK %.*redologs.* 90 95
DISK %.*data.* 95 96
LOG /export/home/xymon/client/tmp/powermt.out %degraded COLOR=yellow
HOST=%^db[2|3|4|8].example.com
UP 30m 9999d
MEMPHYS 100 101
MEMSWAP 85 95
# LOAD 48.0 64.0
LOAD 80.0 120.0
PROC cron 1 -1 yellow
PROC nscd 1 -1 yellow
PROC xntpd 1 -1 yellow
# PROC sar -1 4 yellow
DISK %.*data.* IGNORE
DISK %.*oracle.* IGNORE
DISK %.*redologs.* IGNORE
DISK %.* 80 90
LOG /export/home/xymon/client/tmp/powermt.out %degraded COLOR=yellow
HOST=%^db[5|6].example.com
UP 30m 9999d
MEMPHYS 100 101
MEMSWAP 85 95
LOAD 160.0 240.0
DISK /oracle/export02 90 95
DISK %.*oracle.* IGNORE
DISK %.* 80 90
PROC cron 1 -1 yellow
PROC nscd 1 -1 yellow
PROC xntpd 1 -1 yellow
# PROC sar -1 4 yellow
LOG /export/home/xymon/client/tmp/powermt.out %degraded COLOR=yellow
HOST=%^db[11|12|13|21|22|23|31|32|33|61|62].bo3.*
UP 30m 9999d
MEMPHYS 100 101
MEMSWAP 85 95
LOAD 64.0 128.0
PROC cron 1 -1 yellow
PROC nscd 1 -1 yellow
PROC ntpd 1 -1 yellow
PROC rngd 1 -1 yellow
(There are usually more LOG lines, but I removed them for clarity and
verified that the errors persist.)
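
The HOST patterns above can be tried outside Xymon. What follows is a minimal sketch in Python, not a definitive diagnosis. It assumes that Python's `re` module reads these patterns the same way as the PCRE matching Xymon applies to `%` expressions, and that the hosts are named dbN.bo3.example.com. Those host names are hypothetical, since the post does not show the real ones. `GROUPED_RULE` is also a hypothetical variant, written on the assumption that alternation was intended; it is not taken from the original config.

```python
import re

# Fourth HOST pattern, copied from the rules above without the leading '%'.
LINUX_RULE = r"^db[11|12|13|21|22|23|31|32|33|61|62].bo3.*"

# Hypothetical variant (not in the original config): a group with
# alternation instead of a bracket expression, and escaped dots.
GROUPED_RULE = r"^db(11|12|13|21|22|23|31|32|33|61|62)\.bo3\."

# Hypothetical names for the Solaris hosts db0-db6.
HOSTS = [f"db{n}.bo3.example.com" for n in range(7)]


def matching_hosts(pattern, hosts):
    """Return the hosts that the pattern matches, in input order."""
    rx = re.compile(pattern)
    return [h for h in hosts if rx.search(h)]


bracket_matches = matching_hosts(LINUX_RULE, HOSTS)
grouped_matches = matching_hosts(GROUPED_RULE, HOSTS)

for host in HOSTS:
    print(host,
          "bracket:", host in bracket_matches,
          "grouped:", host in grouped_matches)
```

Inside square brackets, `|` is a literal character rather than an alternation operator, so the bracket expression in `LINUX_RULE` matches any single one of `1`, `2`, `3`, `6` or `|`. Under the naming assumption above, the bracket form matches db1, db2, db3 and db6 but not db0, db4 or db5, which is the same split as the rngd alerts; the grouped form matches none of the seven Solaris hosts.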