My apologies to anyone who gets this on a digest.<br>I'm having some odd issues with procs on a database cluster and I would be very grateful for any clues!<br><br><br>We have multiple rule sets, both to cut down on duplicate alerts for shared disk and because some servers have more capacity.<br>
<br>db0-db6 are Solaris 10 servers (there is no db8) running xymon 4.3.10<br>db11-db62 are RHEL Linux running a mix of xymon 4.3.10 and 4.3.7<br>The server is a RHEL VM running xymon 4.3.10<br><br>Procs have been screwy for a while, but we only noticed when we added rngd testing to the Linux boxes.<br>
<br>db1, db2, db3, and db6 are alerting for missing rngd. db0, db4, and db5 are NOT.<br>(This is particularly puzzling since db5 and db6 share a rule set, as do db0 and db1; I'd expect them to go as pairs.)<br><br>db1, db2, db3, and db6 all show:<br>
<pre><img src="http://xymon.e-dialog.com/xymon/gifs/green.gif" alt="green" border="0" height="16" width="16"> cron (found 1, req. 1 or more)
<img src="http://xymon.e-dialog.com/xymon/gifs/green.gif" alt="green" border="0" height="16" width="16"> nscd (found 1, req. 1 or more)
<img src="http://xymon.e-dialog.com/xymon/gifs/green.gif" alt="green" border="0" height="16" width="16"> xntpd (found 1, req. 1 or more)
<img src="http://xymon.e-dialog.com/xymon/gifs/green.gif" alt="green" border="0" height="16" width="16"> cron (found 1, req. 1 or more)
<img src="http://xymon.e-dialog.com/xymon/gifs/green.gif" alt="green" border="0" height="16" width="16"> nscd (found 1, req. 1 or more)
<img src="http://xymon.e-dialog.com/xymon/gifs/green.gif" alt="green" border="0" height="16" width="16"> ntpd (found 1, req. 1 or more)
<img src="http://xymon.e-dialog.com/xymon/gifs/yellow.gif" alt="yellow" border="0" height="16" width="16"> rngd (found 0, req. 1 or more)
<img src="http://xymon.e-dialog.com/xymon/gifs/green.gif" alt="green" border="0" height="16" width="16"> cron (found 1, req. 1 or more)<br><br><br><br></pre>db0, db4 and db5 all show: <br><pre><img src="http://xymon.e-dialog.com/xymon/gifs/green.gif" alt="green" border="0" height="16" width="16"> cron (found 1, req. 1 or more)
<img src="http://xymon.e-dialog.com/xymon/gifs/green.gif" alt="green" border="0" height="16" width="16"> nscd (found 1, req. 1 or more)
<img src="http://xymon.e-dialog.com/xymon/gifs/green.gif" alt="green" border="0" height="16" width="16"> xntpd (found 1, req. 1 or more)
<img src="http://xymon.e-dialog.com/xymon/gifs/green.gif" alt="green" border="0" height="16" width="16"> cron (found 1, req. 1 or more)</pre><br>That last cron comes from a HOST=* rule at the end.<br><br>Can you find what is wrong with the lines below?<br>
#-------------------#Database Servers<br>HOST=%^db[0|1].example.com<br> MEMPHYS 100 101<br> MEMSWAP 85 95<br> PROC cron 1 -1 yellow<br> PROC nscd 1 -1 yellow<br>
PROC xntpd 1 -1 yellow<br># PROC sar -1 4 yellow<br> LOAD 80.0 120.0<br> DISK /oracle/dba_msc_nfs2 101 101<br> DISK /oracle/data09 99 99<br> DISK /oracle/data10 97 98<br> DISK /oracle/data17 96 97<br>
DISK /oracle/data22 98 99<br> DISK %.*archivelogs.* 90 95<br> DISK %.*redologs.* 90 95<br> DISK %.*data.* 95 96<br> LOG /export/home/xymon/client/tmp/powermt.out %degraded COLOR=yellow<br>
<br>HOST=%^db[2|3|4|8].example.com<br> UP 30m 9999d<br> MEMPHYS 100 101<br> MEMSWAP 85 95<br># LOAD 48.0 64.0<br> LOAD 80.0 120.0<br> PROC cron 1 -1 yellow<br>
PROC nscd 1 -1 yellow<br> PROC xntpd 1 -1 yellow<br># PROC sar -1 4 yellow<br> DISK %.*data.* IGNORE<br> DISK %.*oracle.* IGNORE<br> DISK %.*redologs.* IGNORE<br> DISK %.* 80 90<br>
LOG /export/home/xymon/client/tmp/powermt.out %degraded COLOR=yellow<br><br><br>HOST=%^db[5|6].example.com<br> UP 30m 9999d<br> MEMPHYS 100 101<br> MEMSWAP 85 95<br>
LOAD 160.0 240.0<br> DISK /oracle/export02 90 95<br> DISK %.*oracle.* IGNORE<br> DISK %.* 80 90<br> PROC cron 1 -1 yellow<br> PROC nscd 1 -1 yellow<br> PROC xntpd 1 -1 yellow<br>
# PROC sar -1 4 yellow<br> LOG /export/home/xymon/client/tmp/powermt.out %degraded COLOR=yellow<br> <br>HOST=%^db[11|12|13|21|22|23|31|32|33|61|62].bo3.*<br> UP 30m 9999d<br> MEMPHYS 100 101<br>
MEMSWAP 85 95<br> LOAD 64.0 128.0<br> PROC cron 1 -1 yellow<br> PROC nscd 1 -1 yellow<br> PROC ntpd 1 -1 yellow<br> PROC rngd 1 -1 yellow<br> <br><br>(There are usually more LOG lines, but I removed them for clarity and verified that the errors persist without them.)<br>
<br>
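One way to probe the HOST= patterns above is to run them against candidate hostnames outside of Xymon. The sketch below is only a rough check, with two assumptions: Python's `re` module stands in for the PCRE engine Xymon uses (the two agree on character classes and alternation), and the hostnames are hypothetical, since the real fully qualified names are not shown in this post. It tests the last rule's pattern, minus the leading `%`, against db0 through db6, alongside a grouped alternation with escaped dots for comparison.

```python
import re

# Pattern copied from the last HOST= rule, without the leading "%".
# Inside [...] the "|" is a literal character, so this class matches
# exactly ONE character out of: 1, 2, 3, 6, |
linux_rule = re.compile(r"^db[11|12|13|21|22|23|31|32|33|61|62].bo3.*")

# Hypothetical names for the Solaris boxes (assumption: they carry a
# ".bo3" domain component; the real names are anonymized in the post).
solaris_hosts = [f"db{n}.bo3.example.com" for n in range(7)]

# Short names of the Solaris hosts that the Linux rule also matches.
matched = [h.split(".")[0] for h in solaris_hosts if linux_rule.search(h)]
print(matched)

# For comparison: a grouped alternation with escaped dots.
grouped = re.compile(r"^db(11|12|13|21|22|23|31|32|33|61|62)\.bo3\.")
matched_grouped = [h.split(".")[0] for h in solaris_hosts if grouped.search(h)]
print(matched_grouped)
```

Under those assumptions the bracketed pattern picks up db1, db2, db3, and db6 but not db0, db4, or db5, while the grouped form matches none of the single-digit hosts. Whether this is what actually happens on the server depends on the real hostnames.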