[Xymon] PROC matching failure due to column bloat
Gore, David W (David)
david.gore at verizon.com
Tue Apr 24 20:39:04 CEST 2012
Jeremy,
I would guess it's not matching because the text you show begins with a space rather than a digit. Could you try simplifying the line to:
PROC "%named" 1 1 "TEXT=/usr/local/sbin/named"
See if that works and then add complexity if there are other processes with the string 'named' in the ps listing.
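To illustrate the trade-off in the simplified rule, here is a minimal Python sketch. It assumes the "%" prefix makes Xymon treat the string as an unanchored regular expression applied to the command text. The sample command strings are hypothetical, including the one with stray leading digits.

```python
import re

# Hypothetical command strings, for illustration only.
clean = "/usr/local/sbin/named -f"
shifted = "32 /usr/local/sbin/named -f"   # stray digits from an overflowing column
unrelated = "vi /etc/named.conf"          # a different process that mentions 'named'

# Unanchored pattern, as in the simplified PROC "%named" rule.
simple = re.compile(r"named")

def matches(pattern, command):
    """Return True if the pattern is found anywhere in the command string."""
    return pattern.search(command) is not None

for cmd in (clean, shifted, unrelated):
    print(f"{cmd!r}: {matches(simple, cmd)}")
```

The unanchored pattern is unaffected by the stray digits, but it also matches the unrelated process, which is why complexity may need to be added back once the basic match works.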
You may also want to follow the 'Client data' link on the procs alert for this host and check the [ps] section to see how your ps listing is being presented to Xymon. It may not match what you see at the command line, depending on which ps options you use there.
~David
From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf Of Jeremy Laidman
Sent: Monday, April 23, 2012 23:04
To: xymon at xymon.com
Subject: [Xymon] PROC matching failure due to column bloat
Peeps
I have both Solaris and Linux servers where a large or long-running process causes PROC matching to fail. Here are some examples:
7701 1 root Feb 28 S 24 0.0 00:00:00 0.0 572 2692 /sbin/agetty -L 9600 ttyS0 vt102
7702 1 root Feb 28 S 23 0.0 00:00:00 0.0 576 2692 /sbin/agetty -L 9600 ttyS1 vt102
7704 1 named Feb 28 S 18 2.4 1-08:59:39 4.4 270500 412784 /usr/sbin/named -u named -f
26498 3293 root 12:47:46 S 14 0.0 00:00:00 0.0 468 2676 sleep 180
This is on Linux. Note the longer-than-a-day TIME column that pushes the columns after it to the right.
The following is on Solaris 9:
11201 11199 n101649 12:38:54 S 59 0.0 0:00 0.0 1000 1144 vmstat 300 2
11202 1 n101649 12:38:54 S 59 0.0 0:00 0.0 968 1104 sh -c iostat -dxsrP 300 2 1>/tmp/xymon_iostatdisk.redacted
3244 2965 root Feb_16 S 59 0.0 5:20 0.1 7104 18736 /opt/OV/lbin/eaagt/opcle -std
3245 2965 root Feb_16 S 59 0.0 1:18 0.1 6376 20960 /opt/OV/lbin/eaagt/opcmona
3253 1 root Feb_16 S 59 0.9 1-10:46:45 0.8 58168 59632 /usr/local/sbin/named -f
Solaris "ps" output allows more characters for TIME than Linux. However, in this case the memory columns (RSS and VSZ) are larger than expected, pushing a couple of digits over into the process name area.
It seems that Xymon is parsing these based on fixed column sizes, defined for each OS. The result of these particular examples is that Xymon fails to match on the process name. Instead, I need to use match strings like so:
PROC "%^(\d* |^)/usr/local/sbin/named(\s*$|\s)" 1 1 "TEXT=/usr/local/sbin/named"
or
PROC "%^(\d* |^)/usr/sbin/named(\s*$|\s)" 1 1 "TEXT=/usr/sbin/named"
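The effect can be reproduced with a small Python sketch. It assumes, as guessed above and not confirmed, that the command is taken from a fixed character offset derived from the header line. The column names and widths, and the helpers `render` and `command_by_offset`, are hypothetical. The values are taken from the Solaris listing.

```python
import re

# Hypothetical column layout as (name, nominal width); real ps widths differ per OS.
COLUMNS = [("PID", 5), ("TIME", 8), ("RSS", 5), ("VSZ", 5)]

def render(values, command):
    """Right-justify each field to its nominal width; oversized values
    overflow and push everything after them to the right."""
    fields = [str(v).rjust(w) for (_, w), v in zip(COLUMNS, values)]
    return " ".join(fields) + " " + command

header = render([name for name, _ in COLUMNS], "COMMAND")
CMD_OFFSET = header.index("COMMAND")  # assumed fixed offset of the command text

def command_by_offset(line):
    """Extract the command by character position alone."""
    return line[CMD_OFFSET:]

normal = render([3245, "1:18", 6376, 20960], "/opt/OV/lbin/eaagt/opcmona")
bloated = render([3253, "1-10:46:45", 58168, 59632], "/usr/local/sbin/named -f")

# A pattern anchored strictly at the start, and the tolerant workaround pattern.
STRICT = re.compile(r"^/usr/local/sbin/named(\s|$)")
TOLERANT = re.compile(r"^(\d* |^)/usr/local/sbin/named(\s*$|\s)")

for line in (normal, bloated):
    cmd = command_by_offset(line)
    print(repr(cmd), bool(STRICT.search(cmd)), bool(TOLERANT.search(cmd)))
```

In this sketch the overlong TIME value shifts the line by two characters, so the offset-based slice picks up the tail of the VSZ value ahead of the path. The strictly anchored pattern then fails, while the pattern that tolerates leading digits still matches.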
I guess this email is part "am I doing something wrong", part "does anyone have a better idea", and part feature request (for more awk-like positional matching).
Cheers
Jeremy