[Xymon] PROC matching failure due to column bloat

Gore, David W (David) david.gore at verizon.com
Tue Apr 24 20:39:04 CEST 2012


Jeremy,

I would guess it's not matching because the text you show begins with a space rather than a digit.  Could you try simplifying the line to:

PROC "%named" 1 1 "TEXT=/usr/local/sbin/named"

See if that works and then add complexity if there are other processes with the string 'named' in the ps listing.
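To illustrate why the simplified pattern is only a starting point, here is a minimal Python sketch. It assumes that a "%"-prefixed PROC string is applied as an unanchored regular expression, and it uses Python's re module as a stand-in for the PCRE matching Xymon performs. The command strings in the list are hypothetical, not taken from a real [ps] section.

```python
import re

# Hypothetical command strings as they might appear in a [ps] section.
commands = [
    "/usr/local/sbin/named -f",
    "2 /usr/local/sbin/named -f",                 # stray digit from an overflowing column
    "/usr/sbin/named-checkconf /etc/named.conf",  # unrelated process containing 'named'
]

# Stand-in for the "%named" pattern: an unanchored regex search.
loose = re.compile(r"named")

# The loose pattern tolerates the stray digit, but it also counts
# every other command that merely contains the string 'named'.
matches = [c for c in commands if loose.search(c)]

for c in matches:
    print("matched:", c)
```

If the loose pattern counts too many processes, it can be tightened one step at a time (for example by adding the full path) until only the intended process matches.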

You may also want to follow the 'Client data' link on the procs alert for this host and check the [ps] section to see how your ps listing is presented to Xymon.  It may not match what you see at the command line, depending on which ps options you use.

~David

From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf Of Jeremy Laidman
Sent: Monday, April 23, 2012 23:04
To: xymon at xymon.com
Subject: [Xymon] PROC matching failure due to column bloat

Peeps

I have both Solaris and Linux servers where a large or long-running process causes PROC matching to fail.  Here are some examples:



 7701     1 root       Feb 28 S  24  0.0 00:00:00  0.0   572   2692 /sbin/agetty -L 9600 ttyS0 vt102
 7702     1 root       Feb 28 S  23  0.0 00:00:00  0.0   576   2692 /sbin/agetty -L 9600 ttyS1 vt102
 7704     1 named      Feb 28 S  18  2.4 1-08:59:39  4.4 270500 412784 /usr/sbin/named -u named -f
26498  3293 root     12:47:46 S  14  0.0 00:00:00  0.0   468   2676 sleep 180

This is on Linux.  Note the longer-than-a-day TIME column (1-08:59:39) that pushes the columns after it to the right.

The following is on Solaris 9:

11201 11199  n101649 12:38:54 S  59  0.0        0:00  0.0 1000 1144 vmstat 300 2
11202     1  n101649 12:38:54 S  59  0.0        0:00  0.0  968 1104 sh -c iostat -dxsrP 300 2 1>/tmp/xymon_iostatdisk.redacted
 3244  2965     root   Feb_16 S  59  0.0        5:20  0.1 7104 18736 /opt/OV/lbin/eaagt/opcle -std
 3245  2965     root   Feb_16 S  59  0.0        1:18  0.1 6376 20960 /opt/OV/lbin/eaagt/opcmona
 3253     1     root   Feb_16 S  59  0.9  1-10:46:45  0.8 58168 59632 /usr/local/sbin/named -f

Solaris "ps" output allows more characters for TIME than Linux.  However, in this case the memory columns (RSS and VSZ) are wider than expected, which pushes a couple of characters into the process name area.

It seems that Xymon is parsing these based on fixed column sizes, defined for each OS.  The result of these particular examples is that Xymon fails to match on the process name.  Instead, I need to use match strings like so:

        PROC "%^(\d* |^)/usr/local/sbin/named(\s*$|\s)" 1 1 "TEXT=/usr/local/sbin/named"

or

        PROC "%^(\d* |^)/usr/sbin/named(\s*$|\s)" 1 1 "TEXT=/usr/sbin/named"
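The effect can be reproduced outside Xymon with a short Python sketch. It assumes, as guessed above, that the command is taken from a fixed character offset. The offset here is derived from a well-aligned line in the Solaris listing, and the names CMD_OFFSET and command_field are made up for this illustration. This is not Xymon's actual parsing code, and Python's re module stands in for PCRE.

```python
import re

# Two lines copied from the Solaris listing above.
NORMAL = "11201 11199  n101649 12:38:54 S  59  0.0        0:00  0.0 1000 1144 vmstat 300 2"
BLOATED = " 3253     1     root   Feb_16 S  59  0.9  1-10:46:45  0.8 58168 59632 /usr/local/sbin/named -f"

# Assumption: the command column starts where it does on a well-aligned line.
CMD_OFFSET = NORMAL.index("vmstat")


def command_field(line, offset=CMD_OFFSET):
    """Return everything from the assumed fixed command offset onwards."""
    return line[offset:]


# The workaround pattern from the PROC line, without the leading '%'.
PATTERN = re.compile(r"^(\d* |^)/usr/local/sbin/named(\s*$|\s)")

sliced = command_field(BLOATED)
print(repr(sliced))  # the slice starts with leftover VSZ characters, not '/'

# A pattern anchored directly on the path misses the sliced text...
print(bool(re.search(r"^/usr/local/sbin/named", sliced)))
# ...while the workaround accepts an optional run of digits and a space first.
print(bool(PATTERN.search(sliced)))
```

The optional `(\d* |^)` prefix absorbs whatever spills over from the preceding column, and the `(\s*$|\s)` suffix keeps the pattern from matching longer names that merely start with the same path.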

I guess this email is part "am I doing something wrong", part "does anyone have a better idea", and part feature request (for more awk-like positional matching).

Cheers
Jeremy



