[Xymon] PROC matching failure due to column bloat

cleaver at terabithia.org
Tue Apr 24 23:03:21 CEST 2012


> Peeps
>
> I have both Solaris and Linux servers where a large or long-running
> process causes PROC matching to fail.  Here are some examples:
>
>
>  7701     1 root       Feb 28 S  24  0.0 00:00:00  0.0   572   2692 /sbin/agetty -L 9600 ttyS0 vt102
>  7702     1 root       Feb 28 S  23  0.0 00:00:00  0.0   576   2692 /sbin/agetty -L 9600 ttyS1 vt102
>  7704     1 named      Feb 28 S  18  2.4 1-08:59:39  4.4 270500 412784 /usr/sbin/named -u named -f
> 26498  3293 root     12:47:46 S  14  0.0 00:00:00  0.0   468   2676 sleep 180
>
> This is on Linux.  Note the longer-than-a-day TIME column that pushes the
> columns after it to the right.
>
> The following is on Solaris 9:
>
> 11201 11199  n101649 12:38:54 S  59  0.0        0:00  0.0 1000 1144 vmstat 300 2
> 11202     1  n101649 12:38:54 S  59  0.0        0:00  0.0  968 1104 sh -c iostat -dxsrP 300 2 1>/tmp/xymon_iostatdisk.redacted
>  3244  2965     root   Feb_16 S  59  0.0        5:20  0.1 7104 18736 /opt/OV/lbin/eaagt/opcle -std
>  3245  2965     root   Feb_16 S  59  0.0        1:18  0.1 6376 20960 /opt/OV/lbin/eaagt/opcmona
>  3253     1     root   Feb_16 S  59  0.9  1-10:46:45  0.8 58168 59632 /usr/local/sbin/named -f
>
> Solaris "ps" output allows more characters for TIME than Linux.  However
> in this case the memory columns (RSS and VSZ) are larger than expected,
> pushing a couple of digits over into the process name area.
>
> It seems that Xymon is parsing these based on fixed column sizes, defined
> for each OS.  The result of these particular examples is that Xymon fails
> to match on the process name.  Instead, I need to use match strings like so:
>
>         PROC "%^(\d* |^)/usr/local/sbin/named(\s*$|\s)" 1 1 "TEXT=/usr/local/sbin/named"
>
> or
>
>         PROC "%^(\d* |^)/usr/sbin/named(\s*$|\s)" 1 1 "TEXT=/usr/sbin/named"
>
> I guess this email is part "am I doing something wrong", part "does anyone
> have a better idea", and part feature request (for more awk-like
> positional matching).
>
> Cheers
> Jeremy


Close... AFAIK, it actually looks for the command column's name ("CMD" or
"COMMAND") in the header (the first line of the listing) and keys off that
position for every row. When a row's columns don't line up with the header,
xymond_client examines the wrong substring. So it's dynamic and static :)
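
To illustrate the failure mode, here's a minimal Python sketch of
header-offset slicing. It is not Xymon's actual code (that's C, in
xymond_client.c); the function name command_by_header_offset is made up
for illustration, and the header and rows are copied from the SunOS output
at the end of this message.

```python
import re

# Header and two rows copied from the SunOS ps output in this message.
HEADER = "  PID  PPID     USER    STIME S PRI %CPU        TIME %MEM  RSS  VSZ COMMAND"
ROW_OK = "    3     0     root   Mar_28 S  60  0.1    15:09:32  0.0    0    0 fsflush"
ROW_SHIFTED = "  160     1     root   Mar_28 S  59  0.2  1-04:06:09  0.1 51552 78848 /usr/sbin/nscd"


def command_by_header_offset(header, row, names=("CMD", "COMMAND")):
    """Hypothetical helper: slice the row where the command header starts."""
    for name in names:
        offset = header.find(name)
        if offset != -1:
            return row[offset:]
    return row  # no command header found; fall back to the whole row


ok = command_by_header_offset(HEADER, ROW_OK)
shifted = command_by_header_offset(HEADER, ROW_SHIFTED)
print(repr(ok))       # row whose columns line up with the header
print(repr(shifted))  # row where wide RSS/VSZ values pushed the command right

# An anchored pattern misses the shifted row; Jeremy's workaround pattern
# (optional leading digits and a space) still finds it.
print(re.search(r"^/usr/sbin/nscd", shifted) is not None)
print(re.search(r"^(\d* |^)/usr/sbin/nscd(\s*$|\s)", shifted) is not None)
```

The shifted slice starts with leftover digits from the VSZ column rather
than with the command, which is why the plain anchored match fails there.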


see: xymon-4.3.7/xymond/client/solaris.c:66
unix_procs_report(hostname, clienttype, os, hinfo, fromline, timestr, "CMD", "COMMAND", psstr);

and xymond_client.c:958 onward



On the SunOS box I've got, the ps command (from xymonclient-sunos.sh)
produces the field list shown at the end of this message, followed by some
of the output that gets mis-parsed.

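I suppose one quick fix if you're getting this a lot might be to manually
change the order of the fields in the client script so that args comes
first: "ps -A -o args,pid,ppid,user,stime,s,pri,pcpu,time,pmem,rss,vsz".
The command should then start each line, so wide numeric columns can't
push it out of place.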


I'm not sure if other processing is going on, but the only drawback might
be slightly odd-looking ps output in your client logs.
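
As for the awk-like positional matching Jeremy asked about, a rough Python
sketch of the idea is below: split on runs of whitespace and keep the tail
as the command. This is only an illustration, not something Xymon does as
far as I know; FIELDS and split_ps_row are hypothetical names, and it
assumes no field before args contains a space.

```python
# Field names from: ps -A -o pid,ppid,user,stime,s,pri,pcpu,time,pmem,rss,vsz,args
FIELDS = ["pid", "ppid", "user", "stime", "s", "pri",
          "pcpu", "time", "pmem", "rss", "vsz", "args"]


def split_ps_row(row, fields=FIELDS):
    """Split on whitespace runs; the last field keeps its inner spaces."""
    parts = row.split(None, len(fields) - 1)
    if len(parts) != len(fields):
        raise ValueError("unexpected field count: %r" % row)
    return dict(zip(fields, parts))


# Rows copied from the SunOS ps output at the end of this message.
rows = [
    "  693   666     root   Mar_28 S  59  0.1  1-12:12:45  0.0 2720 3424 dovecot-auth -w",
    "  160     1     root   Mar_28 S  59  0.2  1-04:06:09  0.1 51552 78848 /usr/sbin/nscd",
]
for row in rows:
    print(split_ps_row(row)["args"])
```

This works on the SunOS rows because STIME is printed as "Mar_28". The
Linux output quoted above prints "Feb 28" with a space, so a plain field
count would be off by one there and would need extra handling.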


HTH,
-jc


-bash-3.2$ ps -A -o pid,ppid,user,stime,s,pri,pcpu,time,pmem,rss,vsz,args | head -1
  PID  PPID     USER    STIME S PRI %CPU        TIME %MEM  RSS  VSZ COMMAND
-bash-3.2$ ps -A -o pid,ppid,user,stime,s,pri,pcpu,time,pmem,rss,vsz,args | sort -k8 -r | head
  693   666     root   Mar_28 S  59  0.1  1-12:12:45  0.0 2720 3424 dovecot-auth -w
  160     1     root   Mar_28 S  59  0.2  1-04:06:09  0.1 51552 78848 /usr/sbin/nscd
    3     0     root   Mar_28 S  60  0.1    15:09:32  0.0    0    0 fsflush
  482     1     root   Mar_28 S  59  0.0    10:10:31  0.1 94008 101432 /opt/local/bin/python /usr/local/sbin/denyhosts.py --daemon --config=/usr/share
    6     0     root   Mar_28 S   0  0.1    08:18:46  0.0    0    0 vmtasks
  461     1     root   Mar_28 S  60  0.0    04:19:34  0.0 1744 3048 /usr/lib/nfs/nfsd
   95     0     root   Mar_28 S  99  0.0    04:02:11  0.0    0    0 zpool-pool
  746   666  dovecot   Mar_28 S  59  0.0    02:33:29  0.0 11376 12584 pop3-login
 1183   666     root   Mar_28 S  59  0.0    02:25:22  0.0 2736 3424 dovecot-auth -w
  666     1     root   Mar_28 S  59  0.0    02:05:38  0.0 2304 3464 /usr/local/sbin/dovecot





