[Xymon] PROC matching failure due to column bloat
cleaver at terabithia.org
Tue Apr 24 23:03:21 CEST 2012
> Peeps
>
> I have both Solaris and Linux servers where a large or long-running
> process causes PROC matching to fail. Here are some examples:
>
>
> 7701 1 root Feb 28 S 24 0.0 00:00:00 0.0 572 2692 /sbin/agetty -L 9600 ttyS0 vt102
> 7702 1 root Feb 28 S 23 0.0 00:00:00 0.0 576 2692 /sbin/agetty -L 9600 ttyS1 vt102
> 7704 1 named Feb 28 S 18 2.4 1-08:59:39 4.4 270500 412784 /usr/sbin/named -u named -f
> 26498 3293 root 12:47:46 S 14 0.0 00:00:00 0.0 468 2676 sleep 180
>
> This is on Linux. Note the longer-than-a-day TIME column that pushes the
> columns after it to the right.
>
> The following is on Solaris 9:
>
> 11201 11199 n101649 12:38:54 S 59 0.0 0:00 0.0 1000 1144 vmstat 300 2
> 11202 1 n101649 12:38:54 S 59 0.0 0:00 0.0 968 1104 sh -c iostat -dxsrP 300 2 1>/tmp/xymon_iostatdisk.redacted
> 3244 2965 root Feb_16 S 59 0.0 5:20 0.1 7104 18736 /opt/OV/lbin/eaagt/opcle -std
> 3245 2965 root Feb_16 S 59 0.0 1:18 0.1 6376 20960 /opt/OV/lbin/eaagt/opcmona
> 3253 1 root Feb_16 S 59 0.9 1-10:46:45 0.8 58168 59632 /usr/local/sbin/named -f
>
> Solaris "ps" output allows more characters for TIME than Linux. However,
> in this case the memory columns (RSS and VSZ) are larger than expected,
> pushing a couple of digits over into the process name area.
>
> It seems that Xymon is parsing these based on fixed column sizes, defined
> for each OS. The result of these particular examples is that Xymon fails
> to match on the process name. Instead, I need to use match strings like so:
>
> PROC "%^(\d* |^)/usr/local/sbin/named(\s*$|\s)" 1 1 "TEXT=/usr/local/sbin/named"
>
> or
>
> PROC "%^(\d* |^)/usr/sbin/named(\s*$|\s)" 1 1 "TEXT=/usr/sbin/named"
>
> I guess this email is part "am I doing something wrong", part "does anyone
> have a better idea", and part feature request (for more awk-like
> positional matching).
>
> Cheers
> Jeremy
Close... AFAIK, it actually looks for the command column's name (CMD or
COMMAND) in the first line of the listing and keys off that header's
position. When a line's columns don't line up with the header,
xymond_client examines the wrong substring. So it's dynamic and static :)
see: xymon-4.3.7/xymond/client/solaris.c:66
unix_procs_report(hostname, clienttype, os, hinfo, fromline, timestr,
"CMD", "COMMAND", psstr);
and xymond_client.c:958 onward
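To make the failure mode concrete, here is a rough Python sketch of header-offset extraction. It is not the Xymon code (that is C, in the files referenced above); the column widths, the FMT layout and the helper name command_by_header_offset are all made up for illustration, with values borrowed from the quoted Solaris listing. It also checks that the workaround regex from the original post copes with the stray digit.

```python
import re

# Assumed fixed-width layout (hypothetical widths, not taken from a real ps).
FMT = "{:>5} {:>5} {:<8} {:>8} {} {:>3} {:>4} {:>11} {:>4} {:>4} {:>4} {}"

header = FMT.format("PID", "PPID", "USER", "STIME", "S", "PRI", "%CPU",
                    "TIME", "%MEM", "RSS", "VSZ", "COMMAND")
normal = FMT.format(11201, 11199, "n101649", "12:38:54", "S", 59, "0.0",
                    "0:00", "0.0", 1000, 1144, "vmstat 300 2")
# RSS and VSZ are one digit wider than their assumed columns here.
bloated = FMT.format(3253, 1, "root", "Feb_16", "S", 59, "0.9",
                     "1-10:46:45", "0.8", 58168, 59632,
                     "/usr/local/sbin/named -f")


def command_by_header_offset(header_line, ps_line, names=("CMD", "COMMAND")):
    """Return ps_line from the command header's offset onward, or None."""
    for name in names:
        offset = header_line.find(name)
        if offset >= 0:
            return ps_line[offset:]
    return None


good = command_by_header_offset(header, normal)
bad = command_by_header_offset(header, bloated)
print(repr(good))  # 'vmstat 300 2'
print(repr(bad))   # '2 /usr/local/sbin/named -f'

# A plain anchored pattern misses the shifted line; the workaround pattern
# from the original post tolerates leading digits followed by a space.
naive = re.compile(r"^/usr/local/sbin/named(\s*$|\s)")
workaround = re.compile(r"^(\d* |^)/usr/local/sbin/named(\s*$|\s)")
print(bool(naive.search(bad)), bool(workaround.search(bad)))
```

In this made-up layout only one digit spills into the command area; how many spill in practice depends on the real column widths of your ps.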
On the SunOS box I've got, the ps command (xymonclient-sunos.sh) is
providing the following field list, and below that is some of the output
it gets wrong.
I suppose one quick fix, if you're getting this a lot, might be to manually
change the order of the fields in the client script so the command comes
first:
ps -A -o args,pid,ppid,user,stime,s,pri,pcpu,time,pmem,rss,vsz
I'm not sure if other processing is going on, but the only drawback might
be slightly odd-looking ps output in your client logs.
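As a rough illustration of why moving args to the front could help, here is a small Python sketch. The layout is invented (FMT_ARGS_FIRST and the 30-character command column are assumptions, not what Solaris ps really emits), and it says nothing about how the rest of Xymon treats the trailing columns.

```python
import re

# Hypothetical layout with the command column first; widths are assumptions.
FMT_ARGS_FIRST = ("{:<30} {:>5} {:>5} {:<8} {:>8} {} {:>3} {:>4} "
                  "{:>11} {:>4} {:>4} {:>4}")

header = FMT_ARGS_FIRST.format("COMMAND", "PID", "PPID", "USER", "STIME",
                               "S", "PRI", "%CPU", "TIME", "%MEM", "RSS",
                               "VSZ")
line = FMT_ARGS_FIRST.format("/usr/local/sbin/named -f", 3253, 1, "root",
                             "Feb_16", "S", 59, "0.9", "1-10:46:45", "0.8",
                             58168, 59632)

# The header offset is 0, so wide numeric columns further right cannot
# push anything in front of the command text.
offset = header.find("COMMAND")
candidate = line[offset:]

plain = re.compile(r"^/usr/local/sbin/named(\s*$|\s)")
print(offset, bool(plain.search(candidate)))
```

Note that the extracted string still carries the other columns after the command, so a pattern anchored at the end of the line would need adjusting.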
HTH,
-jc
-bash-3.2$ ps -A -o pid,ppid,user,stime,s,pri,pcpu,time,pmem,rss,vsz,args | head -1
PID PPID USER STIME S PRI %CPU TIME %MEM RSS VSZ COMMAND
-bash-3.2$ ps -A -o pid,ppid,user,stime,s,pri,pcpu,time,pmem,rss,vsz,args | sort -k8 -r | head
693 666 root Mar_28 S 59 0.1 1-12:12:45 0.0 2720 3424 dovecot-auth -w
160 1 root Mar_28 S 59 0.2 1-04:06:09 0.1 51552 78848 /usr/sbin/nscd
3 0 root Mar_28 S 60 0.1 15:09:32 0.0 0 0 fsflush
482 1 root Mar_28 S 59 0.0 10:10:31 0.1 94008 101432 /opt/local/bin/python /usr/local/sbin/denyhosts.py --daemon --config=/usr/share
6 0 root Mar_28 S 0 0.1 08:18:46 0.0 0 0 vmtasks
461 1 root Mar_28 S 60 0.0 04:19:34 0.0 1744 3048 /usr/lib/nfs/nfsd
95 0 root Mar_28 S 99 0.0 04:02:11 0.0 0 0 zpool-pool
746 666 dovecot Mar_28 S 59 0.0 02:33:29 0.0 11376 12584 pop3-login
1183 666 root Mar_28 S 59 0.0 02:25:22 0.0 2736 3424 dovecot-auth -w
666 1 root Mar_28 S 59 0.0 02:05:38 0.0 2304 3464 /usr/local/sbin/dovecot