[Xymon] Xymon PROC check fails on un-aligned ps(1) output

Christoph Schug cs at schug.net
Mon Aug 1 20:32:10 CEST 2011


I have a question regarding Xymon 4.3.3 (running on CentOS
5.6/x86_64). In order to monitor the existence of certain processes, such as
rsyslogd(8), I have the following process rule defined in analysis.cfg:

CLASS=linux
    PROC     "%^/sbin/rsyslogd -m 0$"

This works fine as long as the columns in the output of ps(1) (more
specifically, “ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd”
as defined in xymonclient-linux.sh) are all nicely aligned.

  PID  PPID USER      STARTED S PRI %CPU     TIME %MEM   RSZ    VSZ CMD
[...]
 4620  4607 68         Jun 17 S  22  0.0 00:00:00  0.0   860  12348 hald-addon-keyboard: listening on /dev/input/event0
 4709     1 root       Jun 17 S  17  0.0 00:00:00  0.0   496   8540 /usr/bin/hidd --server
 4739     1 root       Jun 17 S  21  0.0 00:11:14  0.0  3576 300132 /sbin/rsyslogd -m 0
 6894     1 root       Jun 17 S  18  0.0 00:00:00  0.0  1540 122008 automount
 6918     1 root       Jun 17 S  24  0.0 00:00:08  0.0  1224  63544 /usr/sbin/sshd

The trouble starts when the process in question has been running long
enough (as seen on a different machine) that its TIME value no longer fits
the width reserved for that field, which shifts the rest of the line.
Process runtime is just one example; I suppose any value that grows too
large for its reserved space would trigger this behavior:

  PID  PPID USER      STARTED S PRI %CPU     TIME %MEM   RSZ    VSZ CMD
[...]
 5377     1 root       May 24 S  21  0.0 00:00:00  0.0   444   3816 /sbin/mingetty tty4
 5378     1 root       May 24 S  20  0.0 00:00:00  0.0   444   3816 /sbin/mingetty tty5
 5380     1 root       May 24 S  19  0.0 00:00:00  0.0   444   3816 /sbin/mingetty tty6
 5382     1 root       May 24 S  22  0.0 00:00:00  0.0   496   3824 /sbin/agetty 9600 ttyS1 vt100
 8734     1 root       Jun 20 S  21  7.7 3-06:51:29  0.1 48640 292664 /sbin/rsyslogd -m 0
20468   262 root       Jul 19 S  24  0.0 00:04:01  0.0     0      0 [pdflush]
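The suspected mechanism can be reproduced with a small Python sketch. It assumes, as this post only suspects, that the command is cut out of each line at the column where "CMD" starts in the ps header; `cmd_by_header_offset` is a hypothetical helper for illustration, not Xymon code. The two input lines are the rsyslogd(8) lines from the listings above.

```python
# Hypothetical model of the suspected behaviour: the command is taken from a
# fixed column, namely the offset at which "CMD" appears in the ps header.
HEADER = "  PID  PPID USER      STARTED S PRI %CPU     TIME %MEM   RSZ    VSZ CMD"


def cmd_by_header_offset(header: str, line: str) -> str:
    """Return everything in line from the header's CMD column onwards."""
    offset = header.index("CMD")
    return line[offset:]


# rsyslogd line from the host where all columns are aligned
aligned = " 4739     1 root       Jun 17 S  21  0.0 00:11:14  0.0  3576 300132 /sbin/rsyslogd -m 0"
# rsyslogd line from the long-running host (TIME is two characters wider)
shifted = " 8734     1 root       Jun 20 S  21  7.7 3-06:51:29  0.1 48640 292664 /sbin/rsyslogd -m 0"

for label, line in (("aligned", aligned), ("shifted", shifted)):
    print(f"{label}: {cmd_by_header_offset(HEADER, line)!r}")
# aligned: '/sbin/rsyslogd -m 0'
# shifted: '4 /sbin/rsyslogd -m 0'
```

On the long-running host, the 10-character TIME value 3-06:51:29 pushes everything after it two columns to the right, so a cut at the header's CMD column lands on the last digit of VSZ.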

In this case, the regex above no longer seems to match, because
(apparently) the matching starts at some fixed column position. Just for
fun, and to double-check, I extended the process rule set with another rule:

CLASS=linux
    PROC     "%^/sbin/rsyslogd -m 0$"
    PROC     "%^[0-9]+ /sbin/rsyslogd -m 0$"

After doing so, the first rule indeed still fails while the second rule
matches. So apparently the last digit of the VSZ field of rsyslogd(8)
sneaked into the CMD field and gets matched by the PROC check. Is this a
known bug, and if so, is there a good workaround for it apart from
invoking a wrapper script in xymonclient-linux.sh that mangles the output
of ps(1) accordingly?
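The rule results described above can be checked with Python's `re` module, as a sketch. The leading `%`, which marks a PROC pattern as a regular expression in analysis.cfg, is dropped; the two input strings are what a fixed-column cut would hypothetically hand to the check on each host, and `matching_rules` is an illustrative helper, not part of Xymon.

```python
import re

# The two PROC rules from analysis.cfg, without the leading "%".
RULES = {
    "rule 1": r"^/sbin/rsyslogd -m 0$",
    "rule 2": r"^[0-9]+ /sbin/rsyslogd -m 0$",
}

# Hypothetical command strings as seen after a fixed-column cut.
CANDIDATES = {
    "aligned host": "/sbin/rsyslogd -m 0",
    "long-running host": "4 /sbin/rsyslogd -m 0",
}


def matching_rules(cmd: str) -> list:
    """Return the names of the rules whose pattern matches cmd."""
    return [name for name, pattern in RULES.items() if re.search(pattern, cmd)]


for host, cmd in CANDIDATES.items():
    print(f"{host}: {matching_rules(cmd)}")
# aligned host: ['rule 1']
# long-running host: ['rule 2']
```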

Thanks in advance!
-cs
