[Xymon] Xymon PROC check fails on un-aligned ps(1) output
Christoph Schug
cs at schug.net
Mon Aug 1 20:32:10 CEST 2011
I have a question regarding Xymon 4.3.3 (running on CentOS
5.6/x86_64). In order to monitor the existence of certain processes like
rsyslogd(8), I have the following process rule defined in analysis.cfg:
CLASS=linux
PROC "%^/sbin/rsyslogd -m 0$"
This works fine as long as the columns in the output of ps(1) (more
specifically, “ps -Aww -o
pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd” as defined in
xymonclient-linux.sh) are all nicely aligned:
PID PPID USER STARTED S PRI %CPU TIME %MEM RSZ VSZ CMD
[...]
4620 4607 68 Jun 17 S 22 0.0 00:00:00 0.0 860 12348
hald-addon-keyboard: listening on /dev/input/event0
4709 1 root Jun 17 S 17 0.0 00:00:00 0.0 496 8540
/usr/bin/hidd --server
4739 1 root Jun 17 S 21 0.0 00:11:14 0.0 3576 300132
/sbin/rsyslogd -m 0
6894 1 root Jun 17 S 18 0.0 00:00:00 0.0 1540 122008
automount
6918 1 root Jun 17 S 24 0.0 00:00:08 0.0 1224 63544
/usr/sbin/sshd
The trouble starts when the process in question has run long enough (as seen
on a different machine) that its TIME value no longer fits the columns
reserved for that field, which shifts the rest of the line to the right
(process runtime is just one example; I suppose any value that grows too
large for its reserved space would trigger the same behavior):
PID PPID USER STARTED S PRI %CPU TIME %MEM RSZ VSZ CMD
[...]
5377 1 root May 24 S 21 0.0 00:00:00 0.0 444 3816
/sbin/mingetty tty4
5378 1 root May 24 S 20 0.0 00:00:00 0.0 444 3816
/sbin/mingetty tty5
5380 1 root May 24 S 19 0.0 00:00:00 0.0 444 3816
/sbin/mingetty tty6
5382 1 root May 24 S 22 0.0 00:00:00 0.0 496 3824
/sbin/agetty 9600 ttyS1 vt100
8734 1 root Jun 20 S 21 7.7 3-06:51:29 0.1 48640 292664
/sbin/rsyslogd -m 0
20468 262 root Jul 19 S 24 0.0 00:04:01 0.0 0 0
[pdflush]
In this case the above regex no longer seems to match, because
(apparently) the matching starts at some fixed column offset. Just for fun
and to double-check, I extended the process rule set with another rule:
CLASS=linux
PROC "%^/sbin/rsyslogd -m 0$"
PROC "%^[0-9]+ /sbin/rsyslogd -m 0$"
After doing so, the first rule indeed still fails while the second rule
matches. So apparently the last digit of the VSZ field of rsyslogd(8)
sneaked into the CMD field and gets matched by the PROC check. Is this a
known bug, and if so, is there a good workaround for it apart from
invoking a wrapper script in xymonclient-linux.sh which mangles the output
of ps(1) accordingly?
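
To illustrate what I suspect is happening, here is a minimal Python sketch.
It is not Xymon's actual code: the column widths, the helper names ps_line
and cmd_at_fixed_column, and the idea that the CMD offset is taken from the
header line are all assumptions made purely for illustration.

```python
import re

# Hypothetical column widths, loosely modelled on the ps(1) output quoted above.
WIDTHS = [5, 5, 8, 7, 1, 3, 4, 8, 4, 5, 6]
TITLES = ["PID", "PPID", "USER", "STARTED", "S", "PRI",
          "%CPU", "TIME", "%MEM", "RSZ", "VSZ"]


def ps_line(fields, cmd):
    """Right-justify each field; an oversized value overflows and shifts the rest."""
    cols = [f.rjust(w) for f, w in zip(fields, WIDTHS)]
    return " ".join(cols) + " " + cmd


HEADER = ps_line(TITLES, "CMD")
CMD_COL = HEADER.index("CMD")  # assumption: offset is derived from the header line


def cmd_at_fixed_column(line):
    """Return everything from the header's CMD offset onwards."""
    return line[CMD_COL:]


aligned = ps_line(["4739", "1", "root", "Jun 17", "S", "21", "0.0",
                   "00:11:14", "0.0", "3576", "300132"], "/sbin/rsyslogd -m 0")
shifted = ps_line(["8734", "1", "root", "Jun 20", "S", "21", "7.7",
                   "3-06:51:29", "0.1", "48640", "292664"], "/sbin/rsyslogd -m 0")

# The two PROC patterns from above, without Xymon's leading "%" regex marker.
RULE_1 = r"^/sbin/rsyslogd -m 0$"
RULE_2 = r"^[0-9]+ /sbin/rsyslogd -m 0$"

for name, line in (("aligned", aligned), ("shifted", shifted)):
    cmd = cmd_at_fixed_column(line)
    print(name, repr(cmd),
          "rule 1:", bool(re.search(RULE_1, cmd)),
          "rule 2:", bool(re.search(RULE_2, cmd)))
```

With these made-up widths the TIME value 3-06:51:29 is two characters too
wide, so the slice taken from the second line comes out as
"4 /sbin/rsyslogd -m 0". That would explain why only the rule with the
leading [0-9]+ matches there.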
Thanks in advance!
-cs