[Xymon] Analysis proc match problem and a feature request

Jeremy Laidman jlaidman at rebel-it.com.au
Mon Sep 4 06:55:20 CEST 2017


Hi all

I'm having some difficulty with a process counting/alerting problem for a
Linux client (Xymon client v4.3.27). I know the cause, but can't find a
good solution, short of hacking the code, or requesting a feature
enhancement.

The "ps" command produces output that shows child processes in a kind of
tree structure ("forest" mode). So I sometimes see this (with some columns
removed for brevity):

 3636    1 root   21720 xinetd -stayalive
29956 3636 nobody 21720  \_ xinetd -stayalive

For xinetd, usually there's only the "master" process. Occasionally there
is one extra like the above, and sometimes more than this. These are simply
child xinetd processes that haven't yet performed an exec of the real
worker process, so nothing unusual.

From my perspective, the correct number of "master" xinetd processes is 1.
We don't want a second master instance of xinetd running. However, if I
add a PROC entry to analysis.cfg for xinetd that alerts when the count is
not 1, any child processes are counted too, and this causes an alert that I
don't want.

This should be quite easy to solve: I just need a regular expression that
only matches when "xinetd" is at the start of the command string. So I set
up analysis.cfg as follows:

HOST=*
  PROC "%^xinetd" 1 1 red

Awesomely, this worked!
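As a rough illustration of why the anchor helps while the forest prefix is still present, here is a small Python sketch (not Xymon's actual C code). It assumes, from the trimmed excerpt above only, that the command is everything after the fourth column, and it uses Python's re module in place of Xymon's own regex matching:

```python
import re

# The two forest-mode "ps" lines quoted above (most columns removed).
PS_LINES = [
    r" 3636    1 root   21720 xinetd -stayalive",
    r"29956 3636 nobody 21720  \_ xinetd -stayalive",
]

def command_field(line):
    # Assumption: the command is everything after the 4th column.
    return line.split(None, 4)[4]

def count_matches(pattern, lines):
    """Count lines whose command field matches the regex."""
    rx = re.compile(pattern)
    return sum(1 for line in lines if rx.search(command_field(line)))

MIN_COUNT, MAX_COUNT = 1, 1  # as in: PROC "%^xinetd" 1 1 red

for pattern in (r"xinetd", r"^xinetd"):
    n = count_matches(pattern, PS_LINES)
    status = "green" if MIN_COUNT <= n <= MAX_COUNT else "red"
    print(pattern, n, status)
# xinetd 2 red    (master and child both match)
# ^xinetd 1 green (the child's command starts with "\_")
```

The unanchored pattern counts both lines, which is outside the 1..1 range; the anchored one counts only the master.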

But only when the Xymon server was v4.3.10. It seems that in v4.3.23, when
the "forest" option for "ps" was introduced in the client scripts, so was
some code to strip out the leading "\_" characters before matching. This
means that by the time the counting routine is reached, all of these lines
from "ps" look identical.
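A sketch of the effect, again in Python and under the same column assumption. The prefix-stripping regex here is a hypothetical approximation of what the newer server does, not the actual v4.3.23 code:

```python
import re

PS_LINES = [
    r" 3636    1 root   21720 xinetd -stayalive",
    r"29956 3636 nobody 21720  \_ xinetd -stayalive",
]

def command_field(line):
    # Assumption: the command is everything after the 4th column.
    return line.split(None, 4)[4]

# Hypothetical approximation of the server-side clean-up: drop a leading
# forest marker ("\_ "), optionally preceded by spaces and "|" guides.
FOREST_PREFIX = re.compile(r"^[\s|]*\\_\s+")

def strip_forest(cmd):
    return FOREST_PREFIX.sub("", cmd)

cleaned = [strip_forest(command_field(line)) for line in PS_LINES]
print(cleaned)  # ['xinetd -stayalive', 'xinetd -stayalive']

# Once the prefix is gone, the anchored pattern matches the child as well.
count = sum(1 for cmd in cleaned if re.search(r"^xinetd", cmd))
print(count)  # 2
```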

My work-around is simply to increase the "maximumcount" from 1 to 10, or
some other arbitrary number (perhaps based on historical data). But it's
not ideal. I'd like to be able to tell whether a process is a child or a
parent (PPID=1) and select on that when counting and thresholding. I can't
think of any other way to solve this without reverting xymonclient to
pre-v4.3.23.
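Here is a sketch of what a PPID filter could look like. The ppid parameter and the Proc record are hypothetical; they illustrate the proposal rather than any existing Xymon option. The parsing assumes the same PID/PPID/user/other/command layout as the excerpt:

```python
import re
from typing import NamedTuple, Optional

PS_LINES = [
    r" 3636    1 root   21720 xinetd -stayalive",
    r"29956 3636 nobody 21720  \_ xinetd -stayalive",
]

class Proc(NamedTuple):
    pid: int
    ppid: int
    user: str
    cmd: str

def parse(line: str) -> Proc:
    # Assumed layout: PID PPID USER <one other column> COMMAND...
    pid, ppid, user, _other, cmd = line.split(None, 4)
    return Proc(int(pid), int(ppid), user, cmd)

def count_procs(pattern: str, lines, ppid: Optional[int] = None) -> int:
    """Count processes whose command matches; optionally require a PPID."""
    rx = re.compile(pattern)
    count = 0
    for proc in map(parse, lines):
        if ppid is not None and proc.ppid != ppid:
            continue  # hypothetical PPID=1 filter: skip children
        if rx.search(proc.cmd):
            count += 1
    return count

print(count_procs(r"xinetd", PS_LINES))          # 2
print(count_procs(r"xinetd", PS_LINES, ppid=1))  # 1
```

Filtering on the parent PID avoids depending on how the command string is displayed, so it would work whether or not the forest prefix is stripped.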

I thought about this for a while, pondering the idea of adding a PPID=1
option, or some other way of specifying other fields from the "ps" output.
Then I realised that it might be generally very helpful to be able to
optionally match against the full line of "ps" output, and not just the
"CMD" field or some other predefined field. For example, if I could append
a FIELD=all parameter to the PROC line in analysis.cfg, allowing me to
match anywhere along the "ps" output line, I could not only match or
exclude the "forest" characters, but also the username, the PPID, the start
time or anything else that's interesting to me. Because the same counting
routine, add_count(), is used for proc, disk and inode, I could do similar
things there, such as using the "IGNORE" keyword for any "df" output lines
that start with "/dev/myspecialfs". The same type of feature for PORT could
allow me to match UDP ports only, which I don't believe is currently
possible.
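To make the FIELD=all idea concrete, here is a minimal Python sketch. FIELD=all and the field argument below are hypothetical (they are the proposal, not an existing option). With the whole line available, a regex can select on the PPID column or on the forest characters:

```python
import re

PS_LINES = [
    r" 3636    1 root   21720 xinetd -stayalive",
    r"29956 3636 nobody 21720  \_ xinetd -stayalive",
]

def select_text(line, field):
    # "all": the whole ps line (hypothetical FIELD=all behaviour).
    # "cmd": only the command, assumed to follow the 4th column.
    if field == "all":
        return line
    return line.split(None, 4)[4]

def count_matching(pattern, lines, field="cmd"):
    rx = re.compile(pattern)
    return sum(1 for line in lines if rx.search(select_text(line, field)))

# Second column (PPID) equal to 1, and the command mentions xinetd.
masters = count_matching(r"^\s*\d+\s+1\s.*xinetd", PS_LINES, field="all")
# Lines carrying the forest marker directly before "xinetd".
children = count_matching(r"\\_ xinetd", PS_LINES, field="all")
print(masters, children)  # 1 1
```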

What are people's thoughts on this idea? Is there a more obvious solution I
haven't thought of yet?