[Xymon] Constant red condition

Jeremy Laidman jlaidman at rebel-it.com.au
Tue Aug 9 04:16:19 CEST 2016


John

On 9 August 2016 at 04:53, john boris <jborissr at gmail.com> wrote:

> I have a constant RED condition for the procs test. I checked all of my
> other servers and they all have just one instance of cron yet this one
> server has the one instance running but it comes up red. These are all the
> same OS, same client version. Some are standalone servers and others are
> VMs, but this one machine just will not go green.
>
> Any pointers on this?
>

Is the "cron" message showing too many or too few cron processes?

Can you please show the message from the "procs" page that says "Processes
NOT ok" including the list of monitored processes (with green/red dots)
below that?  If you are able, show the whole "procs" page.  Also, show the
relevant section from analysis.cfg.

What OS are you using?

One thing about the "procs" page is that you can sometimes have unexpected
PROC string matches.  For instance, if you have the line:

        PROC cron

then this definition will match any of these lines from the "ps" listing:

  PID  PPID USER      STARTED S PRI %CPU     TIME %MEM   RSZ    VSZ CMD
  918     1 root       Feb 22 S  22  0.0 00:00:10  0.0   776   2772 /usr/sbin/cron
 2482     1 root       Feb 22 S  22  0.0 00:00:00  0.0  2142  11268 /usr/bin/crontab -e
 5514     1 root       Feb 22 S  22  0.0 00:00:00  0.0 16484  30346 vi /etc/crontab

and you end up with more "cron" processes reported than you actually have
running.
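As a rough illustration of why this happens, here is a minimal Python sketch of a plain substring match. It is not Xymon's own code; the command strings are copied from the CMD column of the listing above, and the function name is made up for the example.

```python
# Command strings taken from the CMD column of the "ps" listing above.
commands = [
    "/usr/sbin/cron",
    "/usr/bin/crontab -e",
    "vi /etc/crontab",
]


def count_substring_matches(needle, cmds):
    """Count commands that contain `needle` anywhere (plain string match)."""
    return sum(1 for cmd in cmds if needle in cmd)


# "cron" appears somewhere in all three command lines.
cron_count = count_substring_matches("cron", commands)
print(cron_count)
```

Only one of the three is really the cron daemon, but a substring test cannot tell them apart.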

To counter this, you can use the full path of the binary in analysis.cfg,
like so:

        PROC /usr/sbin/cron

However, this can also match a process like "/usr/sbin/cronlog-parser" or
"less /usr/sbin/cron" or even "cp /usr/sbin/cron /path/to/slow/nfsmount".

To match only where the process's command line starts with the string you
care about (and, with the trailing "$", also ends there), change from a
string match to a regex match like so:

        PROC %^/usr/sbin/cron$

If some of your process instances might have arguments, you can take that
into account:

        PROC %^/usr/sbin/snmp($|\s)
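To see how the two anchored forms behave, here is a small Python sketch. It assumes Python's "re" module treats these simple patterns the same way as the regex engine Xymon uses, which should hold for "^", "$", "\s" and alternation. The leading "%" is left out because it only marks the PROC value as a regex. I've used the cron path for both patterns, and "/usr/sbin/cron -f" is a made-up example of an instance with an argument.

```python
import re

# Candidate command lines; the "-f" variant is a hypothetical example.
commands = [
    "/usr/sbin/cron",
    "/usr/sbin/cron -f",
    "/usr/sbin/cronlog-parser",
    "less /usr/sbin/cron",
    "cp /usr/sbin/cron /path/to/slow/nfsmount",
]

# Whole command line must be exactly the binary path.
exact = re.compile(r"^/usr/sbin/cron$")
# Binary path, followed by end of line or whitespace (i.e. arguments).
with_args = re.compile(r"^/usr/sbin/cron($|\s)")

exact_hits = [c for c in commands if exact.search(c)]
args_hits = [c for c in commands if with_args.search(c)]

print(exact_hits)
print(args_hits)
```

The first pattern keeps only the bare daemon; the second also accepts the instance with an argument, and neither matches "cronlog-parser", "less" or "cp".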

Another thing to note about the PROC monitoring is that it simply matches
the strings in the "ps" output, with some assumptions about the column
widths.  If you happen to have a column before CMD that is wider than
expected, then it can push other text into the CMD column.  For instance:

  PID  PPID USER      STARTED S PRI %CPU     TIME %MEM   RSZ    VSZ CMD
20445 20442 root       Mar 20 S  23  0.0 00:00:00  0.0  4112  48044 bin/abcd
30269 11056 roddy    11:38:27 S  23  0.0 00:00:00  0.0  3288  46732 sshd
30442 30269 freddy2496 14:41:17 S  23  0.0 00:00:00  0.0  2320  46864 sshd
30460 30442 wendy    11:38:27 S  23  0.0 00:00:00  0.0  2524  14192 -bash

If I'm trying to match "sshd" then the above will match only once, because
the second sshd line will have "sshd" matched against "4 sshd" (the "4"
from the VSZ column) all because the username is wider than the expected
maximum width for the USER field.
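The effect can be reproduced with a short Python sketch. It assumes the CMD text is taken from the character offset of "CMD" in the header line and that the pattern is anchored at the start of that text; both are assumptions made for illustration, not a description of Xymon's actual code.

```python
import re

# Header and the two sshd rows from the listing above.
header = "  PID  PPID USER      STARTED S PRI %CPU     TIME %MEM   RSZ    VSZ CMD"
rows = [
    "30269 11056 roddy    11:38:27 S  23  0.0 00:00:00  0.0  3288  46732 sshd",
    "30442 30269 freddy2496 14:41:17 S  23  0.0 00:00:00  0.0  2320  46864 sshd",
]

# Assumed approach: the CMD column starts where "CMD" sits in the header.
cmd_offset = header.index("CMD")
cmd_fields = [row[cmd_offset:] for row in rows]

# Anchored pattern: the CMD text must begin with "sshd".
pattern = re.compile(r"^sshd($|\s)")
matches = [cmd for cmd in cmd_fields if pattern.search(cmd)]

print(cmd_fields)
print(len(matches))
```

The long username shifts the second row two characters to the right, so its CMD text picks up the last digit of VSZ and no longer starts with "sshd".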

Cheers
Jeremy