[Xymon] Bug? procs test going red green every hour

John Horne john.horne at plymouth.ac.uk
Thu Mar 21 16:38:45 CET 2024


Hello,

We are using Xymon 4.3.30 from the Terabithia rpms on Linux servers.
Sorry that the message is a bit long; this is necessary in order to explain
what is going on.

We have recently noticed that the 'procs' test goes red every hour, and then 5
mins later changes to green. This happens just after the hour, and has only
been seen on 4 servers. The procs test indicates a process (either 'clamscan'
or 'freshclam') of the ClamAV package is not present.

Looking into this I can see that the Xymon 'hostdata' output uses the 'ps'
command to get the current processes. The actual 'ps' command is defined in
'xymonclient-linux.sh' as:
ps -Aww f -o pid,ppid,user,start,state,pri,pcpu,time:12,pmem,rsz:10,vsz:10,cmd

However, looking at the actual hostdata file for one of the servers shows:

=====
1197212       1 root       Mar 10 S  19  0.0     00:00:06  0.0       3660       6072 /usr/sbin/crond -n
2822567 1197212 root     15:01:00 S  19  0.0     00:00:00  0.0       5168      12436  \_ /usr/sbin/CROND -n
2822568 2822567 root     15:01:00 S  19  0.0     00:00:00  0.0       3464       7124      \_ /usr/bin/bash /bin/run-parts /etc/cron.hourly
2822580 2822568 root     15:01:00 S  19  0.0     00:00:00  0.0       3388       7124          \_ /bin/bash /etc/cron.hourly/security.cron
2822581 2822568 root     15:01:00 S  19  0.0     00:00:00  0.0       1104       4016          \_ sed 1i\ /etc/cron.hourly/security.cron: 611279       1 clamscan   Mar 14 S  19  0.0     00:03:23  8.6    1389464    1573272 /usr/sbin/clamd -c /etc/clamd.d/scan.conf
=====

('security.cron' is just an in-house script.) As can be seen, 'crond/CROND' runs
'run-parts', which in turn runs the 'cron.hourly' jobs (security.cron).
However, the last line is corrupted: it shows the 'sed' command, but has the
next entry of the 'ps' output (the one for '/usr/sbin/clamd') appended to it.
Hence, the 'procs' test sees the 'sed' command (we don't monitor for that), but
misses the '/usr/sbin/clamd' command (which we do monitor) because it is all on
one line. So the 'procs' test goes red. Five minutes later, when the various
jobs have ended, the clamd command is again seen within the 'ps' output because
'run-parts' is no longer running.
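
The effect can be reproduced in isolation with a minimal sketch. It assumes,
purely for illustration, a checker that treats each line as one process and
matches the monitored name at the start of the command text. 'count_procs' is
a hypothetical helper, not Xymon's actual matching code, and the sample lines
keep only the command column.

```shell
#!/bin/sh
# Hypothetical helper (not Xymon code): count input lines whose command
# text starts with the given name. Assumes one process per line.
count_procs() {
    # $1 = command prefix to look for, stdin = one command per line
    awk -v name="$1" 'index($0, name) == 1 { n++ } END { print n + 0 }'
}

# Normal case: sed and clamd appear on separate lines.
separate='sed 1i\ /etc/cron.hourly/security.cron:
/usr/sbin/clamd -c /etc/clamd.d/scan.conf'

# Case from the hostdata file: the clamd entry is appended to the sed line.
merged='sed 1i\ /etc/cron.hourly/security.cron: /usr/sbin/clamd -c /etc/clamd.d/scan.conf'

printf '%s\n' "$separate" | count_procs /usr/sbin/clamd   # prints 1
printf '%s\n' "$merged"   | count_procs /usr/sbin/clamd   # prints 0
```

Under that assumption the merged line hides clamd from the count even though
the process is still running.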

So, looking at 'run-parts' (it's a shell script) shows that it runs the cron
jobs as follows:

=====
# run executable files
logger -p cron.notice -t "run-parts[$$]" "($1) starting $(basename $i)"
$i 2>&1 | sed '1i\
'"$i"':\
'
logger -p cron.notice -t "run-parts[$$]" "($1) finished $(basename $i)"
=====

This is where the 'sed' command comes from and, as can be seen, its argument
contains newlines, each preceded by the escape character.
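
To make the embedded newlines visible, the sketch below rebuilds the same
argument that 'run-parts' passes to 'sed' and counts its newline characters.
The value of 'i' is taken from the hostdata excerpt above; the names
'sed_script' and 'count_newlines' are hypothetical and only used for this
illustration.

```shell
#!/bin/sh
# Illustration only: rebuild the argument that run-parts hands to sed.
i=/etc/cron.hourly/security.cron   # sample job path from the excerpt

sed_script='1i\
'"$i"':\
'

# Hypothetical helper: count the newline characters in a string.
count_newlines() {
    printf '%s' "$1" | tr -cd '\n' | wc -c | tr -d ' '
}

count_newlines "$sed_script"   # prints 2
```

So the single command-line argument seen by 'ps' for this 'sed' process
contains two real newline characters.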

I have tested by changing the 'sed' command into just one line by replacing the
newlines with "\n". This works fine, and the 'procs' test remains green.
I also noticed that on CentOS 7 servers 'run-parts' uses 'awk' rather than
'sed'. So this is why the problem isn't seen on those servers - it is only seen
on Rocky 8 and 9 Linux servers.
I have also stopped the clamd process in order that some other process follows
the run-parts one. The hostdata file again shows the new process as being
appended to the sed command. If we are monitoring that new process, then again
procs goes red.
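
For reference, the single-line change mentioned above could look something
like the sketch below. This is an assumption about one possible form, relying
on GNU sed accepting text on the same line as '1i\' and expanding "\n" itself;
it is not necessarily the exact change that was tested. The names
'multi_line', 'single_line' and 'with_header' are hypothetical.

```shell
#!/bin/sh
# Sketch only: compare the multi-line sed script from run-parts with an
# assumed single-line form (GNU sed syntax).
i=/etc/cron.hourly/security.cron   # sample job path from the excerpt

# Original form: the argument contains two real newline characters.
multi_line='1i\
'"$i"':\
'

# Assumed single-line form: "\n" stays a literal backslash-n in the
# argument and is only turned into a newline by GNU sed itself.
single_line='1i\'"$i"':\n'

# Hypothetical helper: prefix stdin with the job name using the given script.
with_header() {
    sed "$1"
}

printf 'job output\n' | with_header "$multi_line"
printf 'job output\n' | with_header "$single_line"
```

With GNU sed both invocations should produce the same header followed by the
job output, while the second form keeps the whole 'sed' argument on one line.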

I need to do more testing, but am a little lost as to whether the bug (if it
exists) is in the 'ps' output, the way it is recorded in the hostdata file or
in the processing of the 'procs' test.



John.

--
John Horne | Senior Operations Analyst | Technology and Information Services
University of Plymouth | Drake Circus | Plymouth | Devon | PL4 8AA | UK