[Xymon] Plugin architecture (was Re: Xymon, community, updates, and directions)
J.C. Cleaver
cleaver at terabithia.org
Sat Aug 26 12:42:16 CEST 2023
On Thu, August 17, 2023 16:44, Jeremy Laidman wrote:
> On Fri, 18 Aug 2023 at 04:27, J.C. Cleaver <cleaver at terabithia.org> wrote:
>
>>
>> On Mon, August 14, 2023 21:10, Berry van Sleeuwen wrote:
>> > We recently migrated our server to SLES15 SP4 and found there are a few
>> > network tools missing in the base: arp, netstat, ifconfig and route are
>> > supposed to be replaced by the "ip" and "ss" commands. While similar, the
>> > output of these commands differs from the traditional tools, so I guess
>> > that would interfere with the processing of the command output. For now
>> > it's solved by installing net-tools-deprecated, but this might not be
>> > available in future versions, so we might need support for these
>> > commands. I don't know if that is also the direction for other
>> > distributions, but it's at least the case for Suse and OpenSuse.
>>
>> Agreed; it's similar on the RHEL side. xymond_client updates that can
>> interpret the output of ip and ss are probably called for now. While
>> deprecated net-tools will stick around for sure for current systems, it's
>> only a matter of time until they're removed, and in the meantime it's one
>> less package to have to pull in for compatibility (and to explain).
>>
>> This probably goes higher on the list.
>>
>
> Two brief comments about this.
>
> To me, xymond_client always seemed to be a good candidate for making it a
> bit more modular. The code is written in a modular way, to make it fairly
> easy to add new ways of collecting the same metrics from different OSes and
> OS versions (which would presumably make it possible for even me to add
> support for "ip" and "ss", albeit with bloated code and buffer overruns).
> But I thought it would be neat to be able to plug in some kind of run-time
> process to handle new scenarios - be it a dynamic library, or a shared
> memory protocol or a worker module launched by xymond_channel or similar
> and written in whatever language was available and familiar to the
> sysadmin at the time.
This would definitely be a useful feature, with the most performant option
I assume being a dynamic library module framework of some sort for common
cases. I'm afraid designing this type of module system is a bit out of my
bailiwick.
Then again, many (I suspect most) xymon users don't have intense
performance needs in the 1000s of msgs/s range. I'd prefer not to encourage
*adding* forking on a per-message basis, so a socket method would be on
the table.
Then again, xymond_channel basically *is* a socket and I suspect the
impact of multiple 'client' channel listeners again is only going to
become apparent at massive scale as well. This was the impetus for the
--meta(ex)filter patches (https://sourceforge.net/p/xymon/code/7868/), as
well as --multilocal (https://sourceforge.net/p/xymon/code/7811/) and a
lot of other tuning (https://sourceforge.net/p/xymon/code/7813/). We had a
*lot* of non-*nix "client" data coming through and we wanted to make sure
we were processing as little unnecessary data as possible so we could
quickly move on to the next message.
One can also fork off to (the rather unfortunately named) "pee" utility
from moreutils
(https://www.putorius.net/linux-pee-command-tee-standard-input-into-pipes.html)
and let multiple pipes read from STDIN, as long as they all can perform
efficiently themselves; ultimately this is what we did. We combined all of
our "extra" *nix tests (such as interpreting the output of /proc/mounts
looking for disks that had flipped into ro-mode and generating a status
msg for it) into a single large perl script that simply ran off the same
linux client channel listener that xymond_client did.
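As a rough illustration of the /proc/mounts check described above (in Python rather than the original perl, which is not shown here), something like the following could flag read-only filesystems and build a status message. The function names, the "romount" test name and the ignore list are all made up for this sketch; the comma-for-dot hostname substitution follows the usual Xymon status message convention.

```python
def read_only_mounts(mounts_text,
                     ignore_fstypes=("proc", "sysfs", "tmpfs", "squashfs", "iso9660")):
    """Return (mountpoint, device) pairs whose mount options include 'ro'.

    mounts_text is the content of /proc/mounts: one mount per line, with
    whitespace-separated fields: device, mountpoint, fstype, options, ...
    The ignore list is an assumption; tune it for the site.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        device, mountpoint, fstype, options = fields[:4]
        if fstype in ignore_fstypes:
            continue
        if "ro" in options.split(","):
            found.append((mountpoint, device))
    return found


def status_message(hostname, ro_mounts, test="romount"):
    """Build a Xymon-style status message ("romount" is a hypothetical test name)."""
    color = "red" if ro_mounts else "green"
    # Dots in the hostname are replaced with commas in the status header.
    header = "status %s.%s %s read-only filesystem check" % (
        hostname.replace(".", ","), test, color)
    lines = [header]
    for mountpoint, device in ro_mounts:
        lines.append("&red %s (%s) is mounted read-only" % (mountpoint, device))
    return "\n".join(lines)
```

The resulting string would then be handed to whatever mechanism the site uses to send messages to xymond.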
Given that most users won't have the kind of performance scale
considerations we did, I wonder if what's needed more here isn't just
*standardization* of add-ons (as mentioned elsewhere in the thread).
Example: A drop-in (.d) location that provides a tasks.cfg snippet
specifying what messages it cares about (or enough info for a
plugin-generator to craft one) and a channel listener script directory
with an executable that only gets the filtered messages it cares about,
and is responsible for reinjecting status messages at its desire.
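To make the "channel listener executable" idea a bit more concrete, here is a minimal Python sketch of the reading side. It assumes the usual xymond_channel framing as I understand it: each message starts with a "@@"-prefixed, "|"-separated header line and ends with a line containing only "@@", with "@@shutdown" signalling the worker to exit. The function name and the exact header field positions are assumptions and should be checked against the xymond_channel documentation.

```python
def read_channel_messages(stream):
    """Yield (header_fields, body_lines) for each message read from stream.

    stream is any iterable of text lines, e.g. sys.stdin when launched
    by xymond_channel. The framing handled here is an assumption based
    on the commonly documented worker protocol.
    """
    header, body = None, []
    for raw in stream:
        line = raw.rstrip("\n")
        if line == "@@":
            # End-of-message marker: emit what has been collected so far.
            if header is not None:
                yield header, body
            header, body = None, []
        elif line.startswith("@@shutdown"):
            return  # xymond_channel is asking the worker to stop
        elif line.startswith("@@"):
            # New header line, e.g. "@@client#1/host|timestamp|sender|host|os|class"
            header, body = line[2:].split("|"), []
        elif header is not None:
            body.append(line)

# Typical use inside a listener script:
#   import sys
#   for header, body in read_channel_messages(sys.stdin):
#       ...inspect body, then send a status message back to xymond...
```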
There would also need to be a facility for easily adding graph
definitions, etc.
Perhaps what would be most helpful would simply be packaging templates
that provide the file drops in the necessary locations directly, and/or a
"plugins" directory that is scanned for relevant snippets a level down
when found (e.g., "plugins/foobar/tasks.d/* ; plugins/foobar/graphs.d/*")
to allow these to be distributed as simple tarballs.
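The scanning part of that could be as small as the following Python sketch. Nothing like this exists in Xymon today as far as I know; the function name is hypothetical and the layout is just the plugins/NAME/tasks.d and plugins/NAME/graphs.d example from above.

```python
import glob
import os


def discover_plugin_snippets(plugins_root, kinds=("tasks.d", "graphs.d")):
    """Collect snippet files one level down from plugins_root.

    Returns a dict mapping each snippet kind (e.g. "tasks.d") to a sorted
    list of file paths found under plugins_root/<plugin>/<kind>/.
    """
    found = {kind: [] for kind in kinds}
    for plugin_dir in sorted(glob.glob(os.path.join(plugins_root, "*"))):
        if not os.path.isdir(plugin_dir):
            continue  # stray files directly under plugins/ are ignored
        for kind in kinds:
            pattern = os.path.join(plugin_dir, kind, "*")
            found[kind].extend(
                sorted(p for p in glob.glob(pattern) if os.path.isfile(p)))
    return found
```

A plugin unpacked from a tarball into plugins/foobar/ would then be picked up without touching any of the main config files.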
>
> The emergence of "sar" as a universally available system metric reporting
> tool seems to solve this problem in a different way. If xymond_client had a
> "sar" module, it could pretty much support any popular modern OS apart from
> Windows (so Linux, Solaris, MacOS/*BSD, HPUX, even IRIX). While sar
> provides only a subset of the info obtained from the client script (so it
> couldn't replace ss/netstat or ip/ifconfig) it would reduce the overhead of
> having to support a range of different tools for plenty of metrics. It
> would probably standardise the output of metrics so that the parsing code
> in xymond_client can be much simpler, easier to write and maintain, and
> less likely to have bugs. Generally speaking, parsing is something that can
> be difficult to do safely; programs that parse files and data streams are
> notoriously common targets for hackers.
I had to check this as well, as I was sure some [sar] reading had been
added in, but 'twas not the case.
One experiment I had was running rotating sadc collectors (30s samples over
5m) in a similar manner to how vmstat is executed in the client. This is at
the very least helpful for hostdata client snapshots, but it needed a
processor on the server side, like you say, to make better use of it.
>
> Any further work on Xymon needs to ensure that it works with up-to-date
> OSes, and anything that can be done to make that easier is likely to help
> the cause.
Agreed 100%.
Touching on the collector/processor distinction though: Xymon has had the
client/local facility for quite a while now, dating back to 4.3.7
(https://sourceforge.net/p/xymon/code/6800/) but I'm not sure how
well-known it is.
In the Terabithia packages (and 4.4:
https://sourceforge.net/p/xymon/code/7755/) there's a parallel "/sections"
directory that works the same way but doesn't prepend "local:" to the
section name (intended for site packages vs custom per-box scripting).
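For anyone writing a server-side processor for these, a client message is essentially a series of "[sectionname]" lines each followed by that section's output, so picking out the "local:" ones is straightforward. A small Python sketch, with hypothetical function names and assuming section headers are lines of the form "[name]":

```python
def split_sections(body_lines):
    """Split client message lines into {section_name: [lines]}.

    A line that starts with "[" and ends with "]" is treated as a section
    header. This is a simplification: a data line that happens to look
    like "[...]" would be misread as a header.
    """
    sections = {}
    current = None
    for line in body_lines:
        stripped = line.rstrip()
        if stripped.startswith("[") and stripped.endswith("]"):
            current = stripped[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def local_sections(sections):
    """Return only the client/local sections, with the "local:" prefix removed."""
    prefix = "local:"
    return {name[len(prefix):]: lines
            for name, lines in sections.items()
            if name.startswith(prefix)}
```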
-jc