[Xymon] Looking to monitor ~20,000 hosts.
J.C. Cleaver
cleaver at terabithia.org
Thu May 15 17:58:07 CEST 2014
On Thu, May 15, 2014 7:26 am, Weber, Matt wrote:
> Hi all,
>
> We are attempting to set up Xymon at our organization, where it would be
> monitoring approximately 20,000 hosts with the Xymon client in central
> mode (pulldata). Just wondering if anyone has Xymon monitoring anywhere
> close to that number of machines? We are looking for ideas on what
> hardware specs would be required for the machine running the server side
> of the Xymon software, or other suggestions on how to set up the
> environment. Is there a way to load balance multiple Xymon servers?
>
> Thanks,
> Matt
>
We're checking a fairly large number of "things" in Xymon, but many are
not actual servers (although they do receive client messages).
We've never used the xymonfetch utility at that scale (although it was one
of the options we considered as we grew out), but that might be an area to
look into as I'm not sure what its parallelization capabilities are.
We've made heavy use of the "backfeed queue" that Henrik introduced in
4.3.13 to help keep the core xymond efficient at processing messages.
We're also using a --bfq option to xymonproxy to allow it to receive
incoming one-way TCP messages from various systems, close the connection,
and drop them onto the backfeed queue. This frees xymond up to handle
only a) two-way messages and requests for data, and b) channel
management.
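To illustrate what a "one-way" message looks like from the sending side, here is a minimal Python sketch. It assumes the standard Xymon "status" message format and the usual TCP port 1984. The function names (build_status, send_one_way) and the host name in the usage note are made up for illustration, and nothing here is specific to the --bfq setup described above.

```python
import socket

XYMON_PORT = 1984  # usual Xymon server/proxy port (assumed default)


def build_status(host: str, test: str, color: str, text: str) -> bytes:
    """Format a Xymon 'status' message.

    Dots in the hostname are replaced with commas, as the Xymon
    message format expects, so the last dot separates host and test.
    """
    return f"status {host.replace('.', ',')}.{test} {color} {text}".encode("utf-8")


def send_one_way(server: str, payload: bytes,
                 port: int = XYMON_PORT, timeout: float = 5.0) -> None:
    """Send a message and close without waiting for a reply."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(payload)
        # Half-close so the receiver sees end-of-message immediately.
        conn.shutdown(socket.SHUT_WR)
```

A call would look like send_one_way("xymon.example.com", build_status("www.example.com", "disk", "green", "all filesystems ok")). Because the sender never reads a response, the receiving side can accept the message, close the socket, and queue the payload for later processing.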
Currently we have ~285000 host+svc combinations (ignoring info/trends) and
~75000 logical "hosts" (though most of those aren't servers per se). We
process about 2800 msgs/s on a 32-way box with a load average of about 7.
The box has lots of RAM. We've moved our pollers off the main server, but
that was mainly to keep long-run network breaks from causing lots of
alerts, rather than for any efficiency need. If not for that, everything
would be running on a pair of redundant boxes.
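For a rough sense of scale, the back-of-the-envelope sketch below compares the client traffic from 20,000 hosts against the ~2800 msgs/s figure quoted above. The 300-second reporting interval is an assumption (the usual Xymon client default). The estimate counts only one client message per host per interval and ignores network test results, so treat it as a lower bound rather than a sizing guarantee.

```python
REPORTED_RATE = 2800.0  # msgs/s quoted above for the 32-way box


def client_msgs_per_second(hosts: int, interval_s: float = 300.0) -> float:
    """Average client messages per second if every host reports
    once per interval (assumed 5 minutes)."""
    return hosts / interval_s


rate = client_msgs_per_second(20_000)
share = rate / REPORTED_RATE

print(f"client messages: {rate:.1f} msgs/s")
print(f"share of the quoted rate: {share:.1%}")
```

Under these assumptions, the client messages alone come to a small fraction of the message rate quoted above. Status updates from network tests and other sources would add to that.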
We're using the most recent RPMs at
http://terabithia.org/rpms/xymon/testing/el6/ in production at this time.
HTH,
-jc