[Xymon] XyMon System Requirements

Henrik Størner henrik at hswn.dk
Thu Dec 22 09:23:03 CET 2011


Hi Matthew,

On 22-12-2011 00:27, Matthew Neumark wrote:

> Currently I'm managing a XyMon server which consists of around 5,000
> devices. We are looking to keep continually adding more and more devices
> to it as time goes on. The issue is our system is currently always using
> max system resources we keep allocating to the server. BTW devmon seems
> to be the highest system resource hog.
> Stats:
> About 5,000 devices
> XyMon 4.3.0-0-beta2
> DevMon 0.2
> 4 CPU(s)
> 16 GB RAM
> Suse Linux Enterprise 10 (32-Bit)
> 300GB Enterprise SAN Storage - Fiber Channel - (3 Years of archived data
> stored)
> Would it do me any good to give the system more resources? CPU(s) or RAM?
> What is the experience that other users have with monitoring this many
> devices?
> What system configurations are you using to support this many monitored
> hosts?

Your installation is about the same size as the one I have at work. I 
recently upgraded it because it could no longer keep up with the load, 
and based on that I would say that your hardware specs should be more 
than adequate to handle the number of devices.


The only real difference between your system and mine is that I changed 
to SSD disks for storing the RRD-files (graphs) - I don't know how your 
FC disks compare with SSDs, but I could certainly see a significant 
effect of that change; when stopping Xymon it used to take 15-20 minutes 
for xymond_rrd to flush all of the cached RRD updates to disk, but after 
changing to the SSD disks it only takes a few seconds. The interesting 
thing of course is how long they will last, since the number of write 
operations is limited on these devices; I plan to replace them once a 
year to be on the safe side.

Have a look at your vmstat1 graph for the Xymon server (on the "trends" 
status page), and see how much time is being spent in I/O wait state - 
if that is in the 20-25% range, then you probably have a problem with 
I/O bandwidth, and adding an SSD disk could help. (I say 20-25% because 
as far as I know, Linux sends all I/O operations through one CPU, so if 
you have 4 CPUs and one of them is fully busy doing I/O, then it will 
show up in vmstat as I/O wait taking up 1/4 of the time).
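
For anyone who wants to check this outside the Xymon graphs, here is a 
minimal sketch in Python. It assumes a Linux host and the field order of 
the aggregate "cpu" line in /proc/stat (user, nice, system, idle, iowait, 
...). The function names are made up for this illustration and are not 
part of Xymon or vmstat.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat, per proc(5).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Return the first eight tick counters from the aggregate 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in parts[1:1 + len(FIELDS)]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in I/O wait between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[FIELDS.index("iowait")] / total


def one_cpu_share(cpu_count):
    """Share of total CPU time that one fully busy CPU represents, in percent."""
    return 100.0 / cpu_count


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two samples 'interval' seconds apart and return the I/O wait percentage."""
    with open(path) as stat:
        before = stat.readline()
    time.sleep(interval)
    with open(path) as stat:
        after = stat.readline()
    return iowait_percent(before, after)
```

On a 4-CPU box, one_cpu_share(4) gives 25.0, so a sample_iowait() result 
that stays close to that figure would match the 20-25% range mentioned 
above.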


Is there any swap being used? (Check the "free" output). I wouldn't 
expect that there is much swap going on with 16 GB of RAM. So more RAM 
probably will not help.
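
The same swap figure can be read straight from /proc/meminfo. A small 
Python sketch, assuming a Linux host where that file has SwapTotal and 
SwapFree lines in kB; the function name is made up for this example.

```python
def swap_used_kb(meminfo_text):
    """Return swap in use (kB), given the text of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Key: value" line
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values["SwapTotal"] - values["SwapFree"]


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the file and report swap in use (Linux only)."""
    with open(path) as meminfo:
        return swap_used_kb(meminfo.read())
```

A result at or near 0 from read_swap_used_kb() would suggest that memory 
is not the bottleneck.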


I've never used Devmon, so I don't know how much of a "hog" it is. If it 
really is the one using all of the resources, a solution (or 
workaround, really) might be to split the Devmon load between more 
servers - you can still have them report their data to the same Xymon 
server, you will only move the running of Devmon to a different node.


Just for the record, my current system is an HP DL380 G7, 2 dual-core 
2.4 GHz CPUs, 24 GB RAM, 6x300 GB SAS 10K disks in a RAID-1 
configuration, and 2 64 GB SSD disks in RAID-1. It is currently handling 
about 4200 servers with clients installed, and an additional 3000 
entries in hosts.cfg for network devices, websites etc. All in all 50000 
statuses being tracked. On average, the CPU load is 6% busy. To be fair, 
I must add that I have most of the network tests running on another node 
(for ease of firewall setup, mostly) and that node is 15% busy.


Regards,
Henrik


