[Xymon] xymon on a large architecture
Clark, Sean
sean.clark at twcable.com
Wed Dec 17 19:46:17 CET 2014
So I have a setup where, in total, I monitor about 300,000 hosts.
It's regionalized though, with the largest site having roughly 34,000 hosts.
The xymon daemon runs on a VM with 16 GB of RAM and 4 x 2.5 GHz Xeon processors. The virtual NIC is 1000 Mb/s.
It currently handles 481 msgs/sec (the hosts.cfg is 1.2 MB).
Here are some ideas:
(1) Obviously split pollers out into separate machines
(2) Change the history daemon (Henrik has some sample daemons in the source code). I changed it so that stachg, clichg and similar messages are written out to MySQL instead of files, and the text blob is compressed. This helped with managing history.
(3) Use JC Cleaver's patches (http://terabithia.org/rpms/xymon/). He has several memory leak fixes and tuning optimizations. I would just say that if you use clientupdate, you will have to patch his things a little bit.
(4) An interesting thing you can do, if you have the means via a load balancer:
(a) I have 10 pollers for this instance. All of them run xymonproxy on localhost (stuff they poll reports to localhost and gets forwarded up)
(b) A load balancer in front of the xymon IP address inspects the first 12 characters of each xymon message. If it's something that can be combo'd in the proxy, it passes it to a poller in the pool; otherwise, it sends it to the real xymon IP.
(5) JC has some nanomsg options for channels, and I use AMQP for channels to communicate data outbound. Both of these help if you use xymon data outside of the stock app.
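The inspection rule in (4b) could be sketched roughly like this in Python. This is only an illustration of the idea, not the actual load balancer configuration: the backend names and the `choose_backend` helper are hypothetical, and the assumption that plain "status" messages are the ones xymonproxy can merge into a combo should be checked against your xymonproxy version.

```python
import re

# The balancer only looks at the first 12 characters of each message.
INSPECT_BYTES = 12

# Assumption: "status" messages are the ones the proxy can merge into combos.
COMBINABLE_COMMANDS = {"status"}


def choose_backend(message: bytes) -> str:
    """Return a hypothetical backend name for a raw Xymon message."""
    head = message[:INSPECT_BYTES].decode("ascii", errors="replace")
    # The command is the leading run of letters, so "status+30m" and
    # "status/group:x" are both treated as "status".
    match = re.match(r"\s*([a-z]+)", head)
    command = match.group(1) if match else ""
    if command in COMBINABLE_COMMANDS:
        return "poller_pool"   # a poller running xymonproxy combines it
    return "xymon_server"      # everything else goes to the real xymon IP
```

Anything the rule does not recognise falls through to the real server, so a message type missing from the set is still delivered, just without the batching benefit.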
For your setup, rather than running more xymond instances on different ports, set up xymonproxy instances on different machines and point your clients to them. Then you won't have to migrate your history files or anything else. I would still look at changing your history setup, though: several million files get very unwieldy to manage, while millions of rows in a database are much easier to handle.
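To give an idea of what a database-backed history worker might look like, here is a minimal Python sketch. It is not the daemon described above: it uses the standard library's sqlite3 as a stand-in for MySQL, the table layout and function names are made up for illustration, and the order of the pipe-separated stachg header fields is an assumption that should be verified against the xymond channel documentation for your version.

```python
import sqlite3
import zlib

# Assumed order of the header fields that follow "@@stachg#<seq>/<host>".
STACHG_FIELDS = ("timestamp", "sender", "origin", "hostname", "testname",
                 "expiretime", "color", "oldcolor", "lastchange")


def create_schema(conn):
    """Create a hypothetical history table (one row per status change)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stachg ("
        "ts REAL, hostname TEXT, testname TEXT, "
        "color TEXT, oldcolor TEXT, log BLOB)"
    )


def store_stachg(conn, raw_message):
    """Parse one stachg channel message and store it as a single row."""
    header, _, body = raw_message.partition("\n")
    fields = header.split("|")
    # fields[0] is the "@@stachg#<seq>/<host>" marker, so skip it.
    record = dict(zip(STACHG_FIELDS, fields[1:]))
    lines = body.splitlines()
    if lines and lines[-1] == "@@":
        lines.pop()  # drop the end-of-message marker
    text = "\n".join(lines)
    conn.execute(
        "INSERT INTO stachg VALUES (?, ?, ?, ?, ?, ?)",
        (float(record["timestamp"]), record["hostname"], record["testname"],
         record["color"], record["oldcolor"],
         zlib.compress(text.encode("utf-8"))),  # compressed status text
    )
    conn.commit()
```

A real worker would read messages from the channel on stdin in a loop and batch its commits; the point here is only that each status change becomes one row with a compressed text column instead of one more file on disk.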
From: "fmaillard.ext at orange.com" <fmaillard.ext at orange.com>
Date: Wednesday, December 17, 2014 at 11:04 AM
To: "xymon at xymon.com" <xymon at xymon.com>
Subject: [Xymon] xymon on a large architecture
Hello,
We’re running quite a large xymon setup, and have been dealing with performance issues for quite a while. Here are some stats to give an idea about the setup:
- We have 2 xymon servers per datacenter, across 3 datacenters (all messages are sent to both servers for a given site)
- Each xymon server receives on average between 200 msg/s and 250 msg/s. We’re getting peaks at 400 msg/s.
- Each site hosts about 3000 hosts / 30 000 services
We’ve suspected for a long time that we might be losing messages… and I think I have finally tracked it down to xymond not fetching the messages quickly enough, so that the kernel’s buffers fill up and messages get discarded (by the kernel). On one of our servers, even though I have already increased net.ipv4.tcp_rmem and net.ipv4.tcp_wmem, I got the following output from “netstat -s”:
148909 packets pruned from receive queue because of socket buffer overrun
4453143 packets collapsed in receive queue due to low socket buffer
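For tracking these two counters over time rather than eyeballing "netstat -s", a small Python sketch along the following lines could be used. It assumes a Linux host, and it assumes that the two netstat lines above correspond to the TcpExt counters PruneCalled and TCPRcvCollapsed in /proc/net/netstat, which is my understanding of the net-tools output but is worth double-checking on your system. The function names are illustrative.

```python
def parse_tcpext(netstat_text):
    """Parse the TcpExt name/value line pair from /proc/net/netstat text."""
    rows = [line.split() for line in netstat_text.splitlines()
            if line.startswith("TcpExt:")]
    if len(rows) < 2:
        return {}
    # First TcpExt line holds counter names, the second holds the values.
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def overrun_counters(netstat_text):
    """Return the receive-queue prune and collapse counters (0 if absent)."""
    stats = parse_tcpext(netstat_text)
    return {key: stats.get(key, 0)
            for key in ("PruneCalled", "TCPRcvCollapsed")}


# Typical use on a Linux host:
#   with open("/proc/net/netstat") as handle:
#       print(overrun_counters(handle.read()))
```

Sampling this every minute and graphing the deltas would show whether the drops line up with the 400 msg/s peaks.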
And here are the questions I have:
1/ Is 250msg/s too much for a single xymond instance? Is anyone running instances with a lot more traffic than that?
2/ I’m starting to look into running several instances of xymond on the same machine, by binding them to different ports. Another option is to set up new machines, but that would mean migrating history files (several million files) and sorting out the firewalling issues (our xymon interfaces are deeply connected to our information system), so I’d rather avoid this option. Are there any guidelines on how to do this?
3/ Are there any settings and best practices that could improve performance? For instance, should we move to a massive use of combo statuses in order to lessen the number of messages received?
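To make the combo idea in 3/ concrete, here is a rough Python sketch of how several status messages might be bundled into one. The layout used (a "combo" line followed by the individual status messages separated by blank lines, with dots in the hostname written as commas) reflects my understanding of the Xymon message format and should be verified against the xymon(1) documentation; the helper names are made up.

```python
def status_message(hostname, test, color, text):
    """Build one status message for the given host and test."""
    # Dots in the hostname are written as commas, because the dot
    # separates the hostname from the test name.
    host = hostname.replace(".", ",")
    return f"status {host}.{test} {color} {text}"


def build_combo(statuses):
    """Bundle several status messages into a single combo message."""
    # Assumed layout: "combo" line, then messages separated by blank lines.
    return "combo\n" + "\n\n".join(statuses)
```

One combo carrying ten statuses is a single connection for xymond to accept instead of ten, which is where the reduction in message count would come from.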
Best regards,
Francois Maillard
Pilote des plateformes Supervision, DNS & FTP - Sysadmin Infrastructure
Altran Méditerranée
pour Orange/OF/DTSI/DSI/DFY/HBX
Sophia Antipolis
tél. 04 97 12 87 53
fmaillard.ext at orange.com