[Xymon] xymon on a large architecture

Gautier Begin gbegin at csc.com
Mon Dec 22 14:09:40 CET 2014


Hello,

Two quick questions about large environments that are split up with xymon
proxies (to balance the flow of data to the xymon server):

-  How many clients do you advise per xymonproxy?
-  Do you advise a different xymonproxy configuration depending on the
type of data collected (xymon agents, bbwin agents, status messages, data
messages, etc.)?


Regards,

Gautier BEGIN




From:   <fmaillard.ext at orange.com>
To:     "xymon at xymon.com" <xymon at xymon.com>
Date:   12/18/2014 06:24 PM
Subject:        Re: [Xymon] xymon on a large architecture
Sent by:        "Xymon" <xymon-bounces at xymon.com>



Thanks for the answer. I’m pleasantly surprised to see that we’re still far
from xymon’s limit.
 
I ran more checks today and realized that some tools that I thought were
sending their statuses over a few seconds actually send batches of ~200
statuses in 200 ms. These bursts are far higher than 200 msg/s… I
had no idea that our traffic was so irregular.
I guess that, as you pointed out, the next step is to set up xymonproxy and
have the few tools that generate so much traffic send through it. As the
only thing they know about xymon is the status command, that should be
quite easy to set up.
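
For reference, the burst rate implied by those numbers can be worked out with a small sketch like the one below. The helper name is made up for illustration, and the figures are the approximate ones quoted above.

```python
def burst_rate(messages: int, window_ms: int) -> float:
    """Return the instantaneous message rate (msg/s) of a burst."""
    return messages * 1000 / window_ms


# ~200 statuses sent within ~200 ms, as observed above
print(burst_rate(200, 200))  # 1000.0 msg/s
```

That is roughly four to five times the 200-250 msg/s average, even though the per-second average hides it.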
 
The history files are not a performance issue, but they definitely are a
pain to manage. Moving to a database solution has been on our to-do list
for some time now.
 
Regards,
 
Francois Maillard 
 
From: Clark, Sean [mailto:sean.clark at twcable.com]
Sent: Wednesday, December 17, 2014 19:46
To: MAILLARD Francois Ext DTSI/DSI; xymon at xymon.com
Subject: Re: [Xymon] xymon on a large architecture
 
So I have a setup where, in total, I monitor about 300,000 hosts.
 
It's regionalized though, with the largest site having roughly 34,000
hosts.

The xymon daemon runs on a VM with 16 GB of RAM and 4 x 2.5 GHz Xeon
processors. The virtual NIC is 1000 Mb/s.

It currently handles 481 msgs/sec (the hosts.cfg is 1.2 MB).
 
Here are some ideas:
 
 
(1) Obviously, split the pollers out onto separate machines.
(2) Change the history daemon (Henrik has some sample daemons in the
source code). I changed mine so that stachg, clichg and the like are
written to MySQL instead of files, and the text blob is compressed.
This helped with managing history.
(3) Use JC Cleaver's patches http://terabithia.org/rpms/xymon/ -- he has
several memory leak fixes and tuning optimizations. Just note that if
you use clientupdate, you will have to patch his things a little bit.
(4) An interesting thing you can do, if you have the means via a load
balancer, is this:
    (a) I have 10 pollers for this instance. All of them run xymonproxy on
localhost (whatever they poll reports to localhost and gets forwarded up).
    (b) A load balancer in front of the xymon IP address inspects the
first 12 characters of each xymon message. If it's something that can be
combo'd in the proxy, it passes the message to a poller in the pool;
otherwise, it sends it to the real xymon IP.
(5) JC has some nanomsg options for channels, and I use AMQP for
channels to communicate data outbound. Both of these help if you use
xymon data outside of the stock app.
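
The routing decision in (4)(b) could be sketched roughly as follows. This is not the actual load balancer rule, only an illustration of the idea in Python: the host names are hypothetical, and it assumes that plain "status" messages are the only ones the proxy can merge into combos.

```python
# Hypothetical addresses, for illustration only (1984 is the usual Xymon port).
PROXY_POOL = ["poller01:1984", "poller02:1984"]
XYMON_SERVER = "xymon-master:1984"

# Assumption: only "status" messages (including variants such as
# "status+30") can be combined by xymonproxy.
COMBINABLE_PREFIXES = (b"status",)
INSPECT_BYTES = 12  # the balancer only looks at the first 12 characters


def pick_backend(message: bytes, seq: int = 0) -> str:
    """Choose a backend from the first few bytes of a Xymon message.

    `seq` is a running connection counter used for simple round-robin
    selection inside the proxy pool.
    """
    head = message[:INSPECT_BYTES].lstrip()
    if head.startswith(COMBINABLE_PREFIXES):
        return PROXY_POOL[seq % len(PROXY_POOL)]
    return XYMON_SERVER


print(pick_backend(b"status web01.http green OK", seq=0))
print(pick_backend(b"client web01.linux linux", seq=1))
```

Everything that is not recognized as combinable falls through to the real server, so an unknown message type is never sent to the pool by mistake.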
 
For your setup, rather than running more xymond instances on different
ports, set up xymonproxy instances on different machines and point your
clients to them. Then you won't have to migrate your history files or
anything else. But I would look at changing your history setup: several
million files get very unwieldy to manage, whereas millions of rows in a
database are much easier to handle.
 
 
 
 
 
 
 
 
From: "fmaillard.ext at orange.com" <fmaillard.ext at orange.com>
Date: Wednesday, December 17, 2014 at 11:04 AM
To: "xymon at xymon.com" <xymon at xymon.com>
Subject: [Xymon] xymon on a large architecture
 
Hello,
 
We’re running quite a large xymon setup and have been dealing with
performance issues for quite a while. Here are some stats to give an idea
of the setup:
- We have 2 xymon servers per datacenter, across 3 datacenters (all
messages are sent to both servers for a given site)
- Each xymon server receives between 200 msg/s and 250 msg/s on average,
with peaks at 400 msg/s.
- Each site has about 3,000 hosts / 30,000 services
 
We’ve long suspected that we might be losing messages…
and I think I finally tracked it down to xymond not reading the messages
quickly enough, so that the kernel’s buffers fill up and messages get
discarded (by the kernel). On one of our servers, even though I have
already increased net.ipv4.tcp_rmem and net.ipv4.tcp_wmem, I got the
following output from “netstat -s”:
148909 packets pruned from receive queue because of socket buffer overrun
4453143 packets collapsed in receive queue due to low socket buffer
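
To watch these two counters over time, a small parser along these lines may help. It is only a sketch: the function and key names are made up, and the patterns simply match the wording of the two lines above, which can differ between kernel and net-tools versions.

```python
import re

# Patterns follow the wording of the `netstat -s` lines quoted above.
PATTERNS = {
    "pruned": re.compile(r"^\s*(\d+) packets pruned from receive queue"),
    "collapsed": re.compile(r"^\s*(\d+) packets collapsed in receive queue"),
}


def receive_queue_counters(netstat_output: str) -> dict:
    """Extract the receive-queue pressure counters from `netstat -s` text."""
    counters = {}
    for line in netstat_output.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                counters[name] = int(match.group(1))
    return counters


sample = """\
    148909 packets pruned from receive queue because of socket buffer overrun
    4453143 packets collapsed in receive queue due to low socket buffer
"""
print(receive_queue_counters(sample))
# {'pruned': 148909, 'collapsed': 4453143}
```

The counters are cumulative, so the useful number is the difference between two samples taken some time apart, for example by feeding the function the output of `subprocess.run(["netstat", "-s"], capture_output=True, text=True).stdout` before and after a known burst.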
 
So here are my questions:
1/ Is 250 msg/s too much for a single xymond instance? Is anyone running
instances with a lot more traffic than that?
2/ I’m starting to look into running several instances of xymond on the
same machine, bound to different ports. Another option is to set
up new machines, but that would mean migrating history files (several
million files) and sorting out the firewalling issues (our xymon interfaces
are deeply connected to our information system), so I’d rather avoid
this option. Are there any guidelines on how to do this?
3/ Are there any settings and best practices that could improve
performance? For instance, should we move to a massive use of combo
statuses in order to reduce the number of messages received?
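
As an illustration of question 3, batching several statuses into one combo message could look like the sketch below. The wire format used here ("combo" on the first line, then each status message followed by a blank line) is my understanding of the protocol and should be checked against the xymon(1) man page; the host and test names are hypothetical.

```python
from typing import Iterable


def build_combo(statuses: Iterable[str]) -> str:
    """Join individual status messages into a single combo message."""
    # Assumed format: "combo" line first, then every status message
    # terminated by an empty line.
    return "combo\n" + "".join(s.rstrip("\n") + "\n\n" for s in statuses)


statuses = [
    "status web01.http green HTTP OK",
    "status web01.disk yellow /var is 91% full",
]
message = build_combo(statuses)
print(message.count("status "), "statuses in 1 message")
```

With batches of about 200 statuses, this would turn 200 connections into a handful, which is the same effect xymonproxy gives without changing the tools.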
 
Best regards,
 
Francois Maillard
 
Monitoring, DNS & FTP platform lead - Infrastructure sysadmin
Altran Méditerranée
for Orange/OF/DTSI/DSI/DFY/HBX
Sophia Antipolis
Tel. 04 97 12 87 53
fmaillard.ext at orange.com
 
_________________________________________________________________________________________________________________________
 
This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and 
delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.
 

This E-mail and any of its attachments may contain Time Warner Cable 
proprietary information, which is privileged, confidential, or subject to 
copyright belonging to Time Warner Cable. This E-mail is intended solely 
for the use of the individual or entity to which it is addressed. If you 
are not the intended recipient of this E-mail, you are hereby notified 
that any dissemination, distribution, copying, or action taken in relation 
to the contents of and attachments to this E-mail is strictly prohibited 
and may be unlawful. If you have received this E-mail in error, please 
notify the sender immediately and permanently delete the original and any 
copy of this E-mail and any printout.