[Xymon] xymon on a large architecture

Olivier Audry leo at minas.fr
Wed Dec 17 18:12:57 CET 2014


Hello,

What kind of hardware do you use? 300 msg/s is not that much. One of my clusters monitors 12,000 devices, but with network tests only. I will check one with 3,000 devices to compare.

oau

-------- Original message --------
From: fmaillard.ext at orange.com
Date: 17/12/2014 17:04 (GMT+01:00)
To: xymon at xymon.com
Subject: [Xymon] xymon on a large architecture

Hello,
 
We’re running a fairly large xymon setup and have been dealing with performance issues for quite a while. Here are some stats to give an idea of the setup:
- We have 2 xymon servers per datacenter, across 3 datacenters (all messages for a given site are sent to both of its servers)
- Each xymon server receives between 200 msg/s and 250 msg/s on average, with peaks at 400 msg/s
- Each site hosts about 3,000 hosts / 30,000 services
 
We have suspected for a long time that we might be losing messages, and I think I have finally tracked it down: xymond does not fetch messages quickly enough, so the kernel’s buffers fill up and the kernel discards messages. On one of our servers, even though I have already increased net.ipv4.tcp_rmem and net.ipv4.tcp_wmem, “netstat -s” reports the following:
148909 packets pruned from receive queue because of socket buffer overrun
4453143 packets collapsed in receive queue due to low socket buffer
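
These counters are cumulative since boot, so a single reading does not show whether drops are still happening. A minimal Python sketch for comparing two readings follows. It assumes the `netstat -s` wording quoted above (the wording may differ between kernel and net-tools versions), and the names `parse_overrun_counters` and `delta` are made up for this illustration.

```python
import re

# Patterns match the two `netstat -s` lines quoted in this message.
# Assumption: other systems may word these counters differently.
PATTERNS = {
    "pruned": re.compile(r"(\d+) packets pruned from receive queue"),
    "collapsed": re.compile(r"(\d+) packets collapsed in receive queue"),
}


def parse_overrun_counters(netstat_output):
    """Return the two receive-queue counters found in `netstat -s` text.

    A counter that does not appear in the text is reported as 0.
    """
    counters = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(netstat_output)
        counters[name] = int(match.group(1)) if match else 0
    return counters


def delta(before, after):
    """Return how much each counter grew between two readings."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Feeding it the output of `netstat -s` captured a minute apart (for example via `subprocess.run(["netstat", "-s"], capture_output=True, text=True).stdout`) and calling `delta` on the two results would show whether the counters still grow during message peaks.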
 
This brings me to my questions:
1/ Is 250 msg/s too much for a single xymond instance? Is anyone running instances with a lot more traffic than that?
2/ I’m starting to look into running several instances of xymond on the same machine by binding them to different ports. Another option is to set up new machines, but that would mean migrating history files (several million files) and sorting out firewalling issues (our xymon interfaces are deeply connected to our information system), so I would rather avoid it. Are there any guidelines on how to run several instances?
3/ Are there any settings or best practices that could improve performance? For instance, should we make heavy use of combo statuses to reduce the number of messages received?
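
To make question 3 concrete, here is a rough Python sketch of what bundling several statuses into one combo message could look like. It rests on assumptions not stated in this message: that a combo message is the word `combo` on its own line followed by ordinary status messages separated by blank lines, and that dots in the hostname are written as commas. The helper names `build_status` and `build_combo`, the host name, and the test names are invented for the example. Whether this would actually relieve xymond in this setup is exactly the open question.

```python
def build_status(host, test, color, text):
    """Build one status message for a host/test pair.

    Assumption: dots in the hostname are replaced by commas so that the
    last dot separates the hostname from the test name.
    """
    return "status {}.{} {} {}".format(host.replace(".", ","), test, color, text)


def build_combo(statuses):
    """Bundle several status messages into a single combo message.

    Assumption: the payload starts with a "combo" line and the individual
    status messages are separated by one blank line.
    """
    return "combo\n" + "\n\n".join(statuses)


# Hypothetical example: two statuses for one host sent as one message.
messages = [
    build_status("www.example.com", "cpu", "green", "load ok"),
    build_status("www.example.com", "disk", "yellow", "/var 91%"),
]
payload = build_combo(messages)
```

With this approach, a client reporting ten tests would open one connection instead of ten, which is the kind of reduction the question is asking about.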
 
Best regards,
 
Francois Maillard
 
Platform lead for Monitoring, DNS & FTP - Infrastructure sysadmin
Altran Méditerranée
for Orange/OF/DTSI/DSI/DFY/HBX
Sophia Antipolis
tel. 04 97 12 87 53
fmaillard.ext at orange.com
 