Trying to set up split client load over multiple servers.
Brand, Thomas R.
TRBrand at cvs.com
Thu May 27 23:32:23 CEST 2010
Sorry if this post gets a bit long, I've read thru the man pages and the
archives and have failed to reach understanding and need some help.
I have over 4300 Hobbit 4.2.0+all-patches clients currently reporting to
one server running Xymon 4.3.0-0.beta2. This server is also used for
other workloads, which normally do not produce much of a load but can at
times.
We are adding more clients to this configuration at a rate of about
400/week for a final total of about 7200 clients.
As of this week it seems that my Xymon server is unable to keep up with
the load, and I suspect this is due to several reasons, including the
large number of clients and the file system type (ext3) used for the
data/.. directories.
I am trying to determine my best course to be able to handle the 7200+
clients ...
Some options I've considered:
A) simply split the clients over 2 or more independent Xymon servers
a. easiest to configure
b. lose 'single web page' overview
c. lose combined statistics/reporting
B) Split clients over multiple servers for data gathering/storage
and (how?) use one server to display a bb2.html page which combines all
non green from all 'data-gathering' servers
a. Is this even possible?
C) Some other method/configuration? Anyone on the list running this
many clients?
a. How did you set up your environment?
b. What size/performance/type of system are you using for your
Xymon server?
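For option A, one simple way to decide which client reports to which server is a stable hash of the hostname. The following is only a minimal sketch of that idea: the server and client names are hypothetical, and it says nothing about how the Xymon side would be configured.

```python
import hashlib

def assign_server(hostname: str, servers: list[str]) -> str:
    """Pick a server for a client from a stable hash of its hostname."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# Hypothetical names, for illustration only.
servers = ["xymon-a.example.com", "xymon-b.example.com"]
clients = [f"client{n:04d}.example.com" for n in range(7200)]

# Count how many clients land on each server.
counts = {s: 0 for s in servers}
for host in clients:
    counts[assign_server(host, servers)] += 1
print(counts)
```

The same hostname always maps to the same server, so the assignment can be recomputed on any machine. A plain modulo split like this reshuffles most clients if the number of servers changes, which may or may not matter here.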
Some of my current symptoms:
The 'top' command on the server is indicating high I/O load (at times
%iowait > 60%).
The 'procs' column warns/alerts about missing processes; closer
examination shows that the client data received has been truncated,
usually somewhere in the 'ps' output section.
The bb2.html page sometimes shows many 'purple' clients; the clients in
purple change (e.g., a client goes purple for a while, 5-30+ minutes, then
we get another message processed and the client goes green again).
In /var/log/xymon/clientdata.log, I am seeing many (2212 yesterday
alone) messages like:
2010-05-27 11:38:09 hobbitd_client: Got message 55294, expected 55277
2010-05-27 12:08:12 Flushed 7 stale messages for 0.0.0.0:0
2010-05-27 12:08:13 Flushed 43 stale messages for 0.0.0.0:0
2010-05-27 12:08:13 hobbitd_client: Got message 81190, expected 81140
2010-05-27 12:38:36 Flushed 16 stale messages for 0.0.0.0:0
2010-05-27 12:38:37 Flushed 43 stale messages for 0.0.0.0:0
2010-05-27 12:38:38 Flushed 26 stale messages for 0.0.0.0:0
2010-05-27 12:38:38 hobbitd_client: Got message 107484, expected 107399
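To get a rough size for the problem, log lines like these can be tallied with a short script. This is a sketch that assumes the difference between the "Got" and "expected" numbers is the count of messages the worker missed; that reading is an assumption, not something stated in the log. The sample is the excerpt above.

```python
import re

GAP_RE = re.compile(r"Got message (\d+), expected (\d+)")
FLUSH_RE = re.compile(r"Flushed (\d+) stale messages")

def summarize(lines):
    """Return (gap_events, messages_skipped, stale_flushed) for log lines."""
    gap_events = skipped = flushed = 0
    for line in lines:
        m = GAP_RE.search(line)
        if m:
            got, expected = int(m.group(1)), int(m.group(2))
            gap_events += 1
            skipped += got - expected  # assumed meaning: messages missed
            continue
        m = FLUSH_RE.search(line)
        if m:
            flushed += int(m.group(1))
    return gap_events, skipped, flushed

sample = """\
2010-05-27 11:38:09 hobbitd_client: Got message 55294, expected 55277
2010-05-27 12:08:12 Flushed 7 stale messages for 0.0.0.0:0
2010-05-27 12:08:13 Flushed 43 stale messages for 0.0.0.0:0
2010-05-27 12:08:13 hobbitd_client: Got message 81190, expected 81140
2010-05-27 12:38:36 Flushed 16 stale messages for 0.0.0.0:0
2010-05-27 12:38:37 Flushed 43 stale messages for 0.0.0.0:0
2010-05-27 12:38:38 Flushed 26 stale messages for 0.0.0.0:0
2010-05-27 12:38:38 hobbitd_client: Got message 107484, expected 107399
""".splitlines()

print(summarize(sample))  # (3, 152, 135)
```

In this excerpt the flush counts just before each of the last two gaps add up to the gap itself (7 + 43 = 50 and 16 + 43 + 26 = 85), which fits the reading that the flushed messages are the ones the worker never saw.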
The hobbitd page shows:
Statistics for Hobbit daemon
Up since 26-May-2010 14:55:38 (0 days, 22:00:02)
Incoming messages : 16721697
- status : 11146704
- combo : 1122112
Incoming messages/sec : 216 (average last 300 seconds)
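As a sanity check, the average rate over the whole uptime can be worked out from the totals above and compared with the 300-second figure. The projection to 7200 clients below is only a sketch: it assumes the message rate grows linearly with the number of clients and uses the 4390 hosts reported by the ping test as the current count.

```python
def average_rate(total_messages: int, days: int, hours: int,
                 minutes: int, seconds: int) -> float:
    """Messages per second averaged over the daemon's uptime."""
    uptime = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return total_messages / uptime

# Figures from the hobbitd statistics page: 16721697 messages in 0 days, 22:00:02.
rate = average_rate(16721697, 0, 22, 0, 2)
print(f"average since start: {rate:.1f} msgs/sec")  # 211.1

# Assumption: load scales linearly with client count.
current_hosts = 4390
target_hosts = 7200
projected = 216 * target_hosts / current_hosts
print(f"projected at {target_hosts} clients: {projected:.0f} msgs/sec")  # 354
```

The long-run average (about 211/sec) is close to the 300-second average (216/sec), so the load looks steady rather than bursty at this level of detail.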
The bbtest is taking 225 seconds to complete:
PING test completed (4390 hosts)    5005592.360761    203.397989
TIME TOTAL                                            225.391564
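Reading the timing figures as 203.397989 seconds for the ping phase and 225.391564 seconds in total (an assumption about what the columns mean), a quick calculation shows how much of the network test run is spent on ping:

```python
# Figures from the bbtest timing output; the column meaning is assumed.
ping_seconds = 203.397989
total_seconds = 225.391564
hosts = 4390

ping_share = ping_seconds / total_seconds   # fraction of the run spent pinging
per_host = ping_seconds / hosts             # average ping time per host

print(f"ping share of run: {ping_share * 100:.1f}%")  # 90.2%
print(f"ping time per host: {per_host * 1000:.1f} ms")
```

On these numbers roughly nine tenths of the run is the ping test, so that phase is the first place to look if the test cycle needs to get shorter.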
Hoping the group-mind can help me out,
Thanks,
Tom
Thomas Brand
Disclaimer: 1) all opinions are my own, 2) I may be completely wrong, 3)
my advice is worth at least as much as what you are paying for it, or
your money cheerfully refunded.