[hobbit] Trying to set up split client load over multiple servers.

Josh Luthman josh at imaginenetworksllc.com
Thu May 27 23:45:00 CEST 2010


What kind of hardware are you on now?

On 5/27/10, Brand, Thomas R. <TRBrand at cvs.com> wrote:
> Sorry if this post gets a bit long, I've read thru the man pages and the
> archives and have failed to reach understanding and need some help.
>
>
>
> I have over 4300 Hobbit 4.2.0+all-patches clients currently reporting to
> one server running Xymon  4.3.0-0.beta2. This server is also used for
> other workloads, which normally do not produce much of a load but can at
> times.
>
>
>
> We are adding more clients to this configuration at a rate of about
> 400/week for a final total of about 7200 clients.
>
>
>
> As of this week it seems that my Xymon server is unable to keep up with
> the load and I suspect this is due to several reasons: the large number
> of clients, file system type (ext3) used for the data/.. directories (,
> the problem is.
>
>
>
> I am trying to determine my best course to be able to handle the 7200+
> clients ...
>
> Some options I've considered:
>
> A)     simply split the clients over 2 or more independent Xymon servers
>
> a.      easiest to configure
>
> b.      lose 'single web page' overview
>
> c.      lose combined statistics/reporting
>
>
>
> B)     Split clients over multiple servers for data gathering/storage
> and (how?) use one server to display a bb2.html page which combines all
> non green from all 'data-gathering' servers
>
> a.      Is this even possible?
>
>
>
> C)    Some other method/configuration? Anyone on the list running this
> many clients?
>
> a.      How did you set up your environment?
>
> b.      What size/performance/type of system are you using for your
> Xymon server?
>
> Some of my current symptoms:
>
>
>
> The 'top' command on the server is indicating high I/O load (at times
> %iowait > 60%).
>
>
>
> The 'procs' column warns/alerts about missing processes; closer
> examination shows that the client data received has been truncated,
> usually somewhere in the 'ps' output section.
>
>
>
> The bb2.html page sometimes shows many 'purple' clients; the clients in
> purple change (e.g., a client goes purple for a while, 5-30+ minutes,
> then we get another message processed and the client goes green again).
>
>
>
> In /var/log/xymon/clientdata.log, I am seeing many (2212 yesterday
> alone) messages like:
>
> 2010-05-27 11:38:09 hobbitd_client: Got message 55294, expected 55277
>
> 2010-05-27 12:08:12 Flushed 7 stale messages for 0.0.0.0:0
>
> 2010-05-27 12:08:13 Flushed 43 stale messages for 0.0.0.0:0
>
> 2010-05-27 12:08:13 hobbitd_client: Got message 81190, expected 81140
>
> 2010-05-27 12:38:36 Flushed 16 stale messages for 0.0.0.0:0
>
> 2010-05-27 12:38:37 Flushed 43 stale messages for 0.0.0.0:0
>
> 2010-05-27 12:38:38 Flushed 26 stale messages for 0.0.0.0:0
>
> 2010-05-27 12:38:38 hobbitd_client: Got message 107484, expected 107399
>
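To put a number on the log excerpt above, here is a minimal Python sketch that tallies the sequence gaps and the flushed-message counts. It assumes the log format is exactly as quoted, and it assumes that "Got message N, expected M" means N - M messages were skipped; that reading is an interpretation, not something confirmed in this thread. The names `summarize` and `SAMPLE` are made up for illustration.

```python
import re

# Patterns for the two kinds of lines quoted from clientdata.log.
GAP_RE = re.compile(r"hobbitd_client: Got message (\d+), expected (\d+)")
FLUSH_RE = re.compile(r"Flushed (\d+) stale messages")


def summarize(lines):
    """Return (gap_events, messages_skipped, stale_flushed) for log lines.

    Assumption: got - expected is the number of messages skipped.
    Sequence-number wraparound is not handled in this sketch.
    """
    gap_events = skipped = flushed = 0
    for line in lines:
        m = GAP_RE.search(line)
        if m:
            got, expected = int(m.group(1)), int(m.group(2))
            gap_events += 1
            skipped += got - expected
            continue
        m = FLUSH_RE.search(line)
        if m:
            flushed += int(m.group(1))
    return gap_events, skipped, flushed


# The eight lines from the quoted excerpt.
SAMPLE = """\
2010-05-27 11:38:09 hobbitd_client: Got message 55294, expected 55277
2010-05-27 12:08:12 Flushed 7 stale messages for 0.0.0.0:0
2010-05-27 12:08:13 Flushed 43 stale messages for 0.0.0.0:0
2010-05-27 12:08:13 hobbitd_client: Got message 81190, expected 81140
2010-05-27 12:38:36 Flushed 16 stale messages for 0.0.0.0:0
2010-05-27 12:38:37 Flushed 43 stale messages for 0.0.0.0:0
2010-05-27 12:38:38 Flushed 26 stale messages for 0.0.0.0:0
2010-05-27 12:38:38 hobbitd_client: Got message 107484, expected 107399
"""

if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))
```

In the excerpt, the flush counts just before the second and third gap lines add up to those gaps (7 + 43 = 50 and 16 + 43 + 26 = 85), so the two kinds of message appear to describe the same dropped messages. To run it over a whole day, pass an open file handle for the log instead of `SAMPLE.splitlines()`.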
>
>
> The hobbitd page shows:
>
> Statistics for Hobbit daemon
> Up since 26-May-2010 14:55:38 (0 days, 22:00:02)
>
> Incoming messages      :   16721697
> - status               :   11146704
> - combo                :    1122112
>
>
>
> Incoming messages/sec  :        216 (average last 300 seconds)
>
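A rough back-of-the-envelope sketch in Python for where this is heading, using only the figures quoted above (about 4300 clients now, 400 added per week, 7200 planned, 216 messages/sec, 16721697 messages in 22:00:02 of uptime). The linear scaling of message rate with client count is an assumption, and the function names are made up for illustration.

```python
def average_rate(total_messages, uptime_seconds):
    """Mean incoming messages per second over the daemon's uptime."""
    return total_messages / uptime_seconds


def projected_rate(current_rate, current_clients, target_clients):
    """Message rate at the target client count, ASSUMING linear scaling."""
    return current_rate * target_clients / current_clients


def weeks_until(current_clients, target_clients, added_per_week):
    """Weeks until the target client count is reached at a steady rollout rate."""
    return (target_clients - current_clients) / added_per_week


if __name__ == "__main__":
    uptime = 22 * 3600 + 2  # "0 days, 22:00:02" in seconds
    print("average msgs/sec since start:", round(average_rate(16721697, uptime), 1))
    print("projected msgs/sec at 7200  :", round(projected_rate(216, 4300, 7200), 1))
    print("weeks until 7200 clients    :", weeks_until(4300, 7200, 400))
```

If the linear assumption holds, the server would need to absorb roughly two thirds more messages than it handles today, in a little over seven weeks.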
>
>
> The bbtest is taking 225 seconds to complete:
>
> PING test completed (4390 hosts)    5005592.360761    203.397989
> TIME TOTAL                                            225.391564
>
> Hoping the group-mind can help me out,
>
> Thanks,
>
> Tom
>
>
>
> Thomas Brand
>
> Disclaimer: 1) all opinions are my own, 2) I may be completely wrong, 3)
> my advice is worth at least as much as what you are paying for it, or
> your money cheerfully refunded.
>
> CONFIDENTIALITY NOTICE: This communication and any attachments may
> contain confidential and/or privileged information for the use of the
> designated recipients named above.  If you are not the intended
> recipient, you are hereby notified that you have received this
> communication in error and that any review, disclosure, dissemination,
> distribution or copying of it or its contents is prohibited.  If you
> have received this communication in error, please notify the sender
> immediately by telephone and destroy all copies of this communication
> and any attachments.
>
>
>
>


-- 
Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373

“Success is not final, failure is not fatal: it is the courage to
continue that counts.”
--- Winston Churchill
