[hobbit] Future of Hobbit

Tim Rotunda tim.rotunda at twcable.com
Fri Jan 25 22:21:29 CET 2008


On 1/25/08 1:43 PM, "Charles Jones" <jonescr at cisco.com> wrote:

> I think Henrik's stance on having the server collect data via ssh
> connections just doesn't scale.  Sure it works fine for a few dozen
> hosts, but let's say you have 2000 servers...now you are expecting to be
> able to make 2000 trouble-free ssh connections before the next polling
> cycle begins. This introduces many problems:
> 
> * How many ssh sessions can you run at the same time without spiking the
> load on the hobbit server?
The latest revision is threaded.  The thread count is a parameter and is
easily changed.  If I remember correctly, we had 11 threads finishing all
the nodes in under 90 seconds.  Also, going to a threaded architecture
brought CPU utilization from 80% to 5%.  I was astonished.  So we figured
you could have hundreds of threads running on the single-socket HP rx1600
node we were using for hobbit.
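
To make the threaded design concrete, here is a minimal sketch of a
fixed-size worker pool with a configurable thread count.  It is not the
hobcen code (that was a POSIX-threaded C program); Python is used only for
brevity.  The names poll_all and fake_fetch are hypothetical, and
fake_fetch stands in for the real ssh collection step.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def poll_all(hosts, fetch, workers=11):
    """Run fetch(host) for every host on a fixed-size thread pool.

    Returns {host: (ok, payload)}; payload is the fetched data on
    success or the error text on failure.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, host): host for host in hosts}
        for fut in as_completed(futures):
            host = futures[fut]
            try:
                results[host] = (True, fut.result())
            except Exception as exc:
                # One failing host must not stop the rest of the cycle.
                results[host] = (False, str(exc))
    return results


def fake_fetch(host):
    # Hypothetical stand-in for "ssh to the host and run the client command".
    if host.endswith("-down"):
        raise ConnectionError("no route to " + host)
    return "client data from " + host


if __name__ == "__main__":
    print(poll_all(["web1", "db1", "old-down"], fake_fetch, workers=2))
```

Because the workers spend nearly all their time waiting on the network, the
pool size can be raised well past the CPU count, which fits the drop in CPU
use described above.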

> * What happens when an ssh session hangs (could hang the hobbit server,
> or make the poll cycle take too long)
Since there were many threads, a hung thread would hardly be noticed if it
weren't for the purple that node would turn.
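
A hung session can also be cut off outright with a per-session timeout, so
the worker is freed instead of waiting forever.  The sketch below is one way
to do that and is an assumption, not a description of hobcen; collect is a
hypothetical name and the command list is whatever you would run for one
host.

```python
import subprocess
import sys


def collect(cmd, timeout_s):
    """Run one collection command and give up after timeout_s seconds.

    Returns (status, text) where status is "ok", "failed" or "hung".
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return ("hung", "")
    if done.returncode != 0:
        return ("failed", done.stderr)
    return ("ok", done.stdout)


if __name__ == "__main__":
    # A harmless local command stands in for the ssh call.
    print(collect([sys.executable, "-c", "print('client data')"], 30))
```

With OpenSSH the command list might look like
["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, command],
where host and command are placeholders.  A host that returns "hung" simply
sends no report for that cycle and goes purple as usual.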
> 
> You do know about the "pulldata" option?  It allows the Hobbit server to
> do a "pull" instead of waiting for client "push". This works fairly
> well, and I am using it in a production environment. I can see how it
> would not scale too well either, though, for a really large number of hosts.
> 
> To picture the scalability, imagine a server that only has to receive
> updates from hobbit clients. All it has to do is listen on port 1984,
> and using relatively little CPU it can probably handle a constant flow
> of client updates.
> 
> Now imagine a server that has to go and fetch the client data itself.
> There is a LOT more overhead and processing involved in launching an
> outgoing ssh connection, running a remote client data-gathering command,
> waiting for the output, etc. Imagine 2000 of those firing off every 5
> minutes. How many simultaneous ssh sessions can your server handle?
> I've seen a server brought to its knees by a script that ran amok and
> was doing 50 simultaneous scp commands :)  Some time saving is done by
> using msgcache (no waiting for the data-gathering), but there is still
> the overhead of ssh itself, and having key-based ssh ability could be
> deemed a security risk (anyone who hacks into the hobbit server could
> then ssh to all of your client machines without a password).
I don't know how you secure your servers, but nobody is getting into my
hobbit/hobcen servers without authorization.  Believe it or not, there are
ways to prevent unauthorized access.  The caveat here is that I don't put
mine on a public IP.  :-)
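
On the sizing question in the quoted text (2000 hosts every 5 minutes), a
rough estimate of how many sessions must run at once is hosts times seconds
per session, divided by the cycle length.  The sketch below only does that
arithmetic; workers_needed is a hypothetical name, and the per-session time
is an assumed figure, not a measurement from any real setup.

```python
import math


def workers_needed(hosts, seconds_per_session, cycle_seconds):
    """Minimum concurrent sessions so every host is polled once per cycle."""
    return math.ceil(hosts * seconds_per_session / cycle_seconds)


if __name__ == "__main__":
    # 2000 hosts and a 300 s cycle come from the thread; the 2 s cost of
    # one ssh session is an assumption chosen only for illustration.
    print(workers_needed(2000, 2.0, 300))  # prints 14
```

Even at an assumed 10 seconds per session the same formula gives 67
concurrent sessions, which is within the "hundreds of threads" range
mentioned earlier.  It ignores retries and hung sessions, so treat it as a
lower bound.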
> 
> A good solution would be an ssl-encrypted, bi-directional protocol. This
> would allow secure transfer of client data, either push or pull, without
> the overhead, management, and security risks of using ssh.
> 
> In the meantime, definitely check out the pulldata+msgcache option, as
> it sounds like it will do what you want.
I have not looked at the option you note; however, there are times when
deploying clients is not an option.  I suspect that is why bb-central was
born and why I developed hobcen.  Like I said, it started as a shell script,
morphed into a C application and then a POSIX-threaded C application.  This
was all based on shared ssh keys, but after coming from a stint in a DC with
60,000+ nodes on 3 acres of raised floor, I learned very quickly how to use
ssh password authentication for fast batch communications.  :-)

We all have issues to resolve and, as with UNIX, there are 10 different ways
to solve any one of them.

Cheers,

Tim
> 
> -Charles
> 
> Tim Rotunda wrote:
>> To answer Axel's "what is it" question.....it's a Hobbit version of
>> BB-Central, which runs on a central server like hobbit does.  It reaches
>> out to the clients via ssh (or whatever) and collects data.  I did a shell
>> script version a few years ago and it worked well until the client count
>> topped 25-30.  Then I migrated it to C and it would handle 60+ nodes pretty
>> well.  Then I migrated that to a multi-threaded C process and it really
>> smoked.  I never did reach the limit with that version.  I think they are
>> still using it and adding nodes to the client list, which is probably over
>> 250 or so.
>> 
>> I was going to put it out to the community but my company would not allow it
>> (idiots) so I couldn't.  I now work only 40 hours a week so now I have some
>> time to myself and was thinking about rewriting it from memory and putting
>> it out there.  I would put out the one that is threaded and it would probably
>> just be for x86 Linux, though it should also build on Solaris, HP-UX, etc.
>>   
