[hobbit] Some thoughts on clustered hobbit

John Turner jturner at ns.wcpss.net
Tue May 10 13:35:10 CEST 2005


On May 10, 2005, at 3:21 AM, Brian Lynch wrote:

>
>
> On 5/9/05, Kauffman, Tom <KauffmanT at nibco.com> wrote:
> First, let me express my thanks to Brian for putting this document
> together and allowing Henrik to distribute it! I've a lot of  
> experience
> with IBM's HACMP for AIX, and getting a clustered configuration  
> working
> as desired is not a trivial procedure.
>
> Henrik -- check me on this: it's my impression we no longer need a
> 'BBPAGER' entry on the client-side bb-hosts because the hobbit server
> passes all potentially alertable statuses to hobbit-alert and it  
> decides
> if an alert is really required.
>
> Brian -- no offense, but I would rather categorise your  
> configuration as
> "active/inactive". I'm looking at doing an "active/passive" cluster  
> when
> time frees up -- about a month from now. The difference? I'm  
> running two
> hobbit/apache instances all the time -- but the 'passive' (fallover)
> side is not doing alerting or network tests. It does build displays
> (it's my technical documentation server as well) and it does keep both
> history and rrd data updated. Both hosts show up on the client side as
> 'BBDISPLAY'. On failover it will take over the IP address for the  
> hobbit
> display and re-launch hobbit with network testing and alerting  
> enabled.
>

Software exists to do this active/passive; a search for HA or
FAILOVER will turn up several projects, and the Linux-HA project is
one example.

The biggest issue in my view is keeping the systems in sync, and that
is often done with shared storage.
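
If shared storage isn't an option, something as simple as rsync over
the crossover link gets you most of the way. A rough sketch follows;
the paths and the peer hostname are just guesses at a typical Hobbit
layout, so adjust them to your install, and run it from cron on the
active node:

#!/usr/bin/env python
# Rough sketch: mirror the Hobbit configuration and data to the passive
# node with rsync over ssh.  Paths and the peer name are assumptions --
# adjust them to match your own install.
import subprocess

PEER = "hobbit-passive"                # hostname on the private crossover link
DIRS = [
    "/usr/lib/hobbit/server/etc/",     # bb-hosts, hobbit-alerts.cfg, ...
    "/var/lib/hobbit/hist/",           # status history
    "/var/lib/hobbit/rrd/",            # RRD files behind the graphs
]

for d in DIRS:
    # --delete keeps both copies identical; run from cron at whatever
    # interval you can live with losing on a hard failure.
    subprocess.call(["rsync", "-az", "--delete", d, "%s:%s" % (PEER, d)])

The shorter the interval, the less history you stand to lose on a
hard failure, which is the trade-off Brian describes below.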

> I agree with your assessment, but chose the model for a few reasons
> (note that I'm basing my experience on about 2 1/2 years running a  
> dual
> big brother failover setup):
>
> 1. There is always one repository for both configuration and data  
> that are
> kept reasonably identical on both systems (within the synch delay).
> 2. There is only one ip address accepting BB reports cutting down on
> both network traffic and firewall rules (for hosts in locked down  
> vlans).
> 3. The other system can be dedicated to another purpose (it currently
> hosts our documentation site that fails over in the opposite  
> direction).
> 4. No redundant work is done. Indeed, no load is being 'shared' across
> the systems unless you host the web server on the other box.
>  There is a risk to this based on the possibility of complete  
> machine failure
>  in between synchronizations.  Hence, Hobbit may come up without all
>  the updates for hosts or alerts.  Based on my current model, I  
> will lose
> about a day of historical data.  These synch rates can be changed and
> a gigabit crossover between machines cuts down on any traffic imposed
> by multiple synch's.
>
> Note that you could very easily turn off the hobbit alerts with the  
> same
> clustering software by truncating and restoring the hobbit- 
> alerts.cfg file.
> Not sure how to disable the network tests, so that may require some
> custom coding... Once complete, you could use the same cluster
> resource sw to accomplish a 'hot' standby.
>

So I think the only issue you have with this method is that you
don't want the extra network load. If you are willing to require a
crossover cable, and that is a requirement for most HA solutions,
then one solution might be to add the idea of a BACKUP_BBDISPLAY.
The BBDISPLAY server would forward all incoming messages to the
BACKUP, and if you have the private network cable it would not cause
any load on your network.
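
Just to make the idea concrete, here is a very rough sketch of what
that forwarding could look like as a standalone relay. hobbitd itself
would be the right place to do this for real, and the addresses and
ports here are assumptions: the real hobbitd moved to port 1985, the
backup reachable at 192.168.100.2 over the crossover cable. It only
handles one-way status traffic, not queries that expect a reply:

#!/usr/bin/env python
# Hypothetical relay illustrating the BACKUP_BBDISPLAY idea: every report
# arriving on the standard BB port is copied both to the local hobbitd
# and to the backup server on the private crossover network.
import socket

LISTEN        = ("0.0.0.0", 1984)        # where clients normally report
LOCAL_HOBBITD = ("127.0.0.1", 1985)      # assumption: real hobbitd moved here
BACKUP        = ("192.168.100.2", 1984)  # assumption: backup on the crossover

def send_copy(dest, message):
    # Deliver one message; a dead backup should not block the primary.
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(10)
        s.connect(dest)
        s.sendall(message)
        s.close()
    except socket.error:
        pass  # a real version would log the failure

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(LISTEN)
server.listen(5)

while True:
    conn, peer = server.accept()
    chunks = []
    while True:                          # read until the client closes
        data = conn.recv(4096)
        if not data:
            break
        chunks.append(data)
    conn.close()
    message = b"".join(chunks)
    if message:
        send_copy(LOCAL_HOBBITD, message)
        send_copy(BACKUP, message)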

You would also need a way of telling the Hobbit software on the
system that it is the BACKUP server.
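
On Brian's point about truncating and restoring hobbit-alerts.cfg,
that part at least is easy to hand to the cluster resource scripts.
A rough sketch follows; the path is a guess at a typical install, and
you should check whether your hobbitd_alert picks up the change on
its own or needs a kick:

#!/usr/bin/env python
# Sketch of the truncate/restore trick for hobbit-alerts.cfg, meant to
# be called by the cluster software when a node changes role.  The path
# is an assumption -- use whatever your install puts it under.
import shutil
import sys

ALERTS = "/usr/lib/hobbit/server/etc/hobbit-alerts.cfg"
SAVED  = ALERTS + ".active"

def go_passive():
    shutil.copy(ALERTS, SAVED)     # keep the real rules somewhere safe
    open(ALERTS, "w").close()      # truncate: no rules means no alerts

def go_active():
    shutil.copy(SAVED, ALERTS)     # put the real rules back

if __name__ == "__main__":
    if sys.argv[1:] == ["active"]:
        go_active()
    else:
        go_passive()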

John
