<div>Hi all,</div>

<div> </div>

<div>I am redesigning the method we use to fail over to a disaster recovery (DR) installation of Hobbit. I would be interested in opinions on the approach and on any shortcomings you can see.</div>

<div> </div>

<div>Note: This is not HA/clustering, it is for DR purposes.</div>

<div> </div>

<div>We are aiming to have:</div>

<div> </div>

<div>a production Hobbit deployment</div>

<div>a DR Hobbit deployment</div>

<div> </div>

<div>Clients will be configured to send metrics to both servers, which will keep the historical RRD data on each server up to date.</div>

<div> </div>

<div>The production server will be configured to send out alerts. The DR server will not.</div>

<div> </div>

<div>At regular intervals, rsync will be used to synchronise data from the production server to the DR server, including the checkpoint file that hobbitd writes from its in-memory state.</div>
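<div>Roughly, the periodic sync job could look like the Python sketch below, run on the DR server so that it pulls from production. The host name and directory paths are placeholders I have assumed for illustration, not values from our installation.</div>

```python
import shlex
import subprocess
from typing import List

# Placeholder values (assumptions): substitute your own host and paths.
PROD_HOST = "hobbit-prod.example.com"
HOBBIT_DIRS = [
    "/home/hobbit/data/",        # assumed data directory (RRD files, history)
    "/home/hobbit/server/tmp/",  # assumed directory holding the checkpoint file
]


def build_pull_cmd(src_host: str, path: str) -> List[str]:
    """Build an rsync command that pulls `path` from src_host into the same local path."""
    # -a preserves permissions and modification times; -z compresses in transit.
    return ["rsync", "-az", f"{src_host}:{path}", path]


def sync_from_production(dry_run: bool = True) -> List[List[str]]:
    """Print (dry_run) or execute one pull per directory; return the commands."""
    cmds = [build_pull_cmd(PROD_HOST, p) for p in HOBBIT_DIRS]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raise if rsync fails
    return cmds


if __name__ == "__main__":
    sync_from_production(dry_run=True)
```

<div>Scheduling this from cron on the DR host at a fixed interval keeps the production server free of any DR-specific jobs.</div>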

<div> </div>

<div>In the event of a disaster, the DR Hobbit server will be promoted to active by restarting Hobbit so that it loads the synchronised checkpoint and the production alert configuration.</div>

<div> </div>

<div>I expect this to leave the DR server "up to date" with production as of the last checkpoint, including tests that have been disabled or acknowledged.</div>
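<div>Because the DR copy is only as current as the last checkpoint that has been rsynced across, acks or disables made on production after that point would be missing on DR. The worst-case staleness is roughly the checkpoint interval plus the rsync interval. A small Python sketch for checking this before promotion follows; the interval values are assumptions, and the checkpoint path you pass in (e.g. hobbitd.chk in the Hobbit tmp directory) is assumed rather than taken from our configuration.</div>

```python
import os
import time
from typing import Optional

# Assumed values for illustration only.
CHECKPOINT_INTERVAL = 600  # seconds between checkpoints written on production
SYNC_INTERVAL = 300        # seconds between rsync runs to the DR server


def worst_case_lag(checkpoint_interval: int, sync_interval: int) -> int:
    """Upper bound (seconds) on how stale the DR copy of the checkpoint can be.

    A status change made just after a checkpoint is not written until the next
    checkpoint, and that file is not copied until the next rsync run.
    Transfer time is ignored.
    """
    return checkpoint_interval + sync_interval


def checkpoint_age(path: str, now: Optional[float] = None) -> float:
    """Seconds since the checkpoint was written on production.

    Relies on rsync -a preserving the source file's modification time.
    """
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_fresh(age_seconds: float, limit_seconds: float) -> bool:
    """True if a checkpoint of this age is within the accepted staleness limit."""
    return age_seconds <= limit_seconds


if __name__ == "__main__":
    limit = worst_case_lag(CHECKPOINT_INTERVAL, SYNC_INTERVAL)
    print(f"DR state may lag production by up to {limit} s")
```

<div>If checkpoint_age() on the DR copy exceeds the computed limit, the sync job has probably stopped, and any acks or disables since then would need to be re-applied by hand after promotion.</div>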

<div> </div>

<div>Before failing back to the production Hobbit installation, the reverse of the above would be performed.</div>

<div>An rsync of the RRD data files would also be performed to cover any window during which one of the servers was offline.</div>
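<div>A sketch of the reverse RRD copy for failback, assumed to run on the DR server while Hobbit on production is stopped so that no RRD is written mid-copy. The host name and RRD path are placeholders. Note that rsync copies whole files and does not merge samples from two RRDs, so for every file it transfers the source side's contents replace the destination's.</div>

```python
import subprocess
from typing import List

# Placeholders (assumptions): substitute your production host and RRD path.
PROD_HOST = "hobbit-prod.example.com"
RRD_DIR = "/home/hobbit/data/rrd/"


def build_failback_rrd_cmd(dest_host: str, rrd_dir: str, dry_run: bool = True) -> List[str]:
    """Build an rsync command that pushes the DR server's RRD files to production."""
    cmd = ["rsync", "-a"]  # -a keeps timestamps and permissions
    if dry_run:
        # --dry-run only reports what would be transferred.
        cmd.append("--dry-run")
    # Trailing slash on the source copies the directory contents, not the directory.
    cmd += [rrd_dir, f"{dest_host}:{rrd_dir}"]
    return cmd


def push_rrds_to_production(dry_run: bool = True) -> None:
    """Run the failback copy; raises CalledProcessError if rsync fails."""
    subprocess.run(build_failback_rrd_cmd(PROD_HOST, RRD_DIR, dry_run), check=True)
```

<div>Calling push_rrds_to_production() with dry_run=True first shows which files would be replaced before committing to the copy.</div>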

<div> </div>

<div>Is there anything wrong with this approach?</div>

<div> </div>

<div>Cheers</div>

<div> </div>

<div>Phil</div>

<div> </div>

<div><br clear="all"><br>-- <br>Tel: 0400 466 952<br>Fax: 0433 123 226<br>email: philwild AT <a href="http://gmail.com">gmail.com</a> </div>