[Xymon] Regional Servers to Central

Thomas Eckert thomas.eckert at it-eckert.de
Wed May 4 13:37:06 CEST 2016


> On 04 May 2016, at 09:44, Jeremy Laidman <jlaidman at rebel-it.com.au> wrote:
> 
> On Tue, May 3, 2016 at 10:19 PM Thomas Eckert <thomas.eckert at it-eckert.de> wrote:
>> (…)
> 
>  
> Using `xymonproxy` on the regional servers would allow the status messages to be delivered to the (local) regional server _and_ the central one.
> 
> Makes perfect sense.
>  
> This is outlined here <http://www.it-eckert.com/blog/2014/combine-ssh-tunnel-with-xymonproxy/> (the “remote-datacenter” would be a regional server). To have the data available on the regional server as well, the `xymond` there has to listen on, say, `127.0.0.1:1986`, and xymonproxy has to report to that location as well. The xymonproxy option for sending to multiple servers is `--server=SERVERIP[:PORT][,SERVER2IP[:PORT]]`; according to `xymonproxy(8)` up to 3 servers are possible, and the configuration is pulled from the _last_ one in the list! The order of the xymonproxy “receivers” would also allow the regions to be configured from either the central or the regional server.
> 
> I think what's missing from this plan is that the xymonnet probes need to be diverted to the xymonproxy instance also.  But apart from that, I can't see why it wouldn't work for me.

I was assuming that the network probes of xymonnet are running on the regional servers (using `NET:region1`, `NET:region2`, …). In that case `xymonnet` reports to the xymonproxy (on the respective regional server) on the default port 1984, like all clients of that region do.
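
To make the `--server` list from the quoted text concrete, here is a minimal Python sketch that assembles the option string for a regional proxy. The helper names and the central address `192.0.2.10` are hypothetical placeholders (the latter stands for whatever endpoint reaches the central server, for example a local tunnel port); the 3-server limit and the "configuration comes from the last entry" rule are taken from the description of `xymonproxy(8)` in this thread, not verified separately.

```python
# Sketch only: builds the --server=... argument discussed in this thread.
# build_server_option() and config_source() are hypothetical helper names.

MAX_SERVERS = 3  # limit as quoted from xymonproxy(8) in this thread


def build_server_option(servers):
    """Return the --server argument for a list of (ip, port_or_None) tuples."""
    if not 1 <= len(servers) <= MAX_SERVERS:
        raise ValueError(f"expected 1 to {MAX_SERVERS} servers, got {len(servers)}")
    parts = [ip if port is None else f"{ip}:{port}" for ip, port in servers]
    return "--server=" + ",".join(parts)


def config_source(servers):
    """Return the server the configuration is pulled from (the last in the list)."""
    return servers[-1][0]


# Regional xymond on 127.0.0.1:1986 first, central server (placeholder IP) last,
# so that the regional configuration would come from the central server.
regional_first = [("127.0.0.1", 1986), ("192.0.2.10", None)]
option = build_server_option(regional_first)
```

Swapping the two entries would keep both receivers but make the regional `xymond` the configuration source instead.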

> The ssh-tunnel is handled by the central server using the `ssh-tunnel` extension (either the original <https://wiki.xymonton.org/doku.php/addons:ssh_tunnel> or the extended version <http://www.it-eckert.com/software/patches/ssh-tunnel/>), which makes sure the tunnel is up.
> 
> I'm not sure why, but I prefer to have transient ssh tunnels for the duration of the message transfer, rather than persistent tunnels that hang around forever.  (Either way, ssh encryption and authentication is provided.)  It looks like xymonproxy has a buffer, and so it may be able to queue messages destined for the central server for a short period of time, and the --lqueue size can be adjusted to manage this.  Although I would be more comfortable if it had the ability to save its queue to disk so as to survive restarts, and also to allow for a much larger queue.

May be a matter of taste. I prefer to avoid the “establish-connection” overhead and have the tunnel monitored in xymon as well.
To be fair: it does happen that the tunnel ends up in an undefined state after a network outage and is not restarted. This situation is detected by a purple explosion for the hosts that report through that tunnel (after ~30 minutes). The same would happen if a transient tunnel had issues.

> If xymonproxy offers the same queueing capability as msgcache, without the shortcomings, then it seems like an easy choice to make.  I just don't know if it can queue things for long enough if my central server "checks in" every 5 minutes, instead of having a persistent tunnel.  Come to think of it, I probably need to know what happens to the queue if a persistent tunnel goes down for a period of time (eg a firewall crashes and the link goes down for 30 minutes).  Will the extra messages overflow?  What would happen if a red-to-green transition was lost?

At least xymonproxy was not designed for that. The buffer is mainly to “smooth out peaks” and combine status messages into combo messages as I understand it. At least time-differences are reported in that case (this affects the use of `CLOCK` in `analysis.cfg`).
In case of short connectivity issues or during network congestion (between xymonproxy and xymond) I can observe the mentioned delay (resulting in a reported timediff/clock offset).
My impression for longer connectivity loss was that messages are lost, but on second thought it may only lead to gaps in the RRD graphs (as the reported time does not match the expected update interval).
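
As a rough illustration of the overflow question, the following Python sketch models a generic bounded in-memory buffer during an outage. This is not xymonproxy's actual implementation, and whether xymonproxy drops messages this way is exactly what remains unclear above; the capacity, outage length, and message rate are made-up numbers, and `simulate_outage` is a hypothetical name.

```python
from collections import deque


def simulate_outage(capacity, outage_minutes, msgs_per_minute=1):
    """Queue messages in a fixed-size buffer while the uplink is down.

    Returns (kept_messages, dropped_count). The oldest messages are
    discarded first once the buffer is full.
    """
    buffer = deque(maxlen=capacity)  # deque drops the oldest item when full
    dropped = 0
    for minute in range(outage_minutes):
        for n in range(msgs_per_minute):
            if len(buffer) == capacity:
                dropped += 1  # appending now pushes out the oldest message
            buffer.append((minute, n))
    return list(buffer), dropped


# Assumed numbers: room for 10 messages, a 30-minute outage, 1 message per minute.
kept, dropped = simulate_outage(capacity=10, outage_minutes=30)
```

With these assumed numbers only the most recent messages survive the outage; a buffer sized for smoothing out peaks fills up long before a 30-minute gap is over.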

Thomas