Optimizing Xymon disk performance (was: Moving RRD processing to another server)

Henrik Størner henrik at hswn.dk
Mon Nov 2 23:17:52 CET 2009


Hi Greg,

I've taken the liberty of sending this to the Xymon list also,
since it is probably of general interest.

On Mon, Nov 02, 2009 at 11:39:09AM -0500, shea_greg at emc.com wrote:
> 
> I'm having some trouble trying to figure out how to off-load RRD 
> processing with the 4.3.0 code.  I found hobbitd_locator and that's 
> part of it, as well as hobbitd_channel, but it's not clear to me
> how to setup the master and peer(s).  Also how does this affect 
> the webpage generation? 
> 
> From earlier posts to the list, I have a single server running 4.2.0 
> with over 70000 RRD files and I'm experiencing serious delays in processing 
> data and have to restart Hobbit every 15 minutes.  One solution I'm aware 
> of and also Buchan had mentioned is to add more spindles.

The standard 4.3.0 beta adds caching of RRD file updates, and this has a
significant impact on the I/O load of the server - essentially, it means
that hobbitd_rrd caches up to 12 updates (= 1 hour) before it does an
actual update of the RRD file. Since the amount of disk I/O is almost
identical whether you're doing one data update or 50, this caching
eliminates about 90% of your disk I/O on the RRD files. So that would be 
the simplest solution to implement.
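
If you want to see the effect on your own system, a simple before/after
comparison with the standard iostat tool (from the sysstat package) gives
a good picture - just a suggestion, nothing Xymon requires:

	# Extended disk statistics, one report every 60 seconds, 5 reports.
	# Compare the write and %util columns with and without the RRD caching.
	iostat -x 60 5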

I have 90000+ RRD files. I recently did a hardware upgrade of the
server, but it isn't anything fancy - just a plain HP DL360 with a
set of two 36 GB SCSI disks in hardware RAID-1. I used to off-load 
the RRD handling to another server, but it is now back on the main 
Xymon server. The amount of memory used for the cache isn't all that 
much - about 50 MB on my system.

The only downside to this is that shutting down Xymon means all of the
cached data must be flushed to disk - and this takes a while, 10-15
minutes on my system.


Another optimization to eliminate disk I/O is to move the generated
webpages to a RAM disk. I have ~hobbit/server/www/ on a RAM disk; the
gifs, help, menu, notes, rep and snap sub-directories are symlinks that
point to "real" (disk-based) storage. This means all of the webpages
that are re-generated once a minute reside on a RAM disk, eliminating
all of the disk I/O that rewriting them causes. And since they are
regenerated so often, it doesn't matter that they're wiped out when you
reboot the server - they are regenerated within a minute after you have
Xymon up and running again.
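
For reference, here is roughly how such a setup can be done on Linux with
a tmpfs mount. This is only a sketch - the /usr/lib/hobbit paths, the
64 MB size and the hobbit user/group are assumptions, so adjust them to
your installation, and remember to re-create the mount and the symlinks
at boot (e.g. via /etc/fstab and your init scripts):

	# Move the real www directory aside and mount a RAM disk in its place
	mv /usr/lib/hobbit/server/www /usr/lib/hobbit/server/www.disk
	mkdir /usr/lib/hobbit/server/www
	mount -t tmpfs -o size=64m tmpfs /usr/lib/hobbit/server/www
	chown hobbit:hobbit /usr/lib/hobbit/server/www

	# Keep the directories that must survive a reboot on real disk
	for d in gifs help menu notes rep snap; do
		ln -s /usr/lib/hobbit/server/www.disk/$d /usr/lib/hobbit/server/www/$d
	done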



But the remote RRD off-loading does work - I've used it for more than a
year. Here's how to set it up.


First, webpage generation is unchanged: it still happens on the main
Xymon server, and the fact that the RRD files are stored somewhere else
is transparent.


The main server runs hobbitd_locator, which keeps track of where each
host stores its RRD files. The RRD server(s) run only hobbitd_rrd and a
webserver.

On the main server, add these entries to your hobbitlaunch.cfg:

[locator]
	ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
	LOGFILE $BBSERVERLOGS/locator.log
	NEEDS hobbitd
	CMD hobbitd_locator --listen=0.0.0.0:9000

[netrrd-status]
	ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
	NEEDS locator
	CMD hobbitd_channel --channel=status \
		--log=$BBSERVERLOGS/netrrd-status.log \
		--locator=127.0.0.1 \
		--service=rrd

[netrrd-data]
	ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
	NEEDS locator
	CMD hobbitd_channel --channel=data \
		--log=$BBSERVERLOGS/netrrd-data.log \
		--locator=127.0.0.1 \
		--service=rrd

The locator listens on port 9000 - it is a UDP-based service (like DNS),
so you may need to open up some firewalls to reach it.
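
With iptables on the main Xymon server, allowing the off-load servers to
reach the locator could look like this (only a sketch - the
192.168.1.0/24 source network is an example, use whatever network your
RRD servers live on):

	# Allow locator queries/announcements from the RRD off-load servers
	iptables -A INPUT -p udp --dport 9000 -s 192.168.1.0/24 -j ACCEPT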


On the RRD off-load servers, you run only the hobbitd_rrd module, with
some additional options that tell it to listen for data from a network
connection. Here's the hobbitlaunch.cfg entry, assuming your main Xymon
server has IP 192.168.1.1 and the RRD off-load server has IP 192.168.1.2:

[netrrd-worker]
	ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
	CMD hobbitd_rrd \
		--log=$BBSERVERLOGS/netrrd-status.log \
		--rrddir=/var/lib/hobbit/rrd \
		--locator=192.168.1.1:9000 \
		--listen=192.168.1.2:9001 \
		--locatorid=192.168.1.2:9001 \
		--locatorextra=http://192.168.1.2/hobbit-cgi/


OK, this is a bit complicated - I'll try to explain what these options
do.

hobbitd_locator needs to know that this RRD off-load server exists, and
which hosts it is handling RRD files for. So hobbitd_rrd must announce
itself to the locator; the "--locator" option tells it how to contact
the locator.

hobbitd_rrd receives data from the remote hobbitd_channel over a network
connection, so the "--listen" option tells it what IP address and port
number it will use to listen for incoming connections from
hobbitd_channel.

The IP address and port number that hobbitd_rrd listens on may not be
the ones that hobbitd_channel should use, because the RRD off-load
server could be hidden behind a NAT firewall, or some other network
address translation might be taking place. So the "--locatorid" option
announces the IP address and port number that hobbitd_channel should use
to connect to the hobbitd_rrd service from the outside. Normally there
is no NAT'ing, so "--listen" and "--locatorid" are identical.

Finally, the "--locatorextra" option tells the Xymon web-page tools what
URL they should use when generating links to the Xymon graphs. Since the
RRD files are no longer stored on the main Xymon server, you cannot
access them via the same URL prefix that you use for all of the other
Xymon webpages and CGIs - the "--locatorextra" option tells Xymon what
the URL for the graphs is. And yes, this means you will need to run a
separate webserver on the RRD off-load server.
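
How you configure that webserver is up to you; with Apache it can be as
simple as a ScriptAlias matching the "--locatorextra" URL. This is only
a sketch - the /usr/lib/hobbit/cgi-bin path is an assumption (use the
CGI directory of your installation), and the access directives are
Apache 2.2 style:

	ScriptAlias /hobbit-cgi/ "/usr/lib/hobbit/cgi-bin/"
	<Directory "/usr/lib/hobbit/cgi-bin">
		Options ExecCGI
		Order allow,deny
		Allow from all
	</Directory>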


When hobbitd_rrd starts up with these options, it will first contact the
locator and tell it "hey, I can handle RRD files - if someone wants to
send me some RRD data, they can contact me on 192.168.1.2 port 9001. And
please pass this information to anyone who asks for it:
http://192.168.1.2/hobbit-cgi/". It then proceeds to scan the RRD
directory to determine which hosts it has RRD files stored for, and for
each host it tells the locator "Hi, I am the RRD server on
192.168.1.2:9001, and I have RRD files for host foo.bar.com". After
that, it just leans back and waits for someone to connect to it.

Over on the main Xymon server, the hobbitd_channel modules are receiving
data about RRD updates. Each time a new message arrives, they ask the
locator "where are the RRD files stored for host foo.bar.com?" If the
locator knows, it responds with the IP address and port number of the
RRD server handling this host; if it knows that none of the known RRD
servers handles this host (i.e. it is a new host), it just hands out
the IP address and port number of one of the RRD servers so new hosts
can be added. When the hobbitd_channel module is told "send data for
foo.bar.com to the RRD server at 192.168.1.2:9001", it establishes a
TCP connection to that port (if it doesn't have one open already) and
sends the data to it.

When hobbitd_rrd receives a new connection, it spawns an extra process
to handle the connection, which receives the data and then does the
actual RRD update.

The connections between hobbitd_channel and the RRD off-load server are
persistent, so once everything is up and running you'll see two
connections to your RRD off-load server; one for each of the
hobbitd_channel instances.
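
You can check this on the RRD off-load server with netstat (or ss); with
the example addresses above you should see two ESTABLISHED TCP
connections from 192.168.1.1 to port 9001:

	netstat -tn | grep ':9001'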


The final piece of the puzzle is the detailed status-log webpage, where
the graph must show up. The hobbitsvc.cgi utility will ask the locator
"where are the RRD files for host foo.bar.com?" and get a response that
includes the extra data the locator was asked to pass on to anyone who
asked. hobbitsvc.cgi knows that this data is the base URL for the RRD
graph CGI, so instead of generating a link to the image URL on the main
Xymon webserver, it generates a link that points to the RRD off-load
server. The browser then contacts the webserver running on the RRD
server, and the image is generated there.


I hope that is enough to get you going.


Regards,
Henrik



