[Xymon] Xymon + graphite

Thu Dec 10 17:02:15 CET 2015

On Thu, December 10, 2015 2:49 am, Jeremy Laidman wrote:
> On Tue, Dec 8, 2015 at 5:49 AM Galen Johnson <Galen.Johnson at sas.com>
> wrote:
>
>> Has anyone tried to integrate alerting based on Graphite?  Or used
>> Graphite as a trending replacement to rrd?  I love Xymon for my
>> monitoring but the limitations and aggregations of rrds are starting
>> to become an issue.
>>
> Nope, but I'm intrigued by Graphite.  Most of my servers have enormously
> long trends pages because of all the extra graphs I've added.  These are
> indispensable for tracking down weird faults.  But the number of graphs
> and RRD files has become unwieldy.  One major shortcoming is that I can't
> put metrics from different hosts onto the same graph.  I've used RRGrapher
> <http://pages.cs.wisc.edu/~plonka/RRGrapher/> to let me create ad-hoc
> graphs like this, but it's obviously from last millennium, and could do
> with a facelift.

I'd been looking at http://www.flotcharts.org/ and a few other RRD
graphing packages that could provide a more browsable interface. I agree:
aside from the CSS work and a potential "dashboard" view generally,
there's absolutely a need for improved multi-host and multi-graph views
beyond the linear trends output.

>
> For trending, Xymon can threshold (alert) on RRD files with the "DS"
> operator in analysis.cfg.  Perhaps this can be extended to alert on
> Holt-Winters aberrant behaviour thresholds.  Doing the same sort of thing
> with a rewrite of the g2zproxy probably wouldn't be too difficult, at
> least not on the Xymon side.
>

(Actually, the RRD files generated on new RPM installs have had HWPREDICT,
SEASONAL, and a few other RRAs configured for a while now, if anyone
feels like experimenting...)

One problem with the current RRD paradigm is that alerting happens only
on the data available at insertion time, not on data already stored in
the RRD file (or whatever metric store you have), so xymond_rrd can't
efficiently alert on anything beyond that.

A "xymond_trend" could operate asynchronously on the RRD files, but to get
useful trend data back out of RRDs you'll need to flush the data to disk
first, which more or less blows out your I/O performance. Fine if you're
on SSD, but more of a problem if you're on heavily loaded spinning disks.
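
To make the idea a bit more concrete, here is a rough Python sketch of
what the analysis half of such a tool might do. It is only an
illustration, not anything that exists in Xymon: the function names are
made up, it shells out to the stock "rrdtool fetch" command, and it
assumes the data you care about has already been flushed to the RRD file.
It fits a simple least-squares slope, which is just one of many possible
definitions of "trend".

```python
import math
import subprocess

def fetch_text(rrd_path, start="-1d"):
    """Run `rrdtool fetch` on one RRD file and return its raw text output."""
    result = subprocess.run(
        ["rrdtool", "fetch", rrd_path, "AVERAGE", "--start", start],
        check=True, capture_output=True, text=True)
    return result.stdout

def parse_fetch(text):
    """Parse `rrdtool fetch` output into {ds_name: [(timestamp, value), ...]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = lines[0].split()              # header row lists the DS names
    series = {name: [] for name in names}
    for ln in lines[1:]:
        ts, _, rest = ln.partition(":")
        for name, raw in zip(names, rest.split()):
            value = float(raw)            # "nan" and "-nan" both parse to NaN
            if not math.isnan(value):     # skip unknown samples
                series[name].append((int(ts), value))
    return series

def slope_per_second(points):
    """Least-squares slope of value over time; 0.0 with fewer than two points."""
    n = len(points)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den if den else 0.0
```

Something like slope_per_second(parse_fetch(fetch_text(path))["pct"])
would then give the growth rate of a hypothetical "pct" dataset, which a
wrapper could compare against whatever threshold suits the site.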

The difficulty with something like a xymond_trend is that there are so
many different ways of doing this, serving a lot of different needs.
Making something flexible enough would require a good survey of what
people are looking for.

(With that in mind -- What are people looking for? :) Maybe it's easier
than I'm thinking.)

Alternatively, you can send the metric data off entirely to a different
package, which can re-inject an alert into Xymon if and when it notices a
trend. That's an easily approachable option using xymond_rrd's
--processor option, which can fork your metric feed off to whatever you
like (OpenTSDB, Graphite, Splunk, etc.). Re-posting alerts back into
Xymon can be done with that package's notification tool set and some
scripting of Xymon messages.
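
As a rough illustration of the two ends of that round trip, here is a
Python sketch that formats one sample for Graphite's plaintext protocol
and builds a Xymon "status" message for posting an alert back. The helper
names, the "xymon." metric prefix, and the path layout are my own
assumptions for the example, and parsing of the --processor feed itself
is deliberately left out.

```python
import socket

def graphite_line(host, test, ds, value, timestamp):
    """Format one sample as a Graphite plaintext line: path, value, timestamp."""
    # Graphite uses dots as path separators, so flatten the ones in the hostname.
    # The "xymon" prefix and the host/test/ds ordering are illustrative choices.
    path = ".".join(["xymon", host.replace(".", "_"), test, ds])
    return f"{path} {value} {int(timestamp)}\n"

def xymon_status(host, test, color, text, lifetime=None):
    """Build a Xymon status message; dots in the hostname become commas."""
    verb = "status" if lifetime is None else f"status+{lifetime}"
    return f"{verb} {host.replace('.', ',')}.{test} {color} {text}"

def send(message, server, port):
    """Open a TCP connection, write the message, and close."""
    with socket.create_connection((server, port), timeout=10) as conn:
        conn.sendall(message.encode("utf-8"))
```

With default ports, send(graphite_line(...), graphite_host, 2003) would
feed carbon and send(xymon_status(...), xymon_host, 1984) would post the
alert, where both host names are placeholders for your own servers.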

Regards,
-jc