[Xymon] Xymon + graphite

Thu Dec 10 18:05:40 CET 2015

On 12/10/2015 08:02 AM, J.C. Cleaver wrote:
>
> On Thu, December 10, 2015 2:49 am, Jeremy Laidman wrote:
>> On Tue, Dec 8, 2015 at 5:49 AM Galen Johnson <Galen.Johnson at sas.com>
>> wrote:
>>
>>> Has anyone tried to integrate alerting based on Graphite? Or used
>>> Graphite as a trending replacement for RRD? I love Xymon for my
>>> monitoring, but the limitations and aggregations of RRDs are starting
>>> to become an issue.
>>>
>> Nope, but I'm intrigued by Graphite. Most of my servers have enormously
>> long trends pages because of all the extra graphs I've added. These are
>> indispensable for tracking down weird faults, but the number of graphs
>> and RRD files has become unwieldy. One major shortcoming is that I can't
>> put metrics from different hosts onto the same graph. I've used RRGrapher
>> <http://pages.cs.wisc.edu/~plonka/RRGrapher/> to create ad-hoc graphs
>> like this, but it's obviously from the last millennium and could do with
>> a facelift.
> I'd been looking at http://www.flotcharts.org/ and a few other RRD
> graphing packages that could be used to provide a more browsable
> interface. There's absolutely a need (aside from the CSS work and a
> potential "dashboard" view generally) for improved multi-host and
> multi-graph views besides the linear trends output, I agree.
>
>> For trending, Xymon can threshold (alert) on RRD files with the "DS"
>> operator in analysis.cfg.  Perhaps this can be extended to alert on
>> Holt-Winters aberrant behaviour thresholds.  Doing the same sort of thing
>> with a rewrite of the g2zproxy probably wouldn't be too difficult, at
>> least
>> not on the Xymon side.
>>
> (Actually, the RRD files generated on new RPM installs have had HWPREDICT,
> SEASONAL, and a few other RRAs configured for a while now, if anyone
> feels like experimenting...)
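For anyone experimenting with those Holt-Winters RRAs, here is a simplified sketch of the idea behind RRDtool's aberrant-behaviour (FAILURES) detection. It is not RRDtool's own code. It assumes you have already pulled matching observed, predicted (HWPREDICT) and deviation (DEVPREDICT) series out of the RRD, for example with rrdtool fetch. A point counts as a violation when it falls outside the predicted value plus or minus a multiple of the deviation, and an aberration is flagged when enough of the most recent points are violations. The function name `aberrant_flags` and the parameter values are hypothetical examples, not RRDtool or Xymon settings.

```python
from collections import deque
from typing import List, Sequence


def aberrant_flags(
    observed: Sequence[float],
    predicted: Sequence[float],
    deviation: Sequence[float],
    delta_pos: float = 2.0,  # example multiplier for the upper confidence bound
    delta_neg: float = 2.0,  # example multiplier for the lower confidence bound
    threshold: int = 7,      # example: violations needed inside the window
    window: int = 9,         # example: number of most recent points considered
) -> List[bool]:
    """Simplified, hypothetical sketch of RRDtool-style FAILURES logic.

    A point is a violation when it lies outside
    [predicted - delta_neg * deviation, predicted + delta_pos * deviation].
    A point is flagged as aberrant when at least `threshold` of the last
    `window` points (including itself) are violations.
    """
    recent = deque(maxlen=window)  # sliding window of violation booleans
    flags: List[bool] = []
    for obs, pred, dev in zip(observed, predicted, deviation):
        upper = pred + delta_pos * dev
        lower = pred - delta_neg * dev
        recent.append(obs > upper or obs < lower)
        flags.append(sum(recent) >= threshold)
    return flags
```

For example, `aberrant_flags([10, 10, 20, 20, 20, 10], [10] * 6, [1] * 6, threshold=3, window=5)` returns `[False, False, False, False, True, True]`: the alert fires on the third consecutive out-of-band sample and stays raised while three violations remain in the window. A "DS"-style rule or an external trend checker could then turn such a flag into a Xymon colour.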
>
>
> One problem with the current RRD paradigm is that alerting happens
> only with data available at insertion time, not using data that's already
> stored in the RRD file (or whatever metric store you have), so xymond_rrd
> can't efficiently alert on anything beyond that.
>
> A "xymond_trend" could operate asynchronously on the RRD files, but to get
> useful trend data back out of RRDs you'll need to flush the data to disk
> first, which more or less blows out your I/O performance. Fine if you're
> on SSD, but more of a problem if you're on heavily loaded spinning disks.
>
> The problem there is just that there are so many different ways of
> doing this, with a lot of different needs. To make something flexible
> enough would require a good survey of what people are looking for.
>
> (With that in mind -- What are people looking for? :) Maybe it's easier
> than I'm thinking.)
>
>
> Alternatively, sending the metric data off entirely to a different
> package, which can reinject an alert into Xymon if/when it notices a
> trend, is an easily approachable option using the xymond_rrd --processor
> option, which can fork your metric feed off to whatever you like (OpenTSDB,
> Graphite, Splunk, etc.). The reposting of alerts back into Xymon can be
> done with that package's notification toolset and some scripting of Xymon
> messages.
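Here is a minimal sketch of the --processor approach. It assumes, hypothetically, that the forked metric feed arrives one sample per line as `host metric timestamp value`. The format xymond_rrd actually hands to the processor program should be checked against its man page, and the parser adapted. Each sample is rewritten into Graphite's plaintext protocol (`path value timestamp`, normally sent to Carbon on TCP port 2003). A standard Xymon `status` message is also built for reposting an alert. The names `parse_metric_line`, `to_graphite`, `xymon_status`, `send_message`, `forward_stdin` and the `xymon.` path prefix are illustrative only.

```python
import socket
import sys
from typing import Optional, TextIO, Tuple

GRAPHITE_PORT = 2003  # Carbon plaintext listener default
XYMON_PORT = 1984     # Xymon default network port

Sample = Tuple[str, str, int, float]


def parse_metric_line(line: str) -> Optional[Sample]:
    """Parse a HYPOTHETICAL 'host metric timestamp value' line.

    Adapt this to whatever xymond_rrd actually writes to the processor.
    Returns None for lines that don't match the assumed layout.
    """
    parts = line.split()
    if len(parts) != 4:
        return None
    host, metric, ts, value = parts
    try:
        return host, metric, int(ts), float(value)
    except ValueError:
        return None


def to_graphite(sample: Sample, prefix: str = "xymon") -> str:
    """Format a sample as a Graphite plaintext line: 'path value timestamp'."""
    host, metric, ts, value = sample
    # Graphite splits metric paths on '.', so dots in the hostname become '_'
    safe_host = host.replace(".", "_")
    return f"{prefix}.{safe_host}.{metric} {value} {ts}\n"


def xymon_status(host: str, test: str, color: str, text: str) -> str:
    """Build a Xymon status message: 'status HOSTNAME.TESTNAME COLOR text'."""
    return f"status {host}.{test} {color} {text}"


def send_message(message: str, server: str, port: int) -> None:
    """Open a TCP connection, send one message, then close it."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(message.encode())


def forward_stdin(graphite_server: str, stream: TextIO = sys.stdin) -> None:
    """Relay every parseable sample from `stream` over a single Graphite connection."""
    with socket.create_connection((graphite_server, GRAPHITE_PORT), timeout=10) as sock:
        for raw in stream:
            sample = parse_metric_line(raw)
            if sample is not None:
                sock.sendall(to_graphite(sample).encode())
```

Once the external package decides something is trending badly, reposting could look like `send_message(xymon_status("www.example.com", "trend", "yellow", "load trending up"), "xymon.example.com", XYMON_PORT)`, with placeholder host names. Passing the same message text to the `xymon` command-line client would also work. Keeping the formatting functions free of I/O makes them easy to reuse from whatever notification hook the metrics package provides.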
>
>
> Regards,
> -jc

Having done a bit of this type of thing in another life: what you're discussing is what we termed an alert manager/data collector architecture. The entire beauty of RRD data
storage is its simplicity, and it automatically does rollups.

I started my charting using flot, and because of the complexity of managing JS charting across all the different browsers, I eventually scrapped JS charting entirely and used GD to
generate chart images. For that particular use case, RRD didn't make sense because exact storage of historical data was mandatory; rollups/data averaging were not allowed.