[Xymon] Dynamic "normal" thresholds for CPU, disk, network, etc

Jeremy Laidman jlaidman at rebel-it.com.au
Thu Oct 3 03:58:01 CEST 2013


(It's generally considered bad form to hijack a thread by changing topic
mid-way.  I'm posting my response with a new subject, so that the original
thread can continue undiluted.)

On 3 October 2013 09:48, Adam Goryachev <mailinglists at websitemanagers.com.au> wrote:

> PS, not relevant to this entire discussion, but one thing I've been
> battling with is trying to define a "normal" status. eg, I can set the CPU
> load to 5, which normally means the status is always green, but one day the
> cpu load might be 4 at 2pm, and that is abnormal, even though a load of 4
> at 2am is normal. Does anyone use/do anything to automatically watch the
> current values, and learn what range is "normal" on this day/time? For me,
> this especially applies to counters related to network performance, disk
> performance, etc.


RRDs can handle this using Holt-Winters aberrant behaviour detection.  I
set this up once on an MRTG system, but as yet have not tried to get it
working for Xymon-derived RRD files.  On my list of things to do.
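
For anyone wondering what the Holt-Winters method does under the hood, here
is a rough Python sketch of the additive form with a deviation band.  It only
illustrates the idea and is not RRDtool's implementation; the function name,
the seeding from the first season, and the default parameter values are all
assumptions made for the example.

```python
def holt_winters_flags(series, period, alpha=0.1, beta=0.01, gamma=0.1, delta=2.0):
    """Flag samples that fall outside a Holt-Winters prediction band.

    Hypothetical helper: returns one boolean per sample, True meaning
    "aberrant". The first `period` samples only seed the model.
    """
    if len(series) < period:
        raise ValueError("need at least one full season to seed the model")

    # Seed: baseline = mean of the first season, seasonal = offset from it.
    level = sum(series[:period]) / period
    trend = 0.0
    seasonal = [y - level for y in series[:period]]
    deviation = [0.0] * period
    flags = [False] * period

    for t in range(period, len(series)):
        y = series[t]
        slot = t % period

        # Predict from the previous state, then compare with the observation.
        predicted = level + trend + seasonal[slot]
        error = y - predicted
        flags.append(abs(error) > delta * deviation[slot])

        # Exponentially smoothed updates of baseline, trend, seasonal
        # offset and seasonal deviation.
        new_level = alpha * (y - seasonal[slot]) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
        seasonal[slot] = gamma * (y - level) + (1 - gamma) * seasonal[slot]
        deviation[slot] = gamma * abs(error) + (1 - gamma) * deviation[slot]

    return flags
```

Note that the band starts at zero width, so on noisy data nearly everything
in the first season after seeding gets flagged until the deviation estimates
have settled.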

I think it's fairly easy to add the 6 extra consolidation functions
(HWPREDICT, SEASONAL, etc) into the RRD file using rrdtune, or add them to
the rrddefinitions.cfg file and recreate the RRD files.  Then you just need
to adjust the graph definitions to show the expected ranges.  I think the
tricky part is to specify the consolidation function parameters required to
produce useful predictions for "normal" based on the nature of the data
being collected.
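
As a sketch of what such a definition might look like, the snippet below
builds an HWPREDICT line in the RRA syntax accepted by "rrdtool create".  The
helper name and the default alpha and beta values are placeholders rather
than tuned recommendations, and whether Xymon passes such a line from
rrddefinitions.cfg through unchanged is untested here.  It assumes the usual
5-minute Xymon update interval and a daily cycle, giving a seasonal period of
288 samples.

```python
def hwpredict_rra(step_seconds=300, season_seconds=86400, seasons_kept=5,
                  alpha=0.1, beta=0.0035):
    """Build an RRA:HWPREDICT definition string (hypothetical helper).

    Layout: RRA:HWPREDICT:rows:alpha:beta:seasonal-period
    """
    if season_seconds % step_seconds != 0:
        raise ValueError("season length must be a whole number of steps")

    period = season_seconds // step_seconds   # samples per seasonal cycle
    rows = period * seasons_kept              # how many predictions to keep
    return f"RRA:HWPREDICT:{rows}:{alpha}:{beta}:{period}"


print(hwpredict_rra())
```

With the defaults this yields RRA:HWPREDICT:1440:0.1:0.0035:288.  According
to the rrdcreate documentation, giving HWPREDICT without an explicit rra-num
makes RRDtool create the companion SEASONAL, DEVSEASONAL, DEVPREDICT and
FAILURES archives implicitly.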

The seminal paper (AFAIK) on this was written by Brutlag and presented at
the USENIX "LISA 2000" sysadmin conference.  It has example graphs and
RRDtool definitions to do this, as well as a complete explanation (deeper
than I can grok) of how it all works.

https://www.usenix.org/legacy/events/lisa00/full_papers/brutlag/brutlag_html/

Also, people have described their adventures previously on The List.  For
example:

http://lists.xymon.com/pipermail/xymon/2012-October/035810.html

If you can get this going to your satisfaction, perhaps you could document
what you did and share it with us.

Cheers
Jeremy