[Xymon] Improving memory monitoring
J.C. Cleaver
cleaver at terabithia.org
Tue Apr 14 18:01:58 CEST 2015
On Tue, April 14, 2015 7:56 am, Steve Hill wrote:
> On 14/04/15 15:11, Mike Burger wrote:
>
>>> I'll say that I've never run into this...I've never had a system swap
>>> memory out to disk unless active memory was utilized at a high
>>> percentage...in either AIX or Linux.
>
> It does spontaneously happen from time to time for me - may be the type
> of work loads these machines do - they do tend to have a fair amount of
> idle data in memory and the kernel quite rightly decides that using that
> for caches/buffers would be a better use.
>
> Also, in situations where something _has_ used up a lot of RAM and
> therefore pushed stuff out to swap, Xymon continues to warn of high swap
> usage after that process has ended because the kernel obviously won't
> bother paging stuff back into the newly emptied RAM until it needs to.
>
>>> Now, on the other side of this, to take a stab at the question, I'd
>>> wager that, at present, you'd need to script such a test/alert..but I
>>> would agree that it would be useful to be able to set an "alarm if
>>> this or this" or an "alarm if this and this" type scenario. At
>>> present, the only tests I can think of that allow this, "out of the
>>> box" are the process monitors, where you can set minimum and maximum
>>> thresholds.
>
> Is there a way of setting analysis.cfg to use a script instead of the
> MEM* directives, or would that need to be a completely external job of
> some kind?
>
There's no built-in way to support this via analysis.cfg, or -- more
specifically -- xymond_client, and launching a script to do this
per-report would probably run into scaling issues for larger installs.

We've run into the same "things in swap alerting when they don't really
cause problems" issue. To some extent we were able to work around it by
business policy (e.g., "if transient, then please clear swap out"), but
that's not really the best solution.

Even the "out of the box" monitors handling things at the RRD level (the
'DS' directives) won't let you cross-compare two distinct thresholds,
although that would be a nice feature.

About the best thing I can think of for immediate use would be to set the
MEM* alerts to 100/101 so they effectively never fire, and write a new
channel listener that reads in incoming messages, does the calculations
and either a) issues a new type of status message, or b) issues a
"modify" message for the existing 'memory' status when a certain
threshold is crossed.
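
To make option (a) a bit more concrete, here's a rough Python sketch of
such a listener. Treat it as an illustration under assumptions, not tested
against a live server: it assumes a Linux client whose report carries a
[free] section, that each client-channel message starts with an "@@client"
header whose 4th "|"-separated field is the hostname and ends with a line
containing only "@@", and the test name "memcombo" plus all the thresholds
are made up for the example. It goes yellow/red only when both actual
memory use AND swap use are high.

```python
def extract_section(message, name):
    """Return the text of one [name] section from a client message."""
    lines, inside = [], False
    for line in message.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            inside = (stripped == f"[{name}]")
            continue
        if inside:
            lines.append(line)
    return "\n".join(lines)


def parse_free(text):
    """Parse `free` output into {'mem': {...}, 'swap': {...}} keyed by column."""
    columns, result = [], {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "total":
            columns = fields  # header row: total used free ...
        elif fields[0] in ("Mem:", "Swap:"):
            values = [int(v) for v in fields[1:]]
            result[fields[0].rstrip(":").lower()] = dict(zip(columns, values))
    return result


def percentages(free):
    """Return (actual memory used %, swap used %), ignoring buffers/cache."""
    mem, swap = free["mem"], free["swap"]
    if "available" in mem:  # newer procps output
        used = mem["total"] - mem["available"]
    else:  # older output with separate buffers/cached columns
        used = (mem["total"] - mem["free"]
                - mem.get("buffers", 0) - mem.get("cached", 0))
    actual_pct = 100.0 * used / mem["total"]
    swap_pct = 100.0 * swap["used"] / swap["total"] if swap["total"] else 0.0
    return actual_pct, swap_pct


def combined_color(actual_pct, swap_pct,
                   actual_warn=80, actual_panic=90,
                   swap_warn=50, swap_panic=80):
    """Alert only when BOTH values cross their (hypothetical) thresholds."""
    if actual_pct >= actual_panic and swap_pct >= swap_panic:
        return "red"
    if actual_pct >= actual_warn and swap_pct >= swap_warn:
        return "yellow"
    return "green"


def build_status(hostname, message, test="memcombo"):
    """Build a status message for one client report, or None if no [free]."""
    free = parse_free(extract_section(message, "free"))
    if "mem" not in free or "swap" not in free:
        return None
    actual_pct, swap_pct = percentages(free)
    color = combined_color(actual_pct, swap_pct)
    # Dots in the hostname are written as commas in a status message.
    host = hostname.replace(".", ",")
    return (f"status {host}.{test} {color} "
            f"actual={actual_pct:.0f}% swap={swap_pct:.0f}%")


def process_stream(lines):
    """Yield one status message per client report read from `lines`."""
    hostname, body = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("@@client"):
            fields = line.split("|")
            hostname = fields[3] if len(fields) > 3 else None  # assumed layout
            body = []
        elif line == "@@":
            if hostname:
                status = build_status(hostname, "\n".join(body))
                if status:
                    yield status
            hostname, body = None, []
        elif hostname:
            body.append(line)
```

I'd expect to run something like this under
"xymond_channel --channel=client", feed sys.stdin to process_stream(), and
hand each resulting line to the xymon client command to send to the
server. Since it's one long-running process rather than a script launched
per report, it should sidestep the scaling concern above.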
HTH,
-jc