[hobbit] hobbit-alerts.cfg: behaviour of TIME and DURATION together

SebA spa at syntec.co.uk
Tue Feb 10 16:33:57 CET 2009


Henrik Størner <mailto:henrik at hswn.dk> wrote:
> In <1233173020 at mknews.sslug.dk> "SebA" <spa at syntec.co.uk> writes:

<snip>

> I agree that the way it works currently is not entirely what you would
> expect from the rules you have. What would probably be best was for Xymon
> to calculate the duration based on the COLOR-settings defined for the
> alert (so for your rules, it would mean the alert triggered 2
> respectively 15 minutes after the status went red - and yellow-time
> was ignored).
>
> The problem with that approach is that it breaks down when a status
> wobbles between yellow and red - e.g. a disk that is filled to just around
> the critical level: You could end up in a situation where you wouldn't
> get any alerts because it didn't stay red long enough to exceed the
> color-specific DOWNTIME setting.
> 
> 
> But it would probably make more sense than the current modus operandi.
> I'll see what I can do about that.

If the alert timestamp were recorded as the first time the status reaches
one of the colours in the COLOR rule, rather than any of the ALERTCOLORS,
while recoveries still happen only on green (or whatever is configured),
then the alert for the flapping disk-full status would still be sent,
though perhaps only the second time it went red. So it might be slightly
better than what we have now.

However, this still wouldn't prevent lots of unwanted alerts reaching me:
this test can flap between yellow and red, and I consider yellow a
sufficient degree of recovery that I don't want another alert as soon as it
goes red again. Looking at disk in particular, surely if it is flapping
between yellow and red the problem isn't too serious. Anyone who does want
an alert for this can remove the DURATION rule; for anyone who does not, the
DURATION rule should be a way of suppressing alerts for flapping. That is
what I have always considered the DURATION rule to be for (although I was
wrong, given the way it currently works).

Perhaps a more flexible and useful solution, while still easy to use, would
be to combine the change you suggest with a RECOVERY= rule in the alerts, so
that each rule can specify which colour constitutes a recovery. Some tests
could then treat yellow as a recovery while others require green, allowing
different alerting behaviour for flapping depending on the test. It would
also let those who get notified of recoveries receive that notification at
the point they want. :)
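
To make the difference concrete, here is a minimal Python sketch of the
timing logic discussed above. It is not Xymon code and says nothing about
how Xymon is implemented: the names AlertRule and first_alerts are made up
for illustration, RECOVERY= is only the setting proposed in this mail, and
repeat alerts are not modelled. The assumption is that the duration timer
starts the first time the status reaches a colour in the COLOR rule and is
reset only when the status reaches a recovery colour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    colors: frozenset     # colours listed in the rule's COLOR setting
    duration: int         # minutes the event must last (the DURATION> value)
    recovery: frozenset   # HYPOTHETICAL RECOVERY= colours that end the event


def first_alerts(samples, rule):
    """Return the minutes at which the rule would first fire per event.

    samples is a time-ordered list of (minute, colour) status reports.
    """
    fired = []
    start = None      # minute the current event began, None if no event
    alerted = False   # whether this event has already produced an alert
    for minute, colour in samples:
        if colour in rule.recovery:
            # A recovery colour ends the event and resets the timer.
            start, alerted = None, False
            continue
        if colour not in rule.colors:
            # Neither an alert colour nor a recovery: the timer keeps running.
            continue
        if start is None:
            start = minute
        if not alerted and minute - start > rule.duration:
            fired.append(minute)
            alerted = True
    return fired


# A disk status flapping between yellow and red every 5 minutes.
flapping = [(0, "yellow"), (5, "red"), (10, "yellow"), (15, "red"),
            (20, "yellow"), (25, "red"), (30, "green")]

green_only = AlertRule(frozenset({"red"}), 8, frozenset({"green"}))
yellow_too = AlertRule(frozenset({"red"}), 8, frozenset({"green", "yellow"}))

print(first_alerts(flapping, green_only))   # [15]: fires the 2nd time it is red
print(first_alerts(flapping, yellow_too))   # []: yellow counts as a recovery
```

With recovery on green only, the flapping disk still alerts, but not until
the second red. With yellow also counted as a recovery, the same rule stays
silent for flapping and fires only when the status stays red for longer than
the duration.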

Did you look at the original message in this thread, which was a slightly
different scenario?

Kind regards,

SebA



