[xymon] xymon-4.3.0-RC1: alerting question
Dominique Frise
dominique.frise at unil.ch
Mon Feb 7 15:37:14 CET 2011
Hi Henrik,
Thanks for replying.
On 02/ 7/11 01:10 PM, Henrik Størner wrote:
> In <4D4C0F83.8080204 at unil.ch> Dominique Frise <dominique.frise at unil.ch> writes:
>
>> What is the minimum time that the same alert status must stay up to be
>> processed correctly by Xymon?
>
> I am not sure I understand the question - are you saying that
> Xymon does not generate the notifications you expect it to ?
>
Sort of...
We have SNMP trap handling configured (thanks, Andy Farrior), but we are not
completely happy with how it handles alerting.
When a bad trap from a given host is received, an alert status is
generated for Xymon (yellow or red). So far, so good.
Then, before this status's validity expires (before it turns purple), a
periodically launched script resets its color to green, which generates a
recovered message independently of the real status of the service
reported by the trap. Furthermore, while a <host>.trap status is in an
alert state, other bad traps from the same host and of the same level do
not generate any alerts (they are ignored).
Here is a description of what we are trying to implement to improve this
handling:
****
1. A bad <host>.trap is detected.
2. Generate a yellow/red <host>.trap status for Xymon.
3. After a short delay (ideally 1 sec.), generate a clear <host>.trap
status for Xymon.
All trap statuses except those in an alert state are periodically set to clear.
The red/yellow -> clear transition should not generate a recovered
message. This should be achievable by removing "clear" from "OKCOLORS" in
xymonserver.cfg, but that does not work without modifying xymond_alert.c.
A good <host>.trap should generate a green message and thus a recovered
message.
We know that handling traps 100% correctly in Xymon is not possible,
because we are misusing a single status (trap) to report many others, but
this scenario would allow:
- better alerting of all bad traps from the same host and of the same level.
- the recovered status to be a real recovery (the text of the trap
explains what recovered).
****
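The three steps above could be driven by something along these lines. This is a minimal sketch, not our actual trap handler: the `Trap` record and the `status_msg`, `plan_for` and `send_plan` functions are hypothetical names, and it assumes the usual Xymon status message form `status HOST.TEST COLOR text`, with dots in the host name written as commas. It only builds and paces the messages; delivering them is left to the caller.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trap:
    """Hypothetical record for one decoded SNMP trap."""
    host: str   # e.g. "switch1.example.org"
    bad: bool   # True if the trap reports a problem
    level: str  # "yellow" or "red" for bad traps; ignored for good ones
    text: str   # human-readable trap text, reused as the status text


def status_msg(host: str, color: str, text: str, test: str = "trap") -> str:
    """Build a Xymon 'status' message (assumption: dots in host names become commas)."""
    return f"status {host.replace('.', ',')}.{test} {color} {text}"


def plan_for(trap: Trap, clear_delay: float = 1.0) -> List[Tuple[float, str]]:
    """Return (delay before sending, message) pairs for one trap."""
    if trap.bad:
        if trap.level not in ("yellow", "red"):
            raise ValueError(f"unexpected alert level: {trap.level!r}")
        return [
            (0.0, status_msg(trap.host, trap.level, trap.text)),       # step 2: raise the alert
            (clear_delay, status_msg(trap.host, "clear", trap.text)),  # step 3: reset to clear
        ]
    # A good trap is reported as green, which should yield a real recovery.
    return [(0.0, status_msg(trap.host, "green", trap.text))]


def send_plan(plan: List[Tuple[float, str]],
              send: Callable[[str], None],
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Send each message in order, waiting the requested delay before it."""
    for delay, msg in plan:
        if delay:
            sleep(delay)
        send(msg)
```

The `send` callable could, for example, run the `xymon` client as `xymon <server> "<message>"`; that wiring is an assumption and is not shown. Injecting `send` and `sleep` keeps the message logic testable without a Xymon server.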
The issue we have now is that we are missing some alerts. We enabled
debugging and tracing, but because of the number of alerts we get, it is
extremely difficult to follow one single alert. We think this could be
related to how xymond_alert handles batches of messages (10-second handling).
Can you please confirm?
Thanks for your time.
Dominique
>> For example, in the following transitions, what would be the minimum
>> time (in seconds) for the yellow statuses (same check) to be processed
>> correctly by Xymon?
>
>
>> long t.     short t.    long t.     short t.    long t.     long t.
>> green   ->  yellow  ->  clear   ->  yellow  ->  clear   ->  green
>>             alert                   alert                   recovered
>
> Provided you have alerts setup on a yellow status, and there is not
> a DURATION parameter that delays the alert, then you should get an
> alert on each of the transitions to yellow.
>
> ("clear" is not an alerting color - only yellow, red and purple are).
>
> The only "minimum time" Xymon has in relation to alerts, is the
> DURATION parameter that you specify in alerts.cfg (hobbit-alerts.cfg
> in older versions).
>
>
> Regards,
> Henrik
>
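To reason about which of the transitions in the diagram above should produce notifications, here is a toy model of the behavior discussed in this thread: an alert when a status enters one of the alerting colors (yellow, red, purple, per Henrik), and a recovered message when a status that has alerted reaches a color listed in OKCOLORS. It is a simplified sketch, not xymond_alert's real logic: DURATION, repeat handling and escalations such as yellow -> red are deliberately not modelled, and `ALERT_COLORS` and `alert_events` are hypothetical names.

```python
from typing import List, Set, Tuple

# Alerting colors as stated by Henrik; "clear" is not one of them.
ALERT_COLORS = {"yellow", "red", "purple"}


def alert_events(colors: List[str], ok_colors: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (previous color, new color, event) for each notifying transition.

    Simplified model: entering an alert color from a non-alert color alerts;
    reaching an OK color after an alert sends a recovery. Transitions to a
    color that is neither alerting nor OK (e.g. "clear" once it is removed
    from OKCOLORS) produce nothing.
    """
    events = []
    alerted = False
    for prev, cur in zip(colors, colors[1:]):
        if cur in ALERT_COLORS and prev not in ALERT_COLORS:
            events.append((prev, cur, "alert"))
            alerted = True
        elif cur in ok_colors and alerted:
            events.append((prev, cur, "recovered"))
            alerted = False
    return events
```

With "clear" among the OK colors, the sequence green, yellow, clear, yellow, clear, green yields alert, recovered, alert, recovered (the spurious recoveries we see today); with OK colors of only green and blue, it yields alert, alert, recovered, which is the behavior we are after.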