[xymon] xymon-4.3.0-RC1: alerting question

Dominique Frise dominique.frise at unil.ch
Mon Feb 7 15:37:14 CET 2011


Hi Henrik,

Thanks for replying.

On 02/ 7/11 01:10 PM, Henrik Størner wrote:
> In <4D4C0F83.8080204 at unil.ch> Dominique Frise <dominique.frise at unil.ch> writes:
>
>> What is the minimum time for the same alert status to stay up to be
>> processed correctly by Xymon ?
>
> I am not sure I understand the question - are you saying that
> Xymon does not generate the notifications you expect it to ?
>
Sort of...

We have SNMP trap handling configured (thanks Andy Farrior) but are not 
completely happy with how it handles the alerting.
When a bad trap from a given host is received, an alert status is 
generated for Xymon (yellow or red). So far, so good.
Then, before this status's validity expires (before it turns purple), a 
periodically launched script resets its color to green, thus 
generating a recovered message independently of the real status of the 
service reported by the trap. Furthermore, while a <host>.trap status 
is in alert state, other bad traps from the same host and of the same 
level do not generate any alerts (they are ignored).

Here follows a description of what we are trying to implement in order to 
improve this handling:

****
1. a bad trap from <host> is detected.
2. generate a yellow/red <host>.trap status for Xymon.
3. after a short delay (ideally 1 sec.), generate a clear <host>.trap 
status for Xymon.
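
A minimal sketch of steps 1 to 3, assuming the trap handler reports to Xymon with plain "status" messages over TCP (port 1984 is the usual xymond port). The function names, the optional lifetime argument and the UTF-8 encoding are illustrative assumptions, not part of our actual trap handler:

```python
import socket
import time

XYMON_PORT = 1984  # usual xymond listener port


def build_status(host, test, color, text, lifetime=None):
    """Format a Xymon "status" message (hypothetical helper)."""
    command = "status" if lifetime is None else f"status+{lifetime}"
    # Host and test are separated by a dot, so dots inside the
    # host name are written as commas.
    return f"{command} {host.replace('.', ',')}.{test} {color} {text}"


def send_to_xymon(message, server, port=XYMON_PORT):
    """Send one message to the Xymon server (encoding is an assumption)."""
    with socket.create_connection((server, port), timeout=5) as conn:
        conn.sendall(message.encode("utf-8"))


def report_bad_trap(server, host, color, text, delay=1.0):
    """Steps 2 and 3: raise the alert status, then set it to clear."""
    send_to_xymon(build_status(host, "trap", color, text), server)
    time.sleep(delay)  # the "short delay" from step 3
    send_to_xymon(build_status(host, "trap", "clear", text), server)
```

For example, build_status("router1.example.com", "trap", "red", "linkDown") gives the message "status router1,example,com.trap red linkDown".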

All trap statuses except those in alert state are periodically set to clear.
The red/yellow -> clear transition should not generate a recovered 
message. This should be achievable by removing "clear" from "OKCOLORS" in 
xymonserver.cfg, but this does not work without modifying xymond_alert.c.
A good <host>.trap should generate a green message and thus a recovered 
message.

We know that a 100% handling of traps in Xymon is not possible because 
we are misusing a single status (trap) to report many others, but this 
scenario would allow:

- better alerting on all bad traps from the same host and of the same level.
- the recovered status is a real recovery (the text of the trap explains 
what recovered)
****
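
The notification behaviour we are aiming for can be modelled as a small state machine. This is only a model of the desired outcome, not of how xymond_alert works internally; the names ALERT_COLORS, OK_COLORS and notifications() are made up for the illustration:

```python
ALERT_COLORS = {"yellow", "red", "purple"}
OK_COLORS = {"green"}  # "clear" deliberately left out, as proposed above


def notifications(colors):
    """Return the notification wanted for each status in the sequence."""
    events = []
    previous = None
    unrecovered = False  # an alert was sent and not yet recovered
    for color in colors:
        event = None
        if color in ALERT_COLORS and previous not in ALERT_COLORS:
            # Every new transition into an alert color should alert.
            event = "alert"
            unrecovered = True
        elif color in OK_COLORS and unrecovered:
            # Only a real green status counts as a recovery.
            event = "recovered"
            unrecovered = False
        events.append(event)
        previous = color
    return events


print(notifications(["green", "yellow", "clear", "yellow", "clear", "green"]))
```

With this model, each yellow status produces an alert, the transitions to clear produce nothing, and only the final green produces a recovered message.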


The issue we have now is that we are missing some alerts. We enabled 
debug and tracing, but due to the number of alerts we get, it is 
extremely difficult to follow one single alert. We think this could be 
related to how xymond_alert handles batches of messages (processing 
every 10 sec.).

Can you please confirm?

Thanks for your time.

Dominique


>> For example in following transitions, what would the minimum time (in
>> sec.) for the yellow statuses (same check) to be processed correctly by
>> Xymon ?
>
>
>> long t.    short t.     long t.    short t.    long t.    long t.
>> green  ->   yellow   ->   clear  ->   yellow  ->   clear  ->   green
>>             alert                   alert                  recovered
>
> Provided you have alerts set up on a yellow status, and there is not
> a DURATION parameter that delays the alert, then you should get an
> alert on each of the transitions to yellow.
>
> ("clear" is not an alerting color - only yellow, red and purple are).
>
> The only "minimum time" Xymon has in relation to alerts, is the
> DURATION parameter that you specify in alerts.cfg (hobbit-alerts.cfg
> in older versions).
>
>
> Regards,
> Henrik


