[xymon] xymon_4.3.0-RC1: possible lost alerts
Dominique Frise
dominique.frise at unil.ch
Mon Feb 14 17:08:51 CET 2011
On 02/14/11 02:51 PM, Henrik Størner wrote:
> In <4D593040.6090808 at unil.ch> Dominique Frise <dominique.frise at unil.ch> writes:
>
>> What is supposed to happen if you remove the "clear" color from OKCOLORS
>> in xymonserver.cfg?
>
> Then a "clear" status would trigger alerts, i.e. the xymond_alert
> module would begin to see alert-messages for a clear status (same
> as for yellow, red, purple).
>
> I don't think you would actually see any alerts being sent, unless
> you also change ALERTCOLORS to include the "clear" status.
>
> But that would be a bad idea, since "clear" is also used for
> e.g. "noping" hosts, or for client-side statuses (cpu, disk, ...)
> when the server is down (if the "conn" status is red, client-side
> tests will not go purple - they go clear).
>
>> We would expect that no recovery message should be sent when a status
>> goes from yellow/red to clear. Only the repeat interval should be reset.
>> Does this make sense?
>
> Kind of, yes. I don't recall if it was actually tested.
>
(Sorry, the same reply was sent before with garbage as the top post.)
I don't think it was ;-)
Below are the small changes we made in xymond_alert.c (the version before
your latest changes) to achieve this:
[super at iris xymond]# diff -u xymond_alert.c.dist xymond_alert.c
--- xymond_alert.c.dist Sun Nov 14 18:21:19 2010
+++ xymond_alert.c Mon Feb 14 15:02:24 2011
@@ -355,7 +355,7 @@
char *msg;
int seq;
int argi;
- int alertcolors, alertinterval;
+ int alertcolors, alertinterval, okcolors;
char *configfn = NULL;
char *checkfn = NULL;
int checkpointinterval = 900;
@@ -377,6 +377,7 @@
/* Load alert config */
alertcolors = colorset(xgetenv("ALERTCOLORS"), ((1 << COL_GREEN) | (1 << COL_BLUE)));
alertinterval = 60*atoi(xgetenv("ALERTREPEAT"));
+ okcolors = colorset(xgetenv("OKCOLORS"), (1 << COL_RED));
/* Create our loookup-trees */
hostnames = rbtNew(name_compare);
@@ -656,7 +657,7 @@
awalk->maxcolor = newcolor;
}
}
- else {
+ else if ((okcolors & (1 << newcolor)) != 0) {
/*
* Send one "recovered" message out now, then go to A_DEAD.
* Dont update the color here - we want recoveries to go out
@@ -663,6 +664,11 @@
* only if the alert color triggered an alert
*/
awalk->state = A_RECOVERED;
+ } else {
+ /*
+ * This color should not trigger "recovered" messages.
+ */
+ awalk->state = A_NORECIP;
}
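
The three-way decision this patch introduces can be sketched in isolation as follows. This is only an illustration, not code from xymond_alert.c: the color numbering, sketch_colorset() (a simplified stand-in for colorset()), classify() and the enum names are all hypothetical, and it assumes the branch preceding the patched "else" tests the new color against alertcolors, which the diff context does not show.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical color numbering for this sketch; Xymon's real values may differ. */
enum { COL_GREEN, COL_CLEAR, COL_BLUE, COL_PURPLE, COL_YELLOW, COL_RED, COL_COUNT };

static const char *const color_names[COL_COUNT] = {
    "green", "clear", "blue", "purple", "yellow", "red"
};

/*
 * Simplified stand-in for colorset(): turn a comma-separated list such as
 * "red,yellow" into a bitmask, then strip the bits listed in excludeset.
 * Unknown color names are silently ignored.
 */
static int sketch_colorset(const char *spec, int excludeset)
{
    int mask = 0;

    while (*spec != '\0') {
        size_t len = strcspn(spec, ",");   /* length of the current token */

        for (int c = 0; c < COL_COUNT; c++) {
            if (strlen(color_names[c]) == len &&
                strncmp(spec, color_names[c], len) == 0) {
                mask |= (1 << c);
            }
        }
        spec += len;
        if (*spec == ',') spec++;          /* skip the separator */
    }
    return mask & ~excludeset;
}

/* Hypothetical outcome names; the patch itself sets A_RECOVERED or A_NORECIP. */
enum outcome { KEEP_ALERTING, SEND_RECOVERY, NO_RECOVERY };

/* Decide what a transition to newcolor means for an active alert. */
static enum outcome classify(int newcolor, int alertcolors, int okcolors)
{
    assert(newcolor >= 0 && newcolor < COL_COUNT);

    if ((alertcolors & (1 << newcolor)) != 0) return KEEP_ALERTING;
    if ((okcolors & (1 << newcolor)) != 0) return SEND_RECOVERY;
    return NO_RECOVERY;                    /* e.g. "clear" removed from OKCOLORS */
}
```

With ALERTCOLORS set to "red,yellow,purple" and OKCOLORS set to "green,blue" (that is, with "clear" removed), classify() reports a recovery for green but not for clear, which is the behaviour we are after.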
With this patch in place we can better support alerting for SNMP traps (see
the previous discussion with Buchan,
http://www.xymon.com/archive/2011/02/msg00062.html). For that to work,
though, we need Xymon to process every short transition from an alert state
to a clear status rather than ignore it.
Dominique