[Xymon] Source of "Xymon [hostname]:[service] recovered? (stale)" alerts

Greg Earle earle at isolar.DynDNS.ORG
Wed Nov 6 21:49:20 CET 2019


All:

Ever since I upgraded my work setup from Xymon 4.3.12 to 4.3.28 (on RHEL 
7.6), I've been seeing alerts that should be sent only once (or maybe 
twice) but instead stick around persistently.

We keep getting the same alert every hour until someone restarts the 
Xymon service.

Once the service is restarted, the alert goes away and we get a "Xymon 
[hostname]:[service] recovered? (stale)" alert in its place.

The frequency of them seems to be somewhat random.  Some days I don't 
get any.  Then 3 days ago I had 6 instances of them, 2 days ago there 
were only 2, then 5 more yesterday.  They are usually "msgs" service 
alerts, but not all the time.  I've seen some for "conn" or "cpu", and 
even one for a "telnet" check we have for an APC UPS.

What really surprised me is that in my 5 1/2+ years of archives of this 
mailing list, I don't see any mentions of this issue.

I could toss in a "cron" job to automatically restart Xymon once a day, 
but that's a kludge.

What could be possible causes of these 'stuck'/repeated alerts, which 
end up becoming stale?

I noticed that alerts.cfg(5) says

> (A stale alert is one where the service recovered during a time that 
> xymond_alert was not running.)

but that doesn't seem to be applicable here - unless it's describing the 
brief period between stopping & restarting the Xymon service.

		- Greg

