[Xymon] Source of "Xymon [hostname]:[service] recovered? (stale)" alerts
Greg Earle
earle at isolar.DynDNS.ORG
Wed Nov 6 21:49:20 CET 2019
All:
Ever since I upgraded my work setup from Xymon 4.3.12 to 4.3.28 (on RHEL
7.6), I've been seeing alerts that should fire only once (or maybe
twice) but instead stick around persistently.
We keep getting the same alert every hour until someone restarts the
Xymon service.
Once the service is restarted, the alert goes away and we get a "Xymon
[hostname]:[service] recovered? (stale)" alert in its place.
Their frequency seems to be somewhat random. Some days I don't
get any. Then 3 days ago I had 6 instances of them, 2 days ago there
were only 2, then 5 more yesterday. They are usually "msgs" service
alerts, but not all the time. I've seen some for "conn" or "cpu", and
even one for a "telnet" check we have for an APC UPS.
What really surprised me is that in my 5 1/2+ years of archives of this
mailing list, I don't see any mentions of this issue.
I could toss in a "cron" job to automatically restart Xymon once a day,
but that's a kludge.
What could be possible causes of these 'stuck'/repeated alerts, which
end up becoming stale?
I noticed that alerts.cfg(5) says
> (A stale alert is one where the service recovered during a time that
> xymond_alert was not running.)
but that doesn't seem to be applicable here - unless it's describing the
brief period between stopping & restarting the Xymon service.
- Greg