[Xymon] Xymon no longer sending alerts
Jeremy Laidman
jeremy at laidman.org
Wed Jan 31 01:31:58 CET 2024
Hi Colin
From the logs, it appears that xymond_alert is unable to communicate with
your Xymon server on 10.10.10.10:1984. It seems to be trying to fetch the
hosts.cfg file contents via the BB protocol by sending a "config hosts.cfg"
command to xymond, but xymond is not responding.
The select() system call monitors a file handle or socket for activity,
here most likely the TCP socket to 10.10.10.10:1984. A timeout means that
nothing became readable on that socket within the expected time (the log
shows "timeout 50"). This suggests that the TCP connection was established
correctly (xymond is listening and the IP/port are likely correct) and that
xymond_alert sent the request for the hosts.cfg file, but no response came
back.
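
To check whether xymond answers this request at all, independently of
xymond_alert, something like the following could be tried. It is only a
rough sketch: the function name query_xymond is made up for illustration,
and it assumes xymond treats a half-closed connection as the end of the
request and closes the connection once its reply is complete.

```python
import select
import socket


def query_xymond(host, port, message, timeout=5.0):
    """Send one message to xymond and return its reply as text.

    Returns None if the daemon goes quiet for `timeout` seconds
    before closing the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(message.encode("ascii"))
        # Assumption: half-closing tells the daemon the request is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            # Wait until the socket is readable or the timeout expires.
            readable, _, _ = select.select([sock], [], [], timeout)
            if not readable:
                return None  # connected and sent, but nothing came back in time
            data = sock.recv(4096)
            if not data:
                break  # daemon closed the connection, reply is complete
            chunks.append(data)
        return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling query_xymond("10.10.10.10", 1984, "config hosts.cfg", timeout=50)
would mirror the request and timeout shown in alert.log. A result of None
would match the symptom described above: the connection works, but xymond
never replies.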
It might be worth checking xymond.log for messages corresponding to the
timestamps of the errors from xymond_alert.
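
If the logs are large, a small script can do the matching. This is a
sketch only: it assumes xymond.log lines start with the same timestamp
format seen in alert.log (e.g. "2024-01-31 02:17:39.813610"), which may not
hold on every installation, and the names parse_timestamp and lines_near
are made up for illustration.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 26  # length of "2024-01-31 02:17:39.813610"


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None  # wrapped continuation line or a different log format


def lines_near(log_lines, error_times, window_seconds=60):
    """Return log lines stamped within window_seconds of any error time."""
    window = timedelta(seconds=window_seconds)
    matches = []
    for line in log_lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue
        if any(abs(stamp - t) <= window for t in error_times):
            matches.append(line.rstrip("\n"))
    return matches
```

The error times would come from running parse_timestamp over the "Whoops !"
lines in alert.log, and log_lines from reading xymond.log, wherever that
lives on your system.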
I'm not convinced this is the reason that you're not getting alert emails.
If xymond_alert can't get hosts.cfg from a BB message, it should be able to
get it directly from the filesystem, and then carry on. So the messages
you're seeing might be a red herring, although I wouldn't expect them to
show up on a normally operating Xymon installation. Having said that, my
Xymon installation is showing those log messages, yet I've no reason to
think that our alerting is broken, so perhaps it's just something that can
be ignored.
It might be worth taking a look at the man page for xymond_alert and trying
the --test, --trace and --dump-config options.
In case it's not obvious, I'm really not sure what the problem could be,
and I'm just throwing out some ideas in case something helps.
J
On Wed, 31 Jan 2024 at 10:50, Colin Coe <colin.coe at gmail.com> wrote:
> Hi all
>
> Our Xymon server has recently stopped sending alert emails. This server is
> also running Postfix and is our mail relay.
>
> From alert.log all I see is:
> 2024-01-31 02:17:39.813610 Whoops ! Failed to send message (Select(2)
> failed)
> 2024-01-31 02:17:39.829027 -> Select failure while sending to Xymon
> daemon at 10.10.10.10:1984
> 2024-01-31 02:17:39.829032 -> Recipient '10.10.10.10', timeout 50
> 2024-01-31 02:17:39.829037 -> 1st line: 'config hosts.cfg'
> 2024-01-31 02:17:39.829042 Cannot load hosts.cfg from xymond: Select(2)
> failed
> 2024-01-31 02:17:39.829049 Failed to load from xymond, reverting to
> file-load
> 2024-01-31 02:22:40.932828 Whoops ! Failed to send message (Select(2)
> failed)
> 2024-01-31 02:22:40.932863 -> Select failure while sending to Xymon
> daemon at 10.10.10.10:1984
> 2024-01-31 02:22:40.932867 -> Recipient '10.10.10.10', timeout 50
> 2024-01-31 02:22:40.932871 -> 1st line: 'config hosts.cfg'
> 2024-01-31 02:22:40.932876 Cannot load hosts.cfg from xymond: Select(2)
> failed
> 2024-01-31 02:22:40.932881 Failed to load from xymond, reverting to
> file-load
>
> And notifications.log is zero bytes in size.
>
> I added "--debug" to the "[alert]" section of /etc/xymon/tasks.cfg and
> while the verbosity was increased, there was no indication of why alerts
> are not being sent.
>
> Any clues how I can debug this?
>
> Thanks
> _______________________________________________
> Xymon mailing list
> Xymon at xymon.com
> http://lists.xymon.com/mailman/listinfo/xymon
>