[Xymon] Xymon no longer sending alerts

Colin Coe colin.coe at gmail.com
Wed Jan 31 02:21:03 CET 2024


Hi Jeremy

Running the following gives me the expected result, so the server is
responding, at least sometimes:
xymon 127.0.0.1 "config hosts.cfg"

Is this a worry: "Discarding timed-out partial msg from 127.0.0.1"? Getting
lots of these...

I've added --trace to xymond_alert and will go through the output.

Thanks



On Wed, 31 Jan 2024 at 08:32, Jeremy Laidman <jeremy at laidman.org> wrote:

> Hi Colin
>
> From the logs, it appears that xymond_alert is unable to communicate with
> your Xymon server on 10.10.10.10:1984. It seems to be trying to fetch the
> hosts.cfg file contents via the BB protocol by sending a "config hosts.cfg"
> command to xymond, but xymond is not responding.
>
> The select() system call is monitoring a file handle or socket for
> activity, likely the TCP socket with 10.10.10.10:1984. The timeout means
> that select() saw no activity on that socket within the expected time. This
> suggests that the TCP connection was established correctly (xymond is
> listening and IP/port are likely correct) and xymond_alert sent the request
> for the hosts.cfg file, but there was no response.
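
For anyone wanting to reproduce that behaviour outside of xymond_alert, here is a rough Python sketch of a client that sends a request and then uses select() to wait for the reply. It is not the Xymon client code. The function name xymon_request is made up for illustration, and it assumes the exchange works by sending the message, half-closing the connection, and then reading until the server closes its side.

```python
import select
import socket


def xymon_request(host, port, message, timeout=5.0):
    """Send one message, then wait for the reply with select().

    Hypothetical helper, not part of Xymon. Assumes the server replies
    only after the client has half-closed the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(message.encode("utf-8"))
        # Half-close the write side so the peer knows the request is complete.
        sock.shutdown(socket.SHUT_WR)

        chunks = []
        while True:
            # select() returns an empty list if nothing arrives in time.
            readable, _, _ = select.select([sock], [], [], timeout)
            if not readable:
                raise TimeoutError(
                    f"no data from {host}:{port} within {timeout}s"
                )
            data = sock.recv(4096)
            if not data:
                # Peer closed the connection: the reply is complete.
                break
            chunks.append(data)

    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling xymon_request("10.10.10.10", 1984, "config hosts.cfg", timeout=50) should either return the hosts.cfg contents or raise TimeoutError, which would roughly correspond to the "Select(2) failed" lines in alert.log. Trying the same call against 127.0.0.1 might show whether the problem is specific to the 10.10.10.10 address.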
>
> It might be worth checking xymond.log for messages corresponding to the
> timestamps of the errors from xymond_alert.
>
> I'm not convinced this is the reason that you're not getting alert emails.
> If xymond_alert can't get hosts.cfg from a BB message, it should be able to
> get it directly from the filesystem, and then carry on. So the messages
> you're seeing might be a red herring, although I wouldn't expect them to
> show up on a normally operating Xymon installation. Having said that, my
> Xymon installation is showing those log messages, yet I've no reason to
> think that our alerting is broken, so perhaps it's just something that can
> be ignored.
>
> It might be worth taking a look at the man page for xymond_alert, and having
> a go at the --test, --trace and --dump-config options.
>
> In case it's not obvious, I'm really not sure what the problem could be,
> and I'm just throwing out some ideas in case something helps.
>
> J
>
> On Wed, 31 Jan 2024 at 10:50, Colin Coe <colin.coe at gmail.com> wrote:
>
>> Hi all
>>
>> Our Xymon server has recently stopped sending alert emails. This server
>> is also running Postfix and is our mail relay.
>>
>> From alert.log all I see is:
>> 2024-01-31 02:17:39.813610 Whoops ! Failed to send message (Select(2)
>> failed)
>> 2024-01-31 02:17:39.829027 ->  Select failure while sending to Xymon
>> daemon at 10.10.10.10:1984
>> 2024-01-31 02:17:39.829032 ->  Recipient '10.10.10.10', timeout 50
>> 2024-01-31 02:17:39.829037 ->  1st line: 'config hosts.cfg'
>> 2024-01-31 02:17:39.829042 Cannot load hosts.cfg from xymond: Select(2)
>> failed
>> 2024-01-31 02:17:39.829049 Failed to load from xymond, reverting to
>> file-load
>> 2024-01-31 02:22:40.932828 Whoops ! Failed to send message (Select(2)
>> failed)
>> 2024-01-31 02:22:40.932863 ->  Select failure while sending to Xymon
>> daemon at 10.10.10.10:1984
>> 2024-01-31 02:22:40.932867 ->  Recipient '10.10.10.10', timeout 50
>> 2024-01-31 02:22:40.932871 ->  1st line: 'config hosts.cfg'
>> 2024-01-31 02:22:40.932876 Cannot load hosts.cfg from xymond: Select(2)
>> failed
>> 2024-01-31 02:22:40.932881 Failed to load from xymond, reverting to
>> file-load
>>
>> And notifications.log is zero bytes in size.
>>
>> I added "--debug" to the "[alert]" section of /etc/xymon/tasks.cfg and
>> while the verbosity was increased, there was no indication of why alerts
>> are not being sent.
>>
>> Any clues how I can debug this?
>>
>> Thanks
>