[Xymon] Xymon no longer sending alerts

Colin Coe colin.coe at gmail.com
Wed Jan 31 03:43:13 CET 2024


The "Select" messages are still there.

The faulty alerts.cfg config was:
HOST=%(...|GS\d\d\d\d)
    IGNORE

but should have been:
HOST=%(...GS\d\d\d\d)
    IGNORE

so the stray "|" split the regex into two alternatives, basically every host matched, and everything was being ignored
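To see why one extra "|" has that effect, here is a minimal sketch. It uses Python's `re` module as a stand-in for the PCRE matching Xymon does on `HOST=%...` patterns. It assumes three things: `...` is meant literally as regex (three arbitrary characters), since in the post it may be a redacted prefix; the pattern is matched unanchored against the hostname; and the hostnames are made up for illustration.

```python
import re

# The two HOST= patterns from alerts.cfg, minus the leading '%' that marks a regex.
FAULTY = r"...|GS\d\d\d\d"  # '|' makes two alternatives: '...' OR 'GS\d\d\d\d'
FIXED = r"...GS\d\d\d\d"    # three arbitrary characters, then GS and four digits

def is_ignored(pattern: str, hostname: str) -> bool:
    """True if the pattern matches anywhere in hostname (unanchored search, an assumption)."""
    return re.search(pattern, hostname) is not None

# Hypothetical hostnames, for illustration only.
for host in ["mailrelay01", "db02", "abcGS1234"]:
    print(host, is_ignored(FAULTY, host), is_ignored(FIXED, host))
# mailrelay01 True False
# db02 True False
# abcGS1234 True True
```

With the faulty pattern the `...` alternative matches any hostname of three or more characters, so the IGNORE rule swallows practically every alert; the fixed pattern only catches the GS hosts.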


On Wed, 31 Jan 2024 at 10:39, Jeremy Laidman <jeremy at laidman.org> wrote:

> Great news Colin.
>
> Have the "Select(2)" messages gone away?
>
> Can you share the nature of the error in alerts.cfg, so I know what to
> look for when I do the same in future?
>
> J
>
> On Wed, 31 Jan 2024 at 13:22, Colin Coe <colin.coe at gmail.com> wrote:
>
>> Hi all
>>
>> This is resolved. It was a stupid error in alerts.cfg...
>>
>> Thanks for the suggestions
>>
>> On Wed, 31 Jan 2024 at 09:21, Colin Coe <colin.coe at gmail.com> wrote:
>>
>>> Hi Jeremy
>>>
>>> Running the following gives me the expected result, so the server is
>>> responding, at least sometimes.
>>> xymon 127.0.0.1 "config hosts.cfg"
>>>
>>> Is this a worry: "Discarding timed-out partial msg from 127.0.0.1"?
>>> Getting lots of these...
>>>
>>> I've added --trace to xymond_alert and will go through that.
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Wed, 31 Jan 2024 at 08:32, Jeremy Laidman <jeremy at laidman.org> wrote:
>>>
>>>> Hi Colin
>>>>
>>>> From the logs, it appears that xymond_alert is unable to communicate
>>>> with your Xymon server on 10.10.10.10:1984. It seems to be trying to
>>>> fetch the hosts.cfg file contents via the BB protocol by sending a "config
>>>> hosts.cfg" command to xymond, but xymond is not responding.
>>>>
>>>> The select() system call monitors a file handle or socket for
>>>> activity, likely the TCP socket with 10.10.10.10:1984. The timeout
>>>> means that select() saw no activity on that socket within the expected time.
>>>> This suggests that the TCP connection was established correctly (xymond is
>>>> listening and IP/port are likely correct) and xymond_alert sent the request
>>>> for the hosts.cfg file, but there was no response.
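A rough sketch of the exchange described above, assuming the plain-text Xymon protocol: connect to TCP port 1984, send the command (here `config hosts.cfg`), shut down the sending side, then read until xymond closes the connection. `query_xymond` is a hypothetical helper, not Xymon's own code, and treating the log's "timeout 50" as 50 seconds is an assumption. The `select()` wait is the step that produces "Select(2) failed" when nothing comes back in time.

```python
import select
import socket

def query_xymond(host: str, message: str, port: int = 1984, timeout: float = 50.0) -> bytes:
    """Send one message to xymond and return its reply (hypothetical helper).

    Raises TimeoutError if select() reports no activity on the socket within
    `timeout` seconds, i.e. the connection is up but no reply arrives.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(message.encode())
        # Tell the server the request is complete (assumed protocol behaviour).
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            # Block until the socket is readable or the timeout expires.
            readable, _, _ = select.select([sock], [], [], timeout)
            if not readable:
                raise TimeoutError(f"no reply from {host}:{port} within {timeout}s")
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
        return b"".join(chunks)
```

Something like `query_xymond("127.0.0.1", "config hosts.cfg")` should behave roughly like the `xymon 127.0.0.1 "config hosts.cfg"` command: if it returns the file contents, xymond is answering; if it raises TimeoutError, the TCP connection was accepted but no reply arrived.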
>>>>
>>>> It might be worth checking xymond.log for messages corresponding to the
>>>> timestamps of the errors from xymond_alert.
>>>>
>>>> I'm not convinced this is the reason that you're not getting alert
>>>> emails. If xymond_alert can't get hosts.cfg from a BB message, it should be
>>>> able to get it directly from the filesystem, and then carry on. So the
>>>> messages you're seeing might be a red herring, although I wouldn't
>>>> expect them to show up on a normally operating Xymon installation. Having
>>>> said that, my Xymon installation is showing those log messages, yet I've no
>>>> reason to think that our alerting is broken, so perhaps it's just something
>>>> that can be ignored.
>>>>
>>>> It might be worth taking a look at the man page for xymond_alert, and
>>>> have a go at the --test, --trace and --dump-config options.
>>>>
>>>> In case it's not obvious, I'm really not sure what the problem could
>>>> be, and I'm just throwing out some ideas in case something helps.
>>>>
>>>> J
>>>>
>>>> On Wed, 31 Jan 2024 at 10:50, Colin Coe <colin.coe at gmail.com> wrote:
>>>>
>>>>> Hi all
>>>>>
>>>>> Our Xymon server has recently stopped sending alert emails. This
>>>>> server is also running Postfix and is our mail relay.
>>>>>
>>>>> From alert.log all I see is:
>>>>> 2024-01-31 02:17:39.813610 Whoops ! Failed to send message (Select(2) failed)
>>>>> 2024-01-31 02:17:39.829027 ->  Select failure while sending to Xymon daemon at 10.10.10.10:1984
>>>>> 2024-01-31 02:17:39.829032 ->  Recipient '10.10.10.10', timeout 50
>>>>> 2024-01-31 02:17:39.829037 ->  1st line: 'config hosts.cfg'
>>>>> 2024-01-31 02:17:39.829042 Cannot load hosts.cfg from xymond: Select(2) failed
>>>>> 2024-01-31 02:17:39.829049 Failed to load from xymond, reverting to file-load
>>>>> 2024-01-31 02:22:40.932828 Whoops ! Failed to send message (Select(2) failed)
>>>>> 2024-01-31 02:22:40.932863 ->  Select failure while sending to Xymon daemon at 10.10.10.10:1984
>>>>> 2024-01-31 02:22:40.932867 ->  Recipient '10.10.10.10', timeout 50
>>>>> 2024-01-31 02:22:40.932871 ->  1st line: 'config hosts.cfg'
>>>>> 2024-01-31 02:22:40.932876 Cannot load hosts.cfg from xymond: Select(2) failed
>>>>> 2024-01-31 02:22:40.932881 Failed to load from xymond, reverting to file-load
>>>>>
>>>>> And notifications.log is zero bytes in size.
>>>>>
>>>>> I added "--debug" to the "[alert]" section of /etc/xymon/tasks.cfg and
>>>>> while the verbosity was increased, there was no indication of why alerts
>>>>> are not being sent.
>>>>>
>>>>> Any clues how I can debug this?
>>>>>
>>>>> Thanks
>>>>> _______________________________________________
>>>>> Xymon mailing list
>>>>> Xymon at xymon.com
>>>>> http://lists.xymon.com/mailman/listinfo/xymon
>>>>>
>>
>

