[Xymon] Xymon no longer sending alerts
Jeremy Laidman
jeremy at laidman.org
Wed Jan 31 03:38:33 CET 2024
Great news, Colin.
Have the "Select(2)" messages gone away?
Can you share the nature of the error in alerts.cfg, so I know what to look
for when I do the same in future?
J
On Wed, 31 Jan 2024 at 13:22, Colin Coe <colin.coe at gmail.com> wrote:
> Hi all
>
> This is resolved. It was a stupid error in alerts.cfg...
>
> Thanks for the suggestions
>
> On Wed, 31 Jan 2024 at 09:21, Colin Coe <colin.coe at gmail.com> wrote:
>
>> Hi Jeremy
>>
>> Running the following gives me the expected result so the server is
>> responding, at least sometimes.
>> xymon 127.0.0.1 "config hosts.cfg"
>>
>> Is this a worry: "Discarding timed-out partial msg from 127.0.0.1"?
>> Getting lots of these...
>>
>> I've added --trace to xymond_alert and will go through that.
>>
>> Thanks
>>
>>
>>
>> On Wed, 31 Jan 2024 at 08:32, Jeremy Laidman <jeremy at laidman.org> wrote:
>>
>>> Hi Colin
>>>
>>> From the logs, it appears that xymond_alert is unable to communicate
>>> with your Xymon server on 10.10.10.10:1984. It seems to be trying to
>>> fetch the hosts.cfg file contents via the BB protocol by sending a "config
>>> hosts.cfg" command to xymond, but xymond is not responding.
>>>
>>> The select() system call is monitoring a file handle or socket for
>>> activity, likely the TCP socket with 10.10.10.10:1984. The timeout
>>> means that select() saw no activity on that socket in the expected time.
>>> This suggests that the TCP connection was established correctly (xymond is
>>> listening and IP/port are likely correct) and xymond_alert sent the request
>>> for the hosts.cfg file, but there was no response.
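
A minimal Python sketch of the behaviour described above, added for illustration only. It is not Xymon's own code. The function names `read_reply` and `query_xymond` are hypothetical, and the half-close after sending the request is an assumption about how a Xymon client signals the end of its message.

```python
import select
import socket


def read_reply(sock, timeout):
    """Collect data from sock until the peer closes the connection.

    Returns the bytes received, or None if select() reports no readable
    data within `timeout` seconds (the "timed out" case).
    """
    chunks = []
    while True:
        # select() only reports whether the socket is ready to read;
        # an empty list means the timeout expired with no activity.
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            return None
        data = sock.recv(4096)
        if not data:  # empty read: peer closed, reply is complete
            break
        chunks.append(data)
    return b"".join(chunks)


def query_xymond(host, port, message, timeout=5.0):
    """Send one message (e.g. "config hosts.cfg") and wait for the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(message.encode("ascii"))
        # Assumption: half-closing tells the server the request is complete.
        sock.shutdown(socket.SHUT_WR)
        return read_reply(sock, timeout)
```

Calling `query_xymond("10.10.10.10", 1984, "config hosts.cfg")` against a server that accepts the connection but never answers would return `None`, which corresponds to the situation described in the paragraph above: the connection succeeds, the request is sent, and the wait for a reply times out.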
>>>
>>> It might be worth checking xymond.log for messages corresponding to the
>>> timestamps of the errors from xymond_alert.
>>>
>>> I'm not convinced this is the reason that you're not getting alert
>>> emails. If xymond_alert can't get hosts.cfg from a BB message, it should be
>>> able to get it directly from the filesystem, and then carry on. So the
>>> messages you're seeing might be a red herring, although I wouldn't
>>> expect them to show up on a normally operating Xymon installation. Having
>>> said that, my Xymon installation is showing those log messages, yet I've no
>>> reason to think that our alerting is broken, so perhaps it's just something
>>> that can be ignored.
>>>
>>> It might be worth taking a look at the man page for xymond_alert and
>>> having a go at the --test, --trace and --dump-config options.
>>>
>>> In case it's not obvious, I'm really not sure what the problem could be,
>>> and I'm just throwing out some ideas in case something helps.
>>>
>>> J
>>>
>>> On Wed, 31 Jan 2024 at 10:50, Colin Coe <colin.coe at gmail.com> wrote:
>>>
>>>> Hi all
>>>>
>>>> Our Xymon server has recently stopped sending alert emails. This server
>>>> is also running Postfix and is our mail relay.
>>>>
>>>> From alert.log all I see is:
>>>> 2024-01-31 02:17:39.813610 Whoops ! Failed to send message (Select(2) failed)
>>>> 2024-01-31 02:17:39.829027 -> Select failure while sending to Xymon daemon at 10.10.10.10:1984
>>>> 2024-01-31 02:17:39.829032 -> Recipient '10.10.10.10', timeout 50
>>>> 2024-01-31 02:17:39.829037 -> 1st line: 'config hosts.cfg'
>>>> 2024-01-31 02:17:39.829042 Cannot load hosts.cfg from xymond: Select(2) failed
>>>> 2024-01-31 02:17:39.829049 Failed to load from xymond, reverting to file-load
>>>> 2024-01-31 02:22:40.932828 Whoops ! Failed to send message (Select(2) failed)
>>>> 2024-01-31 02:22:40.932863 -> Select failure while sending to Xymon daemon at 10.10.10.10:1984
>>>> 2024-01-31 02:22:40.932867 -> Recipient '10.10.10.10', timeout 50
>>>> 2024-01-31 02:22:40.932871 -> 1st line: 'config hosts.cfg'
>>>> 2024-01-31 02:22:40.932876 Cannot load hosts.cfg from xymond: Select(2) failed
>>>> 2024-01-31 02:22:40.932881 Failed to load from xymond, reverting to file-load
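
The alert.log excerpt above can be tallied with a short script to check whether every failed fetch from xymond is followed by the file-load fallback. This is only a sketch: the `summarize` helper and the `ENTRY` pattern are hypothetical, the sample is copied from the excerpt with the mail client's line wrapping undone, and the timestamp format is assumed from those lines.

```python
import re
from collections import Counter

# Sample copied from the alert.log excerpt in this thread (unwrapped).
SAMPLE_LOG = """\
2024-01-31 02:17:39.813610 Whoops ! Failed to send message (Select(2) failed)
2024-01-31 02:17:39.829027 -> Select failure while sending to Xymon daemon at 10.10.10.10:1984
2024-01-31 02:17:39.829032 -> Recipient '10.10.10.10', timeout 50
2024-01-31 02:17:39.829037 -> 1st line: 'config hosts.cfg'
2024-01-31 02:17:39.829042 Cannot load hosts.cfg from xymond: Select(2) failed
2024-01-31 02:17:39.829049 Failed to load from xymond, reverting to file-load
2024-01-31 02:22:40.932828 Whoops ! Failed to send message (Select(2) failed)
2024-01-31 02:22:40.932863 -> Select failure while sending to Xymon daemon at 10.10.10.10:1984
2024-01-31 02:22:40.932867 -> Recipient '10.10.10.10', timeout 50
2024-01-31 02:22:40.932871 -> 1st line: 'config hosts.cfg'
2024-01-31 02:22:40.932876 Cannot load hosts.cfg from xymond: Select(2) failed
2024-01-31 02:22:40.932881 Failed to load from xymond, reverting to file-load
"""

# Assumed line layout: "YYYY-MM-DD HH:MM:SS.micros message"
ENTRY = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (.*)$")


def summarize(log_text):
    """Count top-level log messages, skipping '->' detail lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # not a timestamped entry
        message = match.group(2)
        if message.startswith("->"):
            continue  # detail line belonging to the previous entry
        counts[message] += 1
    return counts


if __name__ == "__main__":
    for message, count in summarize(SAMPLE_LOG).items():
        print(count, message)
```

If the count of "Cannot load hosts.cfg from xymond" entries matches the count of "reverting to file-load" entries, each failed fetch was followed by the fallback, which would suggest these messages alone do not explain the missing alerts.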
>>>>
>>>> And notifications.log is zero bytes in size.
>>>>
>>>> I added "--debug" to the "[alert]" section of /etc/xymon/tasks.cfg and
>>>> while the verbosity was increased, there was no indication of why alerts
>>>> are not being sent.
>>>>
>>>> Any clues how I can debug this?
>>>>
>>>> Thanks
>>>> _______________________________________________
>>>> Xymon mailing list
>>>> Xymon at xymon.com
>>>> http://lists.xymon.com/mailman/listinfo/xymon
>>>>