[Xymon] xymond crashing

Jeremy Laidman jeremy at laidman.org
Wed Mar 23 00:00:17 CET 2022


Assigning a value to a variable seems quite benign. Perhaps the memory for
seltmo hasn't been allocated?
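
That said, if seltmo is an ordinary local "struct timeval" (an assumption on my part, I haven't checked the 4.3.4 source for its declaration), it has automatic storage and needs no allocation, so the two stores by themselves shouldn't be able to fault. A minimal standalone sketch of the same select()-with-timeout pattern, using a hypothetical helper name wait_readable() that does not exist in xymond.c:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>   /* pipe(), write(), close() for trying the helper out */

/*
 * Hypothetical helper, not taken from xymond.c: wait up to timeout_sec
 * seconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout or EINTR, -1 on any other error.
 */
int wait_readable(int fd, long timeout_sec)
{
    fd_set fdread;
    struct timeval seltmo;   /* automatic storage: nothing to allocate */
    int n;

    FD_ZERO(&fdread);
    FD_SET(fd, &fdread);

    /* Plain stores into a stack variable, like the quoted line 5024 */
    seltmo.tv_sec = timeout_sec;
    seltmo.tv_usec = 0;

    n = select(fd + 1, &fdread, NULL, NULL, &seltmo);
    if (n < 0) return (errno == EINTR) ? 0 : -1;
    return (n > 0) ? 1 : 0;
}
```

Calling wait_readable() on the read end of an empty pipe should time out, and it should report the descriptor as readable once a byte has been written to the other end.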

Have you taken a look at the xymond.log file around the time of the
coredumps?

The xymond process has quite a bit of debugging info available, if run with
the "--debug" option.

But, yes, an upgrade might be your simplest solution. The code in the
xymond.c file appears to have been substantially rewritten since 4.3.4. In
v4.3.5, the "redblack" library, called via rbreadlist(), was completely
replaced by Henrik's own tree management code. The Changelog file for 4.3.5
shows some info about xymond crashing in 4.3.4:

* Fix crashes in xymond caused by faulty new library for
  storing cookies and host-information.
* Fix memory corruption/crash in xymond caused by logging
  of multi-source statuses.

(I think the "cookies" mentioned here are "ack" cookies used to tie an ack
to an event.) The "storing ... host-information" part possibly supports the
correlation with the use of include files for hosts.cfg.

So even an upgrade to 4.3.5 might fix your problem.
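
Your backtrace seems to fit that picture, since the fault is in rb_successor(), reached through rbreadlist(). The following is only a generic sketch of an in-order successor walk over a binary search tree. It is not the library's actual code, and the node layout and function names are made up for illustration. The point is that such a walk does little more than follow stored pointers, so one corrupted link is enough to produce a segfault there.

```c
#include <stddef.h>

/* Hypothetical node layout, for illustration only */
struct node {
    int key;
    struct node *left, *right, *parent;
};

/* Leftmost (smallest) node of the subtree rooted at n */
struct node *tree_min(struct node *n)
{
    while (n != NULL && n->left != NULL) n = n->left;
    return n;
}

/* In-order successor of n, or NULL if n holds the largest key */
struct node *tree_successor(struct node *n)
{
    struct node *p;

    if (n == NULL) return NULL;

    /* Every step below dereferences a pointer stored in the tree */
    if (n->right != NULL) return tree_min(n->right);

    /* Otherwise climb until we arrive at a parent from its left child */
    p = n->parent;
    while (p != NULL && n == p->right) {
        n = p;
        p = p->parent;
    }
    return p;
}
```

With a healthy three-node tree (keys 1, 2, 3, rooted at 2), tree_successor() steps from 1 to 2 to 3 and then returns NULL. If any of those left/right/parent fields had been overwritten with garbage, the same walk would dereference an invalid address.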

Cheers
Jeremy

On Tue, 22 Mar 2022 at 20:00, Neil Simmonds <neilsimmonds1808 at gmail.com>
wrote:

> Hi Jeremy,
>
> We're using Xymon 4.3.4 (yes, I know, there are plans to build a new one)
> and we built it from source.
>
> It's been running fine for years but we've been getting these core dumps
> for a few months now.
>
> I've looked at the code and this is the section with line 5024 (I've made
> line 5024 itself bold):
>
> /*
>  * Do the select() with a static 2 second timeout.
>  * This is long enough that we will suspend activity for
>  * some time if there's nothing to do, but short enough for
>  * us to attend to the housekeeping stuff without undue delay.
>  */
> *seltmo.tv_sec = 2; seltmo.tv_usec = 0;*
> n = select(maxfd+1, &fdread, &fdwrite, NULL, &seltmo);
> if (n <= 0) {
>     if ((errno == EINTR) || (n == 0)) {
>         /* Interrupted or a timeout happened */
>         continue;
>     }
>     else {
>         errprintf("Fatal error in select: %s\n", strerror(errno));
>         break;
>     }
> }
>
> The only thing I can think of that might be an influence is that around the
> time that the core dumps started, we started using include files with the
> hosts.cfg file.
>
> Kind regards,
> Neil.
>
> On Mon, Mar 21, 2022 at 10:15 PM Jeremy Laidman <jeremy at laidman.org>
> wrote:
>
>> Hi Neil
>>
>> I don't know if I can help you much to diagnose. Can you share the
>> version of Xymon that you're using, and whether you're building from source
>> or installing a package? I suppose the place to start would be to see
>> what's happening in xymond.c at line 5024.
>>
>> J
>>
>> On Sat, 19 Mar 2022 at 00:36, Neil Simmonds <neilsimmonds1808 at gmail.com>
>> wrote:
>>
>>> Hi all, We're getting a weird issue where the xymond process is crashing
>>> every couple of days. If you're actually viewing Xymon at the time, it only
>>> seems to affect the critical page, and within 10 minutes it restarts and all
>>> is back to normal, but we get a core dump. Has anyone ever seen an issue
>>> like this? I've gone through our critical.cfg file and given it a visual
>>> once-over, but it's a bit difficult checking a file with 7638 lines.
>>>
>>> The dump file doesn't give us much help as far as I can see
>>>
>>> Core was generated by `xymond --pidfile=/var/log/xymon/xymond.pid
>>> --restart=/xymon/server/tmp/xymond.c'.
>>> Program terminated with signal 6, Aborted.
>>> #0  0x0000003c0ce30265 in raise () from /lib64/libc.so.6
>>>
>>> #0  0x0000003c0ce30265 in raise () from /lib64/libc.so.6
>>> #1  0x0000003c0ce31d10 in abort () from /lib64/libc.so.6
>>> #2  0x0000000000419353 in sigsegv_handler (signum=<value optimized out>) at sig.c:57
>>> #3  <signal handler called>
>>> #4  0x000000000041efd2 in rb_successor ()
>>> #5  0x000000000041f5f2 in rb_readlist ()
>>> #6  0x000000000041e7c7 in rbreadlist ()
>>> #7  0x000000000040ece2 in main (argc=1647522333, argv=<value optimized out>) at xymond.c:5024