[Xymon] Xymon 4.3.30-1 memory issues and core dumps

Jeremy Laidman jeremy at laidman.org
Tue Dec 15 01:03:48 CET 2020


Hi Carl

On Tue, 15 Dec 2020 at 01:53, Carl Melgaard <Carl.Melgaard at stab.rm.dk>
wrote:

> Hi,
>
>
>
> After running for 5 hrs on my new installation on a RH 7.9, xymond has
> already allocated 11.5GB of memory…
>

xymond using a lot of RAM could be something different from the core dumps,
but I suspect they're related. For instance, xymond might be having to keep
lots of large combo messages in RAM while other modules send or receive
them, because those other modules keep crashing. It's not clear if the
xymonnet and xymongen crashes are causing the high RAM usage, or the other
way around. It might be worth checking log timestamps to work out which
happened first.

> Last night it core-dumped multiple times, and threw “Cannot allocate
> memory” in multiple xymon logfiles, ala “newstrbuffer: Attempt to allocate
> failed (initialsize=1009956863): Cannot allocate memory”.
>

"Cannot allocate memory" - do you have swap space? Is it being used?


> Monitoring 1900 hosts currently – on my primary system I do this with only
> 4 GB of memory with no issues.
>

What version of Xymon are you running on the primary system? Similar OS?


> Any idea where I should start to look – it’s a terabithia installation.
>
>
>
> Heres a couple of the core-dumps gdb’ed:
>

The two core dumps suggest the same cause.

> #2  0x0000561f05bf6115 in sigsegv_handler (signum=<optimized out>) at sig.c:57
> #3  <signal handler called>


The SIGSEGV handler was called, which probably means there was a
segmentation violation - typically an attempt to use memory that hasn't
been allocated.

I'm not a C programmer, but I'm guessing from this:

> #4  strbuf_addtobuffer (buf=0x0, newtext=0x561f0701db60 "extcombo", ' '
> <repeats 192 times>..., newlen=2000) at strfunc.c:115
>

that the code responsible is (in strfunc.c):

void strbuf_addtobuffer(strbuffer_t *buf, char *newtext, size_t newlen)
{
        if (buf->s == NULL) {
                buf->used = 0;
                buf->sz = newlen + BUFSZINCREMENT;
                buf->s = (char *) malloc(buf->sz);
                *(buf->s) = '\0';
        }

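The "malloc()" operation may have failed due to running out of memory. Then
the next line tries to store a NUL byte through a NULL pointer, which I'd
guess would cause a SIGSEGV. That said, the backtrace shows buf=0x0, so it
may be that the buffer pointer itself was NULL (for instance, if an earlier
newstrbuffer() call failed and returned NULL), in which case the "buf->s"
test on the first line would be what crashes.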

In other parts of the same file, malloc() is followed by a check for
failure, before the memory is used:

For instance:

        newbuf->s = (char *)malloc(initialsize);
        if (newbuf->s == NULL) {
                errprintf("newstrbuffer: Attempt to allocate failed (initialsize=%d): %s\n", initialsize, strerror(errno));
                xfree(newbuf);
                return NULL;
        }
        *(newbuf->s) = '\0';

The above error checking has been added to some of the code, but perhaps
there are places it still needs to be added.
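
A minimal sketch of what such a check could look like, written as a
standalone example rather than a patch against the real strfunc.c. The
struct layout, the BUFSZINCREMENT value, the "_checked" function name and
the int return value are all assumptions for illustration; the real
function returns void, so its callers would also need some way to notice
the failure.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFSZINCREMENT 4096  /* assumed value; the real constant is defined in Xymon */

/* Simplified stand-in for Xymon's strbuffer_t (assumed layout). */
typedef struct {
        char   *s;     /* NUL-terminated data */
        size_t  used;  /* bytes in use, not counting the NUL */
        size_t  sz;    /* bytes allocated */
} strbuffer_t;

/* Hypothetical checked variant: returns 0 on success, -1 on failure,
 * and never dereferences a NULL pointer. */
int strbuf_addtobuffer_checked(strbuffer_t *buf, const char *newtext, size_t newlen)
{
        if (buf == NULL || newtext == NULL) {
                fprintf(stderr, "strbuf_addtobuffer: NULL buffer or text\n");
                return -1;
        }

        if (buf->s == NULL) {
                /* First allocation: check malloc() before touching the memory. */
                size_t newsz = newlen + BUFSZINCREMENT;
                char *p = malloc(newsz);
                if (p == NULL) {
                        fprintf(stderr, "strbuf_addtobuffer: allocation failed (size=%zu): %s\n",
                                newsz, strerror(errno));
                        return -1;
                }
                buf->s = p;
                buf->sz = newsz;
                buf->used = 0;
                buf->s[0] = '\0';
        }
        else if (buf->used + newlen + 1 > buf->sz) {
                /* Grow: on failure the old buffer is left intact. */
                size_t newsz = buf->used + newlen + BUFSZINCREMENT;
                char *p = realloc(buf->s, newsz);
                if (p == NULL) {
                        fprintf(stderr, "strbuf_addtobuffer: resize failed (size=%zu): %s\n",
                                newsz, strerror(errno));
                        return -1;
                }
                buf->s = p;
                buf->sz = newsz;
        }

        memcpy(buf->s + buf->used, newtext, newlen);
        buf->used += newlen;
        buf->s[buf->used] = '\0';
        return 0;
}
```

With a check like this, an out-of-memory condition would show up as a
logged error and a failed call instead of a core dump, though it would not
address whatever is consuming the memory in the first place.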

This appears to have happened during the addition of a combo message string
to allocated memory, while creating a message to send to xymond (sendmsg.c):

> #5  0x0000561f05bf79b5 in addtobufferraw (buf=<optimized out>,
> newdata=<optimized out>, bytes=<optimized out>) at strfunc.c:184
>
> #6  0x0000561f05c00d32 in combo_start () at sendmsg.c:908
>

Combo messages can be large, and this could a) cause increased RAM usage,
or b) be affected by it. Again, it's not clear if the behaviour of xymongen
and xymonnet is the cause of your problems or the result of them.

It looks like the call to combo_start() in xymongen is in code that runs as
a result of the "--report" switch. In xymonnet, it's in common code, but
there seems to be a modified code path available if you were to add "--bfq"
(backfeed queue). I know nothing about the backfeed queue feature, but
there's a little about it in the README.backfeed file.

So some things to consider, mostly just work-arounds and troubleshooting:

1. Make sure you have swap enabled, and monitor swap-in/swap-out.
2. See if anything else is using excessive RAM.
3. Play with combo message sizes. Perhaps a smaller size would help. You
can set MAXMSGSPERCOMBO in xymonserver.cfg.
4. Run an older version of Xymon on your new installation, perhaps the same
as your current installation. Or perhaps just copy the binaries for
xymonnet and/or xymongen to the new server?
5. Patch the strfunc.c file to include the malloc error checking. You'd
need to get the SRPM from Terabithia and build it yourself. Only the
xymonnet and xymongen binaries would need to be replaced.
6. Profile the xymond process's memory usage. I'm not sure how to do this.
Perhaps you can get it to dump core, then analyse the core (perhaps just
run "strings" over it) to see what's using up all the memory. Perhaps
there are some gdb techniques for this.
7. Try running xymongen without "--report", and xymonnet with "--bfq" or
"--no-bfq".

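For items 1, 2 and 6, one low-tech way to watch a process's memory over
time on Linux is to read the VmRSS and VmSwap lines from /proc/<pid>/status
at intervals. The following is only a sketch under that assumption; the
function names are made up for illustration and nothing here is part of
Xymon.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the kB value of a "Key:   12345 kB" line in /proc/<pid>/status
 * style text, or -1 if the key is not present. */
long status_field_kb(const char *text, const char *key)
{
        size_t keylen = strlen(key);
        const char *line = text;

        while (line != NULL && *line != '\0') {
                if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
                        /* strtol() skips the tabs/spaces before the number. */
                        return strtol(line + keylen + 1, NULL, 10);
                }
                line = strchr(line, '\n');
                if (line != NULL) line++;
        }
        return -1;
}

/* Read /proc/<pid>/status (Linux only) and print resident and swapped
 * memory for that process. Returns 0 on success, -1 on error. */
int print_process_memory(long pid)
{
        char path[64];
        char text[8192];
        FILE *fp;
        size_t n;

        snprintf(path, sizeof(path), "/proc/%ld/status", pid);
        fp = fopen(path, "r");
        if (fp == NULL) {
                perror(path);
                return -1;
        }
        n = fread(text, 1, sizeof(text) - 1, fp);
        fclose(fp);
        text[n] = '\0';

        printf("pid %ld: VmRSS=%ld kB VmSwap=%ld kB\n", pid,
               status_field_kb(text, "VmRSS"), status_field_kb(text, "VmSwap"));
        return 0;
}
```

Calling print_process_memory() with xymond's PID every minute or so, and
lining the output up against the crash times in the logs, might show
whether the growth is steady or happens in jumps around the xymonnet and
xymongen runs.
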
Hope that helps.

Cheers
Jeremy