4.3.30 core dumps with directory directive in hosts.cfg

Beck, Zak zak.beck at accenture.com
Wed Nov 3 15:25:53 CET 2021


Hi

I’m using Xymon 4.3.30 and I’m seeing xymond core dumps on some servers that use the directory directive in hosts.cfg. The contents of my directory change from time to time, and a change sometimes leads to a core dump.

In hosts.cfg I have:

directory /home/xymon/server/etc/include_xxxxx/xxxxx/

This directory contains one file per host, each containing something like this:
0.0.0.0   a_unique_name   # noconn NOCOLUMNS:conn,info,trends nonongreen

A cron job periodically adds and removes files in this directory. The directory should, however, contain at least one file at all times.

It’s been a long time since I did any C but I think I’ve been able to track this down a bit with gdb and --debug on xymond. Here’s some xymond debug output:

5315 2021-11-03 12:08:45.037113 File /home/xymon/server/etc/include_xxxxx/xxxxx//xxxxxxxxx new timestamp
16223 2021-11-03 12:08:45.037136 -> save_checkpoint
5315 2021-11-03 12:08:45.037218 Opening file /home/xymon/server/etc/hosts.cfg
5315 2021-11-03 12:08:45.040573 Opening file /home/xymon/server/etc/include_xxxxx/xxxxx//xxxxxxxxxx
.. (opens all the files)
5315 2021-11-03 12:08:45.047109 -> handle_dropnrename
5315 2021-11-03 12:08:45.047128 -> posttochannel
5315 2021-11-03 12:08:45.047144 Posting message 13651 to 1 readers
5315 2021-11-03 12:08:45.047155 <- posttochannel
5315 2021-11-03 12:08:45.047159 -> posttochannel
5315 2021-11-03 12:08:45.047168 Posting message 4683 to 1 readers
5315 2021-11-03 12:08:45.047176 <- posttochannel
5315 2021-11-03 12:08:45.047179 -> posttochannel
5315 2021-11-03 12:08:45.047188 Posting message 1367 to 1 readers
5315 2021-11-03 12:08:45.047195 <- posttochannel
5315 2021-11-03 12:08:45.047199 -> posttochannel
5315 2021-11-03 12:08:45.047207 Posting message 1905 to 1 readers
5315 2021-11-03 12:08:45.047214 <- posttochannel
5315 2021-11-03 12:08:45.047218 -> posttochannel
5315 2021-11-03 12:08:45.047221 Dropping message - no readers
5315 2021-11-03 12:08:45.047225 -> posttochannel
5315 2021-11-03 12:08:45.047228 Dropping message - no readers
5315 2021-11-03 12:08:45.047232 -> posttochannel
5315 2021-11-03 12:08:45.047240 Posting message 785 to 1 readers
5315 2021-11-03 12:08:45.047247 <- posttochannel
5315 2021-11-03 12:08:45.047251 -> posttochannel
5315 2021-11-03 12:08:45.047259 Posting message 6 to 1 readers
5315 2021-11-03 12:08:45.047266 <- posttochannel
5315 2021-11-03 12:08:45.047269 -> posttochannel
5315 2021-11-03 12:08:45.047273 Dropping message - no readers
5315 2021-11-03 12:08:45.047287 -> free_log_t
5315 2021-11-03 12:08:45.047300 <- free_log_t
5315 2021-11-03 12:08:45.047308 <- handle_dropnrename
5315 2021-11-03 12:08:45.047312 -> handle_dropnrename
16223 2021-11-03 12:08:45.240284 <- save_checkpoint
2021-11-03 12:09:00.061085 Whoops ! Failed to send message (timeout)
2021-11-03 12:09:00.061303 ->

On the last handle_dropnrename, the core dump occurs:

Program terminated with signal 6, Aborted.
#0  0x00007f39f751e387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
55        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f39f751e387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f39f751fa78 in __GI_abort () at abort.c:90
#2  0x0000000000415f63 in sigsegv_handler (signum=<optimized out>) at sig.c:57
#3  <signal handler called>
#4  __strlen_sse2_pminub () at ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:38
#5  0x000000000040a5ac in handle_dropnrename (cmd=cmd@entry=CMD_DROPSTATE, sender=sender@entry=0x425e3c "xymond", hostname=0x0, n1=n1@entry=0x0, n2=n2@entry=0x0) at xymond.c:2511
#6  0x0000000000403eb1 in main (argc=<optimized out>, argv=<optimized out>) at xymond.c:5702

What is interesting is that handle_dropnrename is called with hostname=0x0, which is a NULL pointer rather than an empty string, if I am reading gdb correctly.

In handle_dropnrename, strlen is called on hostname, and I think this is what causes the core dump (xymond.c line 2511):

    char *msgbuf = (char *)malloc(20 + strlen(hostname) + (n1 ? strlen(n1) : 0) + (n2 ? strlen(n2) : 0));
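
To illustrate what I suspect is happening: n1 and n2 are checked for NULL before strlen is called, but hostname is not, and calling strlen on a NULL pointer is undefined behaviour (here a SIGSEGV). Below is a minimal sketch of a guarded version of that size calculation. The names len_or_zero and alloc_msgbuf are hypothetical and are not part of xymond; this is only my guess at what a defensive variant could look like, not a tested patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in xymond): strlen() that treats NULL as length 0. */
static size_t len_or_zero(const char *s)
{
    return s ? strlen(s) : 0;
}

/*
 * Hypothetical stand-in for the allocation at xymond.c line 2511.
 * Returns NULL instead of crashing when hostname is NULL; otherwise
 * allocates the same number of bytes as the original expression and
 * reports that size through size_out (if size_out is not NULL).
 */
static char *alloc_msgbuf(const char *hostname, const char *n1,
                          const char *n2, size_t *size_out)
{
    size_t size;

    if (hostname == NULL) return NULL;   /* the check the original lacks */

    size = 20 + strlen(hostname) + len_or_zero(n1) + len_or_zero(n2);
    if (size_out) *size_out = size;
    return (char *)malloc(size);
}
```

With this shape, the caller would have to treat a NULL return as "nothing to drop" and return early, and would still need to free the buffer on the normal path.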

I _think_ handle_dropnrename is being called around line 5700 of xymond.c:

    else if (hostinfo(hwalk->hostname) == NULL) {
        /* Remove all state info about this host. This will NOT remove files. */
        handle_dropnrename(CMD_DROPSTATE, "xymond", hwalk->hostname, NULL, NULL);
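
If the NULL hostname really does come from this loop, one possible workaround would be to skip entries without a hostname before deciding to drop their state. The sketch below shows that idea on a simplified list. host_entry, is_known and collect_drops are hypothetical stand-ins that I made up for illustration; they are not xymond's real types or functions, and is_known only imitates the "hostinfo(...) == NULL" test with a plain array lookup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an entry in xymond's host list. */
typedef struct host_entry {
    const char *hostname;
    struct host_entry *next;
} host_entry;

/* Stand-in for "hostinfo(name) != NULL": is name in known[0..nknown-1]? */
static int is_known(const char *name, const char *const *known, size_t nknown)
{
    size_t i;
    for (i = 0; i < nknown; i++) {
        if (strcmp(name, known[i]) == 0) return 1;
    }
    return 0;
}

/*
 * Walk the list and collect the hostnames whose state would be dropped,
 * i.e. those no longer present in the known set. Entries with a NULL
 * hostname are skipped rather than passed on. Stores at most outmax
 * names in out[] and returns how many were stored.
 */
static size_t collect_drops(const host_entry *head,
                            const char *const *known, size_t nknown,
                            const char **out, size_t outmax)
{
    size_t n = 0;
    const host_entry *hwalk;

    for (hwalk = head; hwalk != NULL && n < outmax; hwalk = hwalk->next) {
        if (hwalk->hostname == NULL) continue;   /* guard against NULL */
        if (!is_known(hwalk->hostname, known, nknown)) {
            out[n++] = hwalk->hostname;
        }
    }
    return n;
}
```

Collecting the names first, instead of dropping while walking, also avoids modifying the list in the middle of the iteration, which may or may not matter in the real code.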


What am I doing wrong? 😊

Thanks

Zak

