[Xymon] xymongen crashes in 4.3.29

Japheth Cleaver cleaver at terabithia.org
Tue Aug 27 22:15:01 CEST 2019


Alright, I believe I have fixed the issue here... Multiple issues within 
the availability code (fixed in 
https://sourceforge.net/p/xymon/code/8081/), and then a typo in a 
Terabithia patch. Please try out 4.3.30-0.6 in the /testing/ repo if 
possible.

You can also manually perform a run of this by executing: `xymoncmd 
xymonreports.sh daily` as the xymon user. That should give a 
reproduceable crash in 4.3.30-0.5 and be clean in 4.3.30-0.6.

I've also built the EL7 packages on a CentOS 7 box, which should provide 
proper compatibility while we're in a mixed 7.6/7.7 state in the ecosystem.

Regards,
-jc


On 8/23/2019 9:00 AM, Matt Vander Werf wrote:

> Hi JC,
>
> Unfortunately, this didn't seem to fix the crashes. Today I got the 
> e-mail at 1:05 AM, though the core file has a timestamp of 1:04 AM. 
> This time frame still matches up with it being the dailyreport task 
> that is triggering the crashes (since there are no crashes any other 
> time of the day).
>
> [root@<xymon server> ~]# xymoncmd --version
> Xymon version 4.3.30-0.5.el7.terabithia
> [root@<xymon server> ~]# cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.7 (Maipo)
>
> The latest core backtrace looks to be the same as previously (same 
> client and service too), but I'm including it here [1] just for 
> completeness.
>
> Let me know if there's anything else I can provide to debug this.
>
> Thanks!
>
>
> [1]
> [root@<xymon server> ~]# gdb -q /usr/libexec/xymon/xymongen core.1312
> Reading symbols from /usr/libexec/xymon/xymongen...Reading symbols 
> from /usr/lib/debug/usr/libexec/xymon/xymongen.debug...done.
> done.
> [New LWP 1312]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `/usr/libexec/xymon/xymongen 
> --reportopts=1566446400:1566532799:0:nongr --recent'.
> Program terminated with signal 6, Aborted.
> #0  0x00007fc746f7e377 in __GI_raise (sig=sig at entry=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> 55  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> (gdb) bt
> #0  0x00007fc746f7e377 in __GI_raise (sig=sig at entry=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> #1  0x00007fc746f7fa68 in __GI_abort () at abort.c:90
> #2  0x00005559fc1ce4f5 in sigsegv_handler (signum=<optimized out>) at 
> sig.c:57
> #3  <signal handler called>
> #4  strchrnul () at ../sysdeps/x86_64/strchrnul.S:33
> #5  0x00007fc746f90681 in __find_specmb (format=0xfce <Address 0xfce 
> out of bounds>) at printf-parse.h:109
> #6  _IO_vfprintf_internal (s=s at entry=0x7ffeb3e6d340,
>     format=format at entry=0xfce <Address 0xfce out of bounds>, 
> ap=ap at entry=0x7ffeb3e6d478) at vfprintf.c:1308
> #7  0x00007fc74705dc78 in ___vsprintf_chk (s=0x7ffeb3e6d6c2 "", 
> flags=1, slen=18446744073709551615,
>     format=0xfce <Address 0xfce out of bounds>, 
> args=args at entry=0x7ffeb3e6d478) at vsprintf_chk.c:83
> #8  0x00007fc74705dbcd in ___sprintf_chk (s=<optimized out>, 
> flags=flags at entry=1,
>     slen=slen at entry=18446744073709551615, format=<optimized out>) at 
> sprintf_chk.c:32
> #9  0x00005559fc1bf96a in sprintf (__fmt=<optimized out>, 
> __s=<optimized out>)
>     at /usr/include/bits/stdio2.h:33
> #10 parse_histlogfile (starttime=1566446400,
>     timespec=0x5559fc431f50 <timespec.7157> "Wed_Sep_2_19:34:55_2015", 
> servicename=0x5559fd61b2d0 "procs",
>     hostname=0x5559fdc94520 "<client hostname>") at availability.c:174
> #11 parse_historyfile (fd=fd at entry=0x5559fdc9be00, repinfo=<optimized 
> out>,
>     hostname=0x5559fdc94520 "<client hostname>", 
> servicename=0x5559fd61b2d0 "procs",
>     fromtime=<optimized out>, totime=1566532799, 
> for_history=for_history at entry=0, warnlevel=97,
>     greenlevel=99.995000000000005, warnstops=-1, reporttime=0x0) at 
> availability.c:475
> #12 0x00005559fc1b496c in init_state (filename=<optimized out>,
>     filename at entry=0x7ffeb3e7f950 "<client hostname>.procs", 
> log=log at entry=0x7ffeb3e7f860)
>     at loaddata.c:275
> #13 0x00005559fc1b568e in load_state 
> (sumhead=sumhead at entry=0x5559fc3fad48 <dispsums>) at loaddata.c:626
> #14 0x00005559fc1af794 in main (argc=5, argv=0x7ffeb3e84b58) at 
> xymongen.c:599
>
>
> --
> Matt Vander Werf
>
>
> On Thu, Aug 22, 2019 at 5:29 PM Matt Vander Werf <matt1299 at gmail.com 
> <mailto:matt1299 at gmail.com>> wrote:
>
>     Hi JC,
>
>     Ah ha! That is one place I did not look and the timing certainly
>     matches up!
>
>     I have installed that new version on my Xymon server (running
>     actual RHEL 7) and we'll see how it fares tomorrow morning....
>
>     Thanks!
>
>     --
>     Matt Vander Werf
>
>
>     On Thu, Aug 22, 2019 at 5:12 PM Japheth Cleaver
>     <cleaver at terabithia.org <mailto:cleaver at terabithia.org>> wrote:
>
>         Hi,
>
>         I think this might be xymongen in report mode from the
>         "dailyreport" file in /tasks.d/; the timing would check out. 
>         I believe the problem here is one of the Terabithia patches
>         now doing the wrong thing after some of the string-handling
>         changes in 4.3.29 -- causing core dumps in certain situations.
>
>         If you're running actual RHEL7 on this (not CentOS, which
>         hasn't released 7.7 yet), would you mind checking the
>         xymon-4.3.30-0.5 package in the EL7 Terabithia testing repo
>         and see if this helps?
>         https://repo.terabithia.org/rpms/xymon/testing/el7/x86_64/
>
>         Regards,
>         -jc
>
>         On 8/22/2019 11:34 AM, Matt Vander Werf wrote:
>>         Hi Torsten,
>>
>>         No, there wasn't anything running from cron or anything else
>>         around that time, let alone anything that restarts the
>>         network or Xymon.
>>
>>         Thanks.
>>
>>         --
>>         Matt Vander Werf
>>
>>
>>         On Wed, Aug 21, 2019 at 5:43 AM Torsten Richter
>>         <bb4 at richter-it.net <mailto:bb4 at richter-it.net>> wrote:
>>
>>             Hi Matt,
>>
>>             dumb question: is there any cron job running at this time
>>             that is restarting XYmon fiddling with the network, like
>>             restarting the network for some reason?
>>
>>             Regards,
>>             Torsten
>>
>>>             Matt Vander Werf <matt1299 at gmail.com
>>>             <mailto:matt1299 at gmail.com>> hat am 20. August 2019 um
>>>             17:10 geschrieben:
>>>
>>>             Hi all,
>>>
>>>             Every day since we updated our Xymon server to 4.3.29
>>>             (from 4.3.28), I've gotten an e-mail alert due to xymond
>>>             turning red that reads:
>>>
>>>             red xymongen program crashed
>>>
>>>             Fatal signal caught!
>>>
>>>             The strange thing is that this has happened at 1:04 AM
>>>             every day...like clockwork. I have xymongen set to run
>>>             every 1 minute and it has no problems running any other
>>>             time of the day. We are using the Terabithia RPMs and
>>>             the Xymon server is running RHEL 7.
>>>
>>>             I've scoured the system to find anything that is set to
>>>             run at/around that time via cron, etc. and haven't found
>>>             anything. The system logs don't show anything is
>>>             happening around that time either.
>>>
>>>             I turned on debug logging for xymond and xymongen and
>>>             haven't been able to find anything unusual in either
>>>             logs around that time. But it is dumping core files for
>>>             xymongen every time it crashes.
>>>
>>>             I used gdb to get the backtrace on all of the core files
>>>             (so far) and I've found that they all show the same
>>>             thing. It shows the same host in the backtrace too
>>>             (although I'm farily confident it isn't specific or
>>>             isolated to that host but just the first one it runs
>>>             into that it has issues with when processing).
>>>
>>>             I've included an example gdb output below (the most
>>>             recent one) [1].
>>>
>>>             Is anyone else running into this by chance? Or any idea
>>>             what might be the cause?
>>>
>>>             Thanks!
>>>
>>>
>>>             [1]
>>>             # gdb -q /usr/libexec/xymon/xymongen core.16327
>>>             Reading symbols from
>>>             /usr/libexec/xymon/xymongen...Reading symbols from
>>>             /usr/lib/debug/usr/libexec/xymon/xymongen.debug...done.
>>>             done.
>>>             [New LWP 16327]
>>>             [Thread debugging using libthread_db enabled]
>>>             Using host libthread_db library "/lib64/libthread_db.so.1".
>>>             Core was generated by `/usr/libexec/xymon/xymongen
>>>             --reportopts=1566187200:1566273599:0:nongr --recent'.
>>>             Program terminated with signal 6, Aborted.
>>>             #0  0x00007f4657c49377 in __GI_raise (sig=sig at entry=6)
>>>             at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
>>>             55  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>>>             (gdb) bt
>>>             #0  0x00007f4657c49377 in __GI_raise (sig=sig at entry=6)
>>>             at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
>>>             #1  0x00007f4657c4aa68 in __GI_abort () at abort.c:90
>>>             #2  0x00005589375dd455 in sigsegv_handler
>>>             (signum=<optimized out>) at sig.c:57
>>>             #3  <signal handler called>
>>>             #4  strchrnul () at ../sysdeps/x86_64/strchrnul.S:33
>>>             #5  0x00007f4657c5b681 in __find_specmb (format=0xfce
>>>             <Address 0xfce out of bounds>) at printf-parse.h:109
>>>             #6  _IO_vfprintf_internal (s=s at entry=0x7ffd5dabcc00,
>>>                 format=format at entry=0xfce <Address 0xfce out of
>>>             bounds>, ap=ap at entry=0x7ffd5dabcd38) at vfprintf.c:1308
>>>             #7  0x00007f4657d28c78 in ___vsprintf_chk
>>>             (s=0x7ffd5dabcf82 "", flags=1, slen=18446744073709551615,
>>>                 format=0xfce <Address 0xfce out of bounds>,
>>>             args=args at entry=0x7ffd5dabcd38) at vsprintf_chk.c:83
>>>             #8  0x00007f4657d28bcd in ___sprintf_chk (s=<optimized
>>>             out>, flags=flags at entry=1,
>>>             slen=slen at entry=18446744073709551615, format=<optimized
>>>             out>) at sprintf_chk.c:32
>>>             #9  0x00005589375ce8ca in sprintf (__fmt=<optimized
>>>             out>, __s=<optimized out>)
>>>                 at /usr/include/bits/stdio2.h:33
>>>             #10 parse_histlogfile (starttime=1566187200,
>>>                 timespec=0x558937840f50 <timespec.7157>
>>>             "Wed_Sep_2_19:34:55_2015", servicename=0x5589383b6d70
>>>             "procs",
>>>                 hostname=0x558938a335d0 "<client hostname>") at
>>>             availability.c:174
>>>             #11 parse_historyfile (fd=fd at entry=0x558938a3aea0,
>>>             repinfo=<optimized out>,
>>>                 hostname=0x558938a335d0 "<client hostname>",
>>>             servicename=0x5589383b6d70 "procs",
>>>                 fromtime=<optimized out>, totime=1566273599,
>>>             for_history=for_history at entry=0, warnlevel=97,
>>>                 greenlevel=99.995000000000005, warnstops=-1,
>>>             reporttime=0x0) at availability.c:475
>>>             #12 0x00005589375c38cc in init_state
>>>             (filename=<optimized out>,
>>>                 filename at entry=0x7ffd5dacf210 "<client
>>>             hostname>.procs", log=log at entry=0x7ffd5dacf120)
>>>                 at loaddata.c:275
>>>             #13 0x00005589375c45ee in load_state
>>>             (sumhead=sumhead at entry=0x558937809d48 <dispsums>) at
>>>             loaddata.c:626
>>>             #14 0x00005589375be6f4 in main (argc=5,
>>>             argv=0x7ffd5dad4418) at xymongen.c:599
>>>
>>>
>>>             -- 
>>>             Matt Vander Werf
>>>             _______________________________________________
>>>             Xymon mailing list
>>>             Xymon at xymon.com <mailto:Xymon at xymon.com>
>>>             http://lists.xymon.com/mailman/listinfo/xymon
>>
>>
>>         _______________________________________________
>>         Xymon mailing list
>>         Xymon at xymon.com  <mailto:Xymon at xymon.com>
>>         http://lists.xymon.com/mailman/listinfo/xymon
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20190827/13a7afcc/attachment.htm>


More information about the Xymon mailing list