[Xymon] xymongen crashes in 4.3.29
Matt Vander Werf
matt1299 at gmail.com
Fri Aug 23 18:00:31 CEST 2019
Hi JC,
Unfortunately, this didn't seem to fix the crashes. Today I got the e-mail
at 1:05 AM, though the core file has a timestamp of 1:04 AM. This time
frame still matches up with it being the dailyreport task that is
triggering the crashes (since there are no crashes any other time of the
day).
[root@<xymon server> ~]# xymoncmd --version
Xymon version 4.3.30-0.5.el7.terabithia
[root@<xymon server> ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)
The latest core backtrace looks to be the same as previously (same client
and service too), but I'm including it here [1] just for completeness.
Let me know if there's anything else I can provide to debug this.
Thanks!
[1]
[root@<xymon server> ~]# gdb -q /usr/libexec/xymon/xymongen core.1312
Reading symbols from /usr/libexec/xymon/xymongen...Reading symbols from
/usr/lib/debug/usr/libexec/xymon/xymongen.debug...done.
done.
[New LWP 1312]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/xymon/xymongen
--reportopts=1566446400:1566532799:0:nongr --recent'.
Program terminated with signal 6, Aborted.
#0 0x00007fc746f7e377 in __GI_raise (sig=sig at entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:55
55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x00007fc746f7e377 in __GI_raise (sig=sig at entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1 0x00007fc746f7fa68 in __GI_abort () at abort.c:90
#2 0x00005559fc1ce4f5 in sigsegv_handler (signum=<optimized out>) at
sig.c:57
#3 <signal handler called>
#4 strchrnul () at ../sysdeps/x86_64/strchrnul.S:33
#5 0x00007fc746f90681 in __find_specmb (format=0xfce <Address 0xfce out of
bounds>) at printf-parse.h:109
#6 _IO_vfprintf_internal (s=s at entry=0x7ffeb3e6d340,
format=format at entry=0xfce <Address 0xfce out of bounds>,
ap=ap at entry=0x7ffeb3e6d478)
at vfprintf.c:1308
#7 0x00007fc74705dc78 in ___vsprintf_chk (s=0x7ffeb3e6d6c2 "", flags=1,
slen=18446744073709551615,
format=0xfce <Address 0xfce out of bounds>, args=args at entry=0x7ffeb3e6d478)
at vsprintf_chk.c:83
#8 0x00007fc74705dbcd in ___sprintf_chk (s=<optimized out>,
flags=flags at entry=1,
slen=slen at entry=18446744073709551615, format=<optimized out>) at
sprintf_chk.c:32
#9 0x00005559fc1bf96a in sprintf (__fmt=<optimized out>, __s=<optimized
out>)
at /usr/include/bits/stdio2.h:33
#10 parse_histlogfile (starttime=1566446400,
timespec=0x5559fc431f50 <timespec.7157> "Wed_Sep_2_19:34:55_2015",
servicename=0x5559fd61b2d0 "procs",
hostname=0x5559fdc94520 "<client hostname>") at availability.c:174
#11 parse_historyfile (fd=fd at entry=0x5559fdc9be00, repinfo=<optimized out>,
hostname=0x5559fdc94520 "<client hostname>", servicename=0x5559fd61b2d0
"procs",
fromtime=<optimized out>, totime=1566532799,
for_history=for_history at entry=0, warnlevel=97,
greenlevel=99.995000000000005, warnstops=-1, reporttime=0x0) at
availability.c:475
#12 0x00005559fc1b496c in init_state (filename=<optimized out>,
filename at entry=0x7ffeb3e7f950 "<client hostname>.procs", log=log at entry
=0x7ffeb3e7f860)
at loaddata.c:275
#13 0x00005559fc1b568e in load_state (sumhead=sumhead at entry=0x5559fc3fad48
<dispsums>) at loaddata.c:626
#14 0x00005559fc1af794 in main (argc=5, argv=0x7ffeb3e84b58) at
xymongen.c:599
--
Matt Vander Werf
On Thu, Aug 22, 2019 at 5:29 PM Matt Vander Werf <matt1299 at gmail.com> wrote:
> Hi JC,
>
> Ah ha! That is one place I did not look and the timing certainly matches
> up!
>
> I have installed that new version on my Xymon server (running actual RHEL
> 7) and we'll see how it fares tomorrow morning....
>
> Thanks!
>
> --
> Matt Vander Werf
>
>
> On Thu, Aug 22, 2019 at 5:12 PM Japheth Cleaver <cleaver at terabithia.org>
> wrote:
>
>> Hi,
>>
>> I think this might be xymongen in report mode from the "dailyreport" file
>> in /tasks.d/; the timing would check out. I believe the problem here is
>> one of the Terabithia patches now doing the wrong thing after some of the
>> string-handling changes in 4.3.29 -- causing core dumps in certain
>> situations.
>>
>> If you're running actual RHEL7 on this (not CentOS, which hasn't released
>> 7.7 yet), would you mind checking the xymon-4.3.30-0.5 package in the EL7
>> Terabithia testing repo and see if this helps?
>> https://repo.terabithia.org/rpms/xymon/testing/el7/x86_64/
>>
>> Regards,
>> -jc
>>
>> On 8/22/2019 11:34 AM, Matt Vander Werf wrote:
>>
>> Hi Torsten,
>>
>> No, there wasn't anything running from cron or anything else around that
>> time, let alone anything that restarts the network or Xymon.
>>
>> Thanks.
>>
>> --
>> Matt Vander Werf
>>
>>
>> On Wed, Aug 21, 2019 at 5:43 AM Torsten Richter <bb4 at richter-it.net>
>> wrote:
>>
>>> Hi Matt,
>>>
>>> dumb question: is there any cron job running at this time that is
>>> restarting XYmon fiddling with the network, like restarting the network for
>>> some reason?
>>>
>>> Regards,
>>> Torsten
>>>
>>> Matt Vander Werf <matt1299 at gmail.com> hat am 20. August 2019 um 17:10
>>> geschrieben:
>>>
>>> Hi all,
>>>
>>> Every day since we updated our Xymon server to 4.3.29 (from 4.3.28),
>>> I've gotten an e-mail alert due to xymond turning red that reads:
>>>
>>> red xymongen program crashed
>>>
>>> Fatal signal caught!
>>>
>>> The strange thing is that this has happened at 1:04 AM every day...like
>>> clockwork. I have xymongen set to run every 1 minute and it has no problems
>>> running any other time of the day. We are using the Terabithia RPMs and the
>>> Xymon server is running RHEL 7.
>>>
>>> I've scoured the system to find anything that is set to run at/around
>>> that time via cron, etc. and haven't found anything. The system logs don't
>>> show anything is happening around that time either.
>>>
>>> I turned on debug logging for xymond and xymongen and haven't been able
>>> to find anything unusual in either logs around that time. But it is dumping
>>> core files for xymongen every time it crashes.
>>>
>>> I used gdb to get the backtrace on all of the core files (so far) and
>>> I've found that they all show the same thing. It shows the same host in the
>>> backtrace too (although I'm farily confident it isn't specific or isolated
>>> to that host but just the first one it runs into that it has issues with
>>> when processing).
>>>
>>> I've included an example gdb output below (the most recent one) [1].
>>>
>>> Is anyone else running into this by chance? Or any idea what might be
>>> the cause?
>>>
>>> Thanks!
>>>
>>>
>>> [1]
>>> # gdb -q /usr/libexec/xymon/xymongen core.16327
>>> Reading symbols from /usr/libexec/xymon/xymongen...Reading symbols from
>>> /usr/lib/debug/usr/libexec/xymon/xymongen.debug...done.
>>> done.
>>> [New LWP 16327]
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `/usr/libexec/xymon/xymongen
>>> --reportopts=1566187200:1566273599:0:nongr --recent'.
>>> Program terminated with signal 6, Aborted.
>>> #0 0x00007f4657c49377 in __GI_raise (sig=sig at entry=6) at
>>> ../nptl/sysdeps/unix/sysv/linux/raise.c:55
>>> 55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>>> (gdb) bt
>>> #0 0x00007f4657c49377 in __GI_raise (sig=sig at entry=6) at
>>> ../nptl/sysdeps/unix/sysv/linux/raise.c:55
>>> #1 0x00007f4657c4aa68 in __GI_abort () at abort.c:90
>>> #2 0x00005589375dd455 in sigsegv_handler (signum=<optimized out>) at
>>> sig.c:57
>>> #3 <signal handler called>
>>> #4 strchrnul () at ../sysdeps/x86_64/strchrnul.S:33
>>> #5 0x00007f4657c5b681 in __find_specmb (format=0xfce <Address 0xfce out
>>> of bounds>) at printf-parse.h:109
>>> #6 _IO_vfprintf_internal (s=s at entry=0x7ffd5dabcc00,
>>> format=format at entry=0xfce <Address 0xfce out of bounds>, ap=ap at entry=0x7ffd5dabcd38)
>>> at vfprintf.c:1308
>>> #7 0x00007f4657d28c78 in ___vsprintf_chk (s=0x7ffd5dabcf82 "", flags=1,
>>> slen=18446744073709551615,
>>> format=0xfce <Address 0xfce out of bounds>, args=args at entry=0x7ffd5dabcd38)
>>> at vsprintf_chk.c:83
>>> #8 0x00007f4657d28bcd in ___sprintf_chk (s=<optimized out>,
>>> flags=flags at entry=1,
>>> slen=slen at entry=18446744073709551615, format=<optimized out>) at
>>> sprintf_chk.c:32
>>> #9 0x00005589375ce8ca in sprintf (__fmt=<optimized out>, __s=<optimized
>>> out>)
>>> at /usr/include/bits/stdio2.h:33
>>> #10 parse_histlogfile (starttime=1566187200,
>>> timespec=0x558937840f50 <timespec.7157> "Wed_Sep_2_19:34:55_2015",
>>> servicename=0x5589383b6d70 "procs",
>>> hostname=0x558938a335d0 "<client hostname>") at availability.c:174
>>> #11 parse_historyfile (fd=fd at entry=0x558938a3aea0, repinfo=<optimized
>>> out>,
>>> hostname=0x558938a335d0 "<client hostname>",
>>> servicename=0x5589383b6d70 "procs",
>>> fromtime=<optimized out>, totime=1566273599,
>>> for_history=for_history at entry=0, warnlevel=97,
>>> greenlevel=99.995000000000005, warnstops=-1, reporttime=0x0) at
>>> availability.c:475
>>> #12 0x00005589375c38cc in init_state (filename=<optimized out>,
>>> filename at entry=0x7ffd5dacf210 "<client hostname>.procs",
>>> log=log at entry=0x7ffd5dacf120)
>>> at loaddata.c:275
>>> #13 0x00005589375c45ee in load_state (sumhead=sumhead at entry=0x558937809d48
>>> <dispsums>) at loaddata.c:626
>>> #14 0x00005589375be6f4 in main (argc=5, argv=0x7ffd5dad4418) at
>>> xymongen.c:599
>>>
>>>
>>> --
>>> Matt Vander Werf
>>> _______________________________________________
>>> Xymon mailing list
>>> Xymon at xymon.com
>>> http://lists.xymon.com/mailman/listinfo/xymon
>>>
>>>
>> _______________________________________________
>> Xymon mailing listXymon at xymon.comhttp://lists.xymon.com/mailman/listinfo/xymon
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20190823/dfd57803/attachment.htm>
More information about the Xymon
mailing list