<div dir="ltr"><div>Hi J.C.,<br><br></div>No, <br><br><br></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div>--</div><div>Matt Vander Werf</div></div></div>

<br><div class="gmail_quote">On Sat, Jan 30, 2016 at 5:46 PM, J.C. Cleaver <span dir="ltr">&lt;<a href="mailto:cleaver@terabithia.org" target="_blank">cleaver@terabithia.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><br>

On Sat, January 30, 2016 10:45 am, Matt Vander Werf wrote:<br>

> Hi J.C.,<br>

><br>

> So it appears that only fixed it temporarily.<br>

><br>

> If I stop the service and start it back up again, it crashes again.<br>

><br>

> I think I figured out how to read the core file and get a backtrace for<br>

> you<br>

> (I think).<br>

><br>

> Here's what I got from the most recent crash (with some host names<br>

> obfuscated):<br>

><br>

> [New LWP 13283]<br>

> Reading symbols from /usr/sbin/xymond...Reading symbols from<br>

> /usr/lib/debug/usr/sbin/xymond.debug...done.<br>

> done.<br>

> Missing separate debuginfo for<br>

> Try: yum --enablerepo='*debug*' install<br>

> /usr/lib/debug/.build-id/33/97b0d696701dbd7c09eb4bf023f7f4eebec9ed<br>

> [Thread debugging using libthread_db enabled]<br>

> Using host libthread_db library "/lib64/libthread_db.so.1".<br>

> Core was generated by `xymond --restart=/var/lib/xymon/tmp/xymond.chk<br>

> --checkpoint-file=/var/lib/xymon'.<br>

> Program terminated with signal 6, Aborted.<br>

> #0  0x00007f570e29a5f7 in raise () from /lib64/libc.so.6<br>

> Missing separate debuginfos, use: debuginfo-install<br>

> glibc-2.17-106.el7_2.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64<br>

> krb5-libs-1.13.2-10.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64<br>

> libselinux-2.2.2-6.el7.x86_64 lz4-r131-1.el7.x86_64<br>

> openssl-libs-1.0.1e-51.el7_2.2.x86_64 pcre-8.32-15.el7.x86_64<br>

> xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64<br>

> (gdb) backtrace<br>

> #0  0x00007f570e29a5f7 in raise () from /lib64/libc.so.6<br>

> #1  0x00007f570e29bce8 in abort () from /lib64/libc.so.6<br>

> #2  0x00007f570f53cdf5 in sigsegv_handler (signum=&lt;optimized out&gt;) at<br>

> sig.c:57<br>

> #3  &lt;signal handler called&gt;<br>

> #4  0x00007f570f5403b4 in xtree_i_compare (pa=0x7ffead8cb9a0,<br>

> pb=0x2020202020202020) at tree.c:47<br>

> #5  0x00007f570e3574c0 in tfind () from /lib64/libc.so.6<br>

> #6  0x00007f570f5405d4 in xtreeFind (treehandle=&lt;optimized out&gt;,<br>

</div></div>> key=key@entry=0x7f57142cb320 "*&lt;client hostname&gt;*") at tree.c:140<br>

<span class="">> #7  0x00007f570f5386bd in get_clientconfig<br>

> (hostname=hostname@entry=0x7f57142cb320<br>

</span>> "*&lt;client hostname&gt;*", hostclass=hostclass@entry=0x7f57208e4612 "linux",<br>

<span class="">>     hostos=hostos@entry=0x7f57208e460c "linux") at clientlocal.c:192<br>

> #8  0x00007f570f535dec in do_message (msg=msg@entry=0x7f572064c300,<br>

> origin=origin@entry=0x7f570f550e97 "", can_respond=can_respond@entry=1) at<br>

> xymond.c:4955<br>

> #9  0x00007f570f5282c7 in main (argc=&lt;optimized out&gt;, argv=&lt;optimized<br>

> out&gt;)<br>

> at xymond.c:6288<br>

><br>

><br>

> Is this what you wanted? Do you want me to install the debug package for<br>

> glibc or other packages?<br>

><br>

> Let me know what I can do.<br>

><br>

> Thanks!!<br>

<br>

</span>This works. It's strange in that it points to a problem with the<br>

client-local configs, but I'm not sure how the tree would get into a<br>

corrupt state.<br>

<br>

Were any changes made recently to the client-local file? Any other errors<br>

seen during xymond's startup that might seem related?<br>

<br>

It's probably *not* an issue with a status message, if they're all<br>

crashing at the same spot. This was an incoming client message that was<br>

either garbled or accessing garbled data somehow.<br>
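
<br>

One detail in frame #4 of the backtrace is worth a closer look: pb=0x2020202020202020. Every byte of that value is 0x20, the ASCII space character, which would be consistent with a tree node pointer having been overwritten by text rather than pointing at real memory. A quick Python check of that reading (only the decoding is shown; where the spaces came from is not something the backtrace tells us):<br>

<pre>
```python
# pb value copied from frame #4 (xtree_i_compare) of the backtrace above.
pb = 0x2020202020202020

# Split the 64-bit value into its 8 bytes; byte order is irrelevant here
# because every byte is identical.
raw = pb.to_bytes(8, "little")

print(raw)                 # the bytes as Python sees them
print(set(raw) == {0x20})  # True when every byte is an ASCII space
```
</pre>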

<span class=""><br>

<br>

><br>

> --<br>

> Matt Vander Werf<br>

><br>

> On Sat, Jan 30, 2016 at 1:10 PM, Matt Vander Werf &lt;<a href="mailto:matt1299@gmail.com">matt1299@gmail.com</a>&gt;<br>

> wrote:<br>

><br>

>> Hi J.C.,<br>

>><br>

>> Moving the xymond.chk checkpoint file out of the way after it was<br>

>> stopped<br>

>> seemed to fix this (at least so far).<br>

>><br>

>> I see that I lost all record of disabled tests (getting alerts for<br>

>> things<br>

>> that were disabled).<br>

>><br>

>> What all data exactly did I lose with moving that checkpoint file out of<br>

>> the way?<br>

>><br>

>> Is there any way to get the data back? Or maybe figure out the<br>

>> corruption<br>

>> in the checkpoint file and then move the file back in place?<br>

<br>

</span>There are several different bits in there, including scheduled tasks,<br>

disable states, and the current status messages. You can manually copy the<br>

file back at this point while xymond is off and it will load state back<br>

from it (along with the old status messages, but they'll get overwritten<br>

as soon as the next cycle comes through).<br>

<span class=""><br>

<br>

<br>

>><br>

>> Also, see my most recent e-mail with the xymonlaunch log (if you haven't<br>

>> already). Looks like this has happened in the past but resolved<br>

>> itself....<br>

>><br>

>> Regarding the backtrace....<br>

>><br>

>> I put those lines in /etc/sysconfig/xymonlaunch and I see the core files<br>

>> being generated now.<br>

>> I feel embarrassed to admit this, but how exactly do I get the backtrace<br>

>> out of the binary core files, besides trying to read the files with an<br>

>> editor? Any way to know which core file had the backtrace?<br>

>><br>

>> Also, I see this in journalctl:<br>

>><br>

>> Ignoring invalid environment assignment 'export<br>

>> DAEMON_COREFILE_LIMIT=unlimited': /etc/sysconfig/xymonlaunch<br>

<br>

</span>Ugh. systemd :( I forgot that that's not a real shell file any more. Looks<br>

like you found a way though!<br>
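
<br>

For reference, that journalctl warning is presumably systemd reading the file through EnvironmentFile=, which expects plain VAR=value lines and does not run them through a shell, so a leading "export" makes the assignment invalid. A minimal Python sketch that flags such lines follows; find_export_lines and the sample contents are hypothetical, and it checks only for the "export" prefix, not every rule systemd applies:<br>

<pre>
```python
def find_export_lines(text):
    """Return (line_number, line) pairs that start with a shell-style 'export '."""
    flagged = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and lines starting with '#' or ';' are ignored by systemd.
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("export "):
            flagged.append((number, line))
    return flagged


# Hypothetical file contents: line 2 mirrors the rejected assignment from the
# journal; line 3 is the same assignment in plain VAR=value form. Whether the
# service actually honors that variable under systemd is a separate question.
sample = (
    "# core file settings\n"
    "export DAEMON_COREFILE_LIMIT=unlimited\n"
    "DAEMON_COREFILE_LIMIT=unlimited\n"
)

for number, line in find_export_lines(sample):
    print(f"line {number}: {line}")
```
</pre>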

<span class="HOEnZb"><font color="#888888"><br>

<br>

-jc<br>

<br>

<br>

</font></span></blockquote></div><br></div>