[Xymon] xymond crashing! -- Please help!

Matt Vander Werf matt1299 at gmail.com
Sun Jan 31 00:05:13 CET 2016


Hi J.C.,

No,



--
Matt Vander Werf

On Sat, Jan 30, 2016 at 5:46 PM, J.C. Cleaver <cleaver at terabithia.org>
wrote:

>
> On Sat, January 30, 2016 10:45 am, Matt Vander Werf wrote:
> > Hi J.C.,
> >
> > So it appears that only fixed it temporarily.
> >
> > If I stop the service and start it back up again, it crashes again.
> >
> > I think I figured out how to read the core file and get a backtrace for
> > you.
> >
> > Here's what I got from the most recent crash (with some host names
> > obfuscated):
> >
> > [New LWP 13283]
> > Reading symbols from /usr/sbin/xymond...Reading symbols from
> > /usr/lib/debug/usr/sbin/xymond.debug...done.
> > done.
> > Missing separate debuginfo for
> > Try: yum --enablerepo='*debug*' install
> > /usr/lib/debug/.build-id/33/97b0d696701dbd7c09eb4bf023f7f4eebec9ed
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `xymond --restart=/var/lib/xymon/tmp/xymond.chk
> > --checkpoint-file=/var/lib/xymon'.
> > Program terminated with signal 6, Aborted.
> > #0  0x00007f570e29a5f7 in raise () from /lib64/libc.so.6
> > Missing separate debuginfos, use: debuginfo-install
> > glibc-2.17-106.el7_2.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
> > krb5-libs-1.13.2-10.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64
> > libselinux-2.2.2-6.el7.x86_64 lz4-r131-1.el7.x86_64
> > openssl-libs-1.0.1e-51.el7_2.2.x86_64 pcre-8.32-15.el7.x86_64
> > xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
> > (gdb) backtrace
> > #0  0x00007f570e29a5f7 in raise () from /lib64/libc.so.6
> > #1  0x00007f570e29bce8 in abort () from /lib64/libc.so.6
> > #2  0x00007f570f53cdf5 in sigsegv_handler (signum=<optimized out>) at sig.c:57
> > #3  <signal handler called>
> > #4  0x00007f570f5403b4 in xtree_i_compare (pa=0x7ffead8cb9a0, pb=0x2020202020202020) at tree.c:47
> > #5  0x00007f570e3574c0 in tfind () from /lib64/libc.so.6
> > #6  0x00007f570f5405d4 in xtreeFind (treehandle=<optimized out>, key=key@entry=0x7f57142cb320 "*<client hostname>*") at tree.c:140
> > #7  0x00007f570f5386bd in get_clientconfig (hostname=hostname@entry=0x7f57142cb320 "*<client hostname>*", hostclass=hostclass@entry=0x7f57208e4612 "linux", hostos=hostos@entry=0x7f57208e460c "linux") at clientlocal.c:192
> > #8  0x00007f570f535dec in do_message (msg=msg@entry=0x7f572064c300, origin=origin@entry=0x7f570f550e97 "", can_respond=can_respond@entry=1) at xymond.c:4955
> > #9  0x00007f570f5282c7 in main (argc=<optimized out>, argv=<optimized out>) at xymond.c:6288
> >
> >
> > Is this what you wanted? Do you want me to install the debug package for
> > glibc or other packages?
> >
> > Let me know what I can do.
> >
> > Thanks!!
>
> This works. It's strange in that it points to a problem with the
> client-local configs, but I'm not sure how the tree would get into a
> corrupt state.
>
> Were any changes made recently to the client-local file? Any other errors
> seen during xymond's startup that might seem related?
>
> It's probably *not* an issue with a status message, if they're all
> crashing at the same spot. This was an incoming client message that was
> either garbled or accessing garbled data somehow.
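
One detail in the backtrace fits the "garbled data" reading: in frame #4 the second pointer handed to xtree_i_compare is pb=0x2020202020202020. The shell sketch below (variable names are only for illustration) shows that every byte of that value is 0x20, the ASCII code for a space. A pointer made up entirely of space characters often means text was written over the memory where the pointer lived, though the backtrace alone does not prove that is what happened here.

```shell
# Value of pb from frame #4 of the backtrace above.
pb=0x2020202020202020

# Drop the 0x prefix, split into two-digit bytes, and list the distinct values.
distinct_bytes=$(printf '%s\n' "${pb#0x}" | fold -w 2 | sort -u)
echo "distinct byte values: $distinct_bytes"

# 0x20 is decimal 32, the ASCII code for a space character.
printf 'decimal value of 0x%s: %d\n' "$distinct_bytes" "0x$distinct_bytes"
```
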
>
>
> >
> > --
> > Matt Vander Werf
> >
> > On Sat, Jan 30, 2016 at 1:10 PM, Matt Vander Werf <matt1299 at gmail.com>
> > wrote:
> >
> >> Hi J.C.,
> >>
> >> Moving the xymond.chk checkpoint file out of the way after it was stopped
> >> seemed to fix this (at least so far).
> >>
> >> I see that I lost all record of disabled tests (getting alerts for things
> >> that were disabled).
> >>
> >> What data exactly did I lose by moving that checkpoint file out of the
> >> way?
> >>
> >> Is there any way to get the data back? Or maybe figure out the corruption
> >> in the checkpoint file and then move the file back in place?
>
> There are several different bits in there, including scheduled tasks,
> disable states, and the current status messages. You can manually copy the
> file back at this point while xymond is off and it will load state back
> from it (along with the old status messages, but they'll get overwritten
> as soon as the next cycle comes through).
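
The copy-back step could look roughly like the sketch below. The live checkpoint path comes from the xymond command line shown in the core dump output; the saved-copy path and the function name are hypothetical. How xymond is stopped and started depends on the local setup, so those steps are left out, and the function only refuses to run while a xymond process exists.

```shell
# Copy a saved checkpoint back into place, but only while xymond is stopped.
# $1 = saved copy (hypothetical location), $2 = live checkpoint file
restore_checkpoint() {
    saved=$1
    live=$2
    # Refuse to overwrite the checkpoint while a xymond process is running.
    if pgrep -x xymond >/dev/null 2>&1; then
        echo "xymond is still running; stop it first" >&2
        return 1
    fi
    # Keep the checkpoint written since the move, in case it is needed later.
    if [ -f "$live" ]; then
        cp -p "$live" "$live.before-restore" || return 1
    fi
    cp -p "$saved" "$live"
}

# Example (the saved-copy path is an assumption):
# restore_checkpoint /root/xymond.chk.saved /var/lib/xymon/tmp/xymond.chk
```
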
>
>
>
> >>
> >> Also, see my most recent e-mail with the xymonlaunch log (if you haven't
> >> already). Looks like this has happened in the past but resolved
> >> itself....
> >>
> >> Regarding the backtrace....
> >>
> >> I put those lines in /etc/sysconfig/xymonlaunch and I see the core files
> >> being generated now.
> >> I feel embarrassed to admit this, but how exactly do I get the backtrace
> >> out of the binary core files, besides trying to read the files with an
> >> editor? Any way to know which core file had the backtrace?
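
For anyone with the same question: a core file is a memory image, so the backtrace is produced by loading it into gdb together with the binary that crashed, which is what the gdb session quoted earlier in this message did. A non-interactive sketch follows. The function name and the core file name are hypothetical; `-batch` and `-ex` are standard gdb options. Each core file yields its own backtrace, and running `file` on a core normally reports which program produced it.

```shell
# Print a backtrace from a core file without an interactive gdb session.
# $1 = the binary that crashed, $2 = the core file it left behind
core_backtrace() {
    gdb -batch -ex backtrace "$1" "$2"
}

# Example (the core file name is an assumption; use whatever was written):
# core_backtrace /usr/sbin/xymond core.13283
```
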
> >>
> >> Also, I see this in journalctl:
> >>
> >> Ignoring invalid environment assignment 'export
> >> DAEMON_COREFILE_LIMIT=unlimited': /etc/sysconfig/xymonlaunch
>
> Ugh. systemd :( I forgot that that's not a real shell file any more. Looks
> like you found a way though!
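
The journalctl message is systemd rejecting the `export` keyword: a file loaded through EnvironmentFile= takes plain NAME=value lines and is not run by a shell. The thread does not say which workaround was used. One option, offered here as an assumption and not as what was done in this case, is to raise the core limit in a unit drop-in with systemd's LimitCORE= directive. The unit name xymon.service, the drop-in file name, and the function name are all hypothetical.

```shell
# Write a drop-in that lifts the core size limit for a service.
# $1 = drop-in directory, e.g. /etc/systemd/system/xymon.service.d (assumed name)
write_core_dropin() {
    mkdir -p "$1" || return 1
    cat > "$1/coredump.conf" <<'EOF'
[Service]
LimitCORE=infinity
EOF
}

# Afterwards: run `systemctl daemon-reload`, then restart the service.
```
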
>
>
> -jc
>
>
>