[Xymon] xymon hostdata module going rogue - bug report

Mark Felder feld at feld.me
Fri Sep 4 18:08:20 CEST 2015



On Mon, Aug 31, 2015, at 16:24, J.C. Cleaver wrote:
> On Mon, August 31, 2015 10:19 am, John Thurston wrote:
> >
> >> On Fri, August 28, 2015 3:16 pm, John Thurston wrote:
> >>> On 8/28/2015 12:45 PM, John Thurston wrote:
> >>>> On 6/10/2015 9:01 AM, Scot Kreienkamp wrote:
> > . . .
> >>>>> hobbit   28452  0.0  0.0      0     0 ?        Z    12:50   0:00 [xymond_hostdata] <defunct>
> >>>>>
> >>>>> It seemed related to drop messages . . .
> >>>>
> >>>> Hey, I think I'm seeing the same thing on Solaris with 4.3.21
> >>>>
> >>>> I've ended up here after a customer let me know that email alerts were
> >>>> not working as expected. After a few hours of digging around, I decided
> >>>> that the alert daemon was failing to retrieve hostnames and failing
> >>>> miserably.
> >>>>
> >>>> Have other people seen this behavior?
> >>>
> >>> I have duplicated this behavior on another xymon server on Solaris. It
> >>> certainly looks like this behavior breaks the alert daemon. Fortunately,
> >>> I "drop" hosts in batches, so I can restart Xymon at that time, but this
> >>> is still pretty icky.
> >
> > On 8/28/2015 3:12 PM, J.C. Cleaver wrote:
> >> The patch from
> >> http://lists.xymon.com/pipermail/xymon/2015-June/041833.html was checked
> >> in in https://sourceforge.net/p/xymon/code/7669/ , however it's not in the
> >> most recent Terabithia RPM.
> >>
> >> If you could test the direct patch (for hostdata, at
> >> http://lists.xymon.com/pipermail/xymon/attachments/20150610/8b425efb/attachment.obj
> >> ) on your OS, that would be very helpful. Signal handling is always a bit
> >> tricky to get right across the board.
> >
> > I have patched one of my servers and it behaves much better under my
> > contrived tests :) This is under Solaris 10 (Update 11) on SPARC. The
> > original report was under Red Hat Enterprise Linux 5.
> >
> > If my understanding of this is correct, it is a pretty nasty defect :(
> >
> > My failure scenario was non-delivery of some email alerts for hosts in
> > dire straits. I have several customers who do not monitor the web
> > interface, but rely on email notifications to warn them of impending
> > problems. These folks had been without any alerting capability since
> > early in July when I "dropped" a host and unknowingly clobbered the
> > child of xymond_hostdata.
> >
> 
> 
> Thanks for the confirmation... Yes, I believe it's probably time to start
> another release cycle, for this and a few other of the recent bug fixes
> still pending.
> 
> 

For the record, I can't reproduce this on FreeBSD either.
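
For anyone wanting to check a server for the symptom quoted above (a
<defunct> xymond_hostdata child), here is a minimal Python sketch that scans
ps output for zombie entries. It assumes a Linux-style "ps aux" column layout
with STAT in the eighth column; other platforms may need a different ps
invocation. find_defunct is a hypothetical helper name, not part of Xymon.

```python
def find_defunct(ps_output, name="xymond_hostdata"):
    """Return PIDs of zombie processes whose command mentions `name`.

    Assumes `ps aux`-style columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so COMMAND keeps its spaces.
        fields = line.split(None, 10)
        # Skip short lines and the header row (its PID column is not numeric).
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        stat, command = fields[7], fields[10]
        # A STAT value starting with "Z" marks a zombie (defunct) process.
        if stat.startswith("Z") and name in command:
            pids.append(int(fields[1]))
    return pids
```

To run it against a live system, something like
subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout can be
passed in as ps_output. An empty result only means no zombie was visible at
that moment; it does not prove the server is unaffected.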


