[Xymon] xymon hostdata module going rogue - bug report
J.C. Cleaver
cleaver at terabithia.org
Mon Aug 31 23:24:05 CEST 2015
On Mon, August 31, 2015 10:19 am, John Thurston wrote:
>
>> On Fri, August 28, 2015 3:16 pm, John Thurston wrote:
>>> On 8/28/2015 12:45 PM, John Thurston wrote:
>>>> On 6/10/2015 9:01 AM, Scot Kreienkamp wrote:
> . . .
>>>>> hobbit 28452 0.0 0.0 0 0 ? Z 12:50 0:00
>>>>> [xymond_hostdata] <defunct>
>>>>>
>>>>> It seemed related to drop messages . . .
>>>>
>>>> Hey, I think I'm seeing the same thing on Solaris with 4.3.21
>>>>
>>>> I've ended up here after a customer let me know that email alerts were
>>>> not working as expected. After a few hours of digging around, I decided
>>>> that the alert daemon was failing to retrieve hostnames and failing
>>>> miserably.
>>>>
>>>> Have other people seen this behavior?
>>>
>>> I have duplicated this behavior on another xymon server on Solaris. It
>>> certainly looks like this behavior breaks the alert daemon. Fortunately,
>>> I "drop" hosts in batches so can restart Xymon at that time, but this is
>>> still pretty icky.
>
> On 8/28/2015 3:12 PM, J.C. Cleaver wrote:
>> The patch from
>> http://lists.xymon.com/pipermail/xymon/2015-June/041833.html was checked
>> in in https://sourceforge.net/p/xymon/code/7669/ , however it's not in the
>> most recent Terabithia RPM.
>>
>> If you could test the direct patch (for hostdata, at
>> http://lists.xymon.com/pipermail/xymon/attachments/20150610/8b425efb/attachment.obj
>> ) on your OS, that would be very helpful. Signal handling is always a bit
>> tricky to ensure is correct across the board.
>
> I have patched one of my servers and it behaves much better under my
> contrived tests :) This is under Solaris 10 (Update 11) on SPARC. The
> original report was under Red Hat Enterprise Linux 5.
>
> If my understanding of this is correct, it is a pretty nasty defect :(
>
> My failure scenario was non-delivery of some email alerts for hosts in
> dire straits. I have several customers who do not monitor the web
> interface, but rely on email notifications to warn them of impending
> problems. These folks had been without any alerting capability since
> early in July when I "dropped" a host and unknowingly clobbered the
> child of xymond_hostdata.
>
Thanks for the confirmation... Yes, I believe it's probably time to start
another release cycle, for this and a few of the other recent bug fixes
still pending.
Regards,
-jc
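
For anyone wanting to check whether a server is showing the symptom quoted
above (a defunct xymond_hostdata child in the process list), here is a minimal
sketch. It assumes Linux-style `ps aux` output with the state in the eighth
column, as in the quoted line; the column layout differs on Solaris, and the
function name find_defunct is purely illustrative, not part of Xymon.

```python
def find_defunct(ps_lines, name="xymond_hostdata"):
    """Return (pid, command) pairs for zombie processes matching `name`.

    Assumes `ps aux` column order:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    hits = []
    for line in ps_lines:
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # wrapped or malformed line, skip it
        stat, command = fields[7], fields[10]
        # A state beginning with "Z" marks a zombie (defunct) process.
        if stat.startswith("Z") and name in command:
            hits.append((int(fields[1]), command))
    return hits


if __name__ == "__main__":
    sample = [
        "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND",
        "hobbit 28452 0.0 0.0 0 0 ? Z 12:50 0:00 [xymond_hostdata] <defunct>",
    ]
    for pid, command in find_defunct(sample):
        print(pid, command)
```

In practice the lines would come from the output of `ps aux` rather than a
hard-coded sample. A hit only indicates the symptom; the fix is still the
patch referenced above.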