[Xymon] xymon hostdata module going rogue - bug report

John Thurston john.thurston at alaska.gov
Mon Aug 31 19:19:13 CEST 2015


> On Fri, August 28, 2015 3:16 pm, John Thurston wrote:
>> On 8/28/2015 12:45 PM, John Thurston wrote:
>>> On 6/10/2015 9:01 AM, Scot Kreienkamp wrote:
. . .
>>>> hobbit   28452  0.0  0.0      0     0 ?        Z    12:50   0:00
>>>> [xymond_hostdata] <defunct>
>>>>
>>>> It seemed related to drop messages . . .
>>>
>>> Hey, I think I'm seeing the same thing on Solaris with 4.3.21
>>>
>>> I've ended up here after a customer let me know that email alerts were
>>> not working as expected. After a few hours of digging around, I decided
>>> that the alert daemon was failing to retrieve hostnames and failing
>>> miserably.
>>>
>>> Have other people seen this behavior?
>>
>> I have duplicated this behavior on another xymon server on Solaris. It
>> certainly looks like this behavior breaks the alert daemon. Fortunately,
>> I "drop" hosts in batches so can restart Xymon at that time, but this is
>> still pretty icky.

On 8/28/2015 3:12 PM, J.C. Cleaver wrote:
> The patch from
> http://lists.xymon.com/pipermail/xymon/2015-June/041833.html was checked
> in in https://sourceforge.net/p/xymon/code/7669/ , however it's not in the
> most recent Terabithia RPM.
>
> If you could test the direct patch (for hostdata, at
> http://lists.xymon.com/pipermail/xymon/attachments/20150610/8b425efb/attachment.obj
> ) on your OS, that would be very helpful. Signal handling is always a bit
> tricky to ensure is correct across the board.

I have patched one of my servers and it behaves much better under my 
contrived tests :) This is under Solaris 10 (Update 11) on SPARC. The 
original report was under Red Hat Enterprise Linux 5.

If my understanding of this is correct, it is a pretty nasty defect :(

My failure scenario was non-delivery of some email alerts for hosts in 
dire straits. I have several customers who do not monitor the web 
interface, but rely on email notifications to warn them of impending 
problems. These folks had been without any alerting capability since 
early in July when I "dropped" a host and unknowingly clobbered the 
child of xymond_hostdata.
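
For anyone checking whether a server is in the same state, here is a rough
sketch that looks for a defunct xymond_hostdata child in ps output. The
function names and the "ps -e -o pid,s,comm" invocation are assumptions for
illustration and are not part of Xymon. On some platforms the command column
of a zombie may show only "<defunct>", in which case pass an empty name to
list every zombie.

```python
import subprocess


def find_defunct(ps_text, name="xymond_hostdata"):
    """Return PIDs of zombie (state Z) processes whose command contains `name`.

    Expects one process per line in the form "PID STATE COMMAND...".
    A header line, if present, is skipped because its PID field is not numeric.
    """
    pids = []
    for line in ps_text.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, state, command = fields
        if not pid.isdigit():
            continue  # header or malformed line
        if state.startswith("Z") and name in command:
            pids.append(int(pid))
    return pids


def defunct_now(name="xymond_hostdata"):
    """Run ps and report defunct processes (hypothetical helper).

    Assumes the local ps accepts "-e -o pid,s,comm"; adjust for your OS.
    """
    out = subprocess.run(
        ["ps", "-e", "-o", "pid,s,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_defunct(out, name)
```

A non-empty result from defunct_now() after dropping a host would suggest the
worker child has died and Xymon needs a restart (or the patch).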

-- 
    Do things because you should, not just because you can.

John Thurston    907-465-8591
John.Thurston at alaska.gov
Enterprise Technology Services
Department of Administration
State of Alaska


