[Xymon] xymon hostdata module going rogue - bug report
John Thurston
john.thurston at alaska.gov
Mon Aug 31 19:19:13 CEST 2015
> On Fri, August 28, 2015 3:16 pm, John Thurston wrote:
>> On 8/28/2015 12:45 PM, John Thurston wrote:
>>> On 6/10/2015 9:01 AM, Scot Kreienkamp wrote:
. . .
>>>> hobbit 28452 0.0 0.0 0 0 ? Z 12:50 0:00 [xymond_hostdata] <defunct>
>>>>
>>>> It seemed related to drop messages . . .
>>>
>>> Hey, I think I'm seeing the same thing on Solaris with 4.3.21
>>>
>>> I've ended up here after a customer let me know that email alerts were
>>> not working as expected. After a few hours of digging around, I decided
>>> that the alert daemon was failing to retrieve hostnames and failing
>>> miserably.
>>>
>>> Have other people seen this behavior?
>>
>> I have duplicated this behavior on another xymon server on Solaris. It
>> certainly looks like this behavior breaks the alert daemon. Fortunately,
>> I "drop" hosts in batches, so I can restart Xymon at that time, but this
>> is still pretty icky.
On 8/28/2015 3:12 PM, J.C. Cleaver wrote:
> The patch from
> http://lists.xymon.com/pipermail/xymon/2015-June/041833.html was checked
> in in https://sourceforge.net/p/xymon/code/7669/ , however it's not in the
> most recent Terabithia RPM.
>
> If you could test the direct patch (for hostdata, at
> http://lists.xymon.com/pipermail/xymon/attachments/20150610/8b425efb/attachment.obj
> ) on your OS, that would be very helpful. Signal handling is always a bit
> tricky to ensure is correct across the board.
I have patched one of my servers and it behaves much better under my
contrived tests :) This is under Solaris 10 (Update 11) on SPARC. The
original report was under Red Hat Enterprise Linux 5.

If my understanding of this is correct, it is a pretty nasty defect :(

My failure scenario was non-delivery of some email alerts for hosts in
dire straits. I have several customers who do not monitor the web
interface, but rely on email notifications to warn them of impending
problems. These folks had been without any alerting capability since
early in July, when I "dropped" a host and unknowingly clobbered the
child of xymond_hostdata.
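
For anyone who wants to check whether a server is in this state, here is a
minimal sketch of a detector. It is not part of Xymon: the function names
find_defunct and scan are hypothetical, and it assumes a ps that accepts
"-eo pid,s,comm" and prints the process state in the second column. Some
platforms may report a zombie's command only as "<defunct>", in which case
pass name=None to list every zombie instead of matching on the name.

```python
import subprocess


def find_defunct(ps_output, name="xymond_hostdata"):
    """Return PIDs of zombie processes found in 'pid state command' lines.

    ps_output is the text printed by a ps invocation whose columns are
    PID, state, and command (an assumption about the local ps).
    If name is None, every zombie is returned regardless of command.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        # Skip the header line and anything that is not 'pid state command'.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, state, command = fields
        # 'Z' is the state letter ps uses for a defunct (zombie) process.
        if state.startswith("Z") and (name is None or name in command):
            pids.append(int(pid))
    return pids


def scan(name="xymond_hostdata"):
    """Run ps once and return the PIDs of matching zombie processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,s,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_defunct(result.stdout, name)
```

Calling scan() from a cron job or an external Xymon test and alerting on a
non-empty result would be one way to notice the dead child before a customer
does. It only detects the symptom; the patch above is still the fix.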
--
Do things because you should, not just because you can.
John Thurston 907-465-8591
John.Thurston at alaska.gov
Enterprise Technology Services
Department of Administration
State of Alaska