[Xymon] xymon hostdata module going rogue

J.C. Cleaver cleaver at terabithia.org
Wed Jun 10 19:20:45 CEST 2015


On Wed, June 10, 2015 10:01 am, Scot Kreienkamp wrote:
> Hi everyone,
>
> I have a xymon server running 4.3.21 that seems to be accumulating
> processes like these:
>
> hobbit   28430  0.0  0.0      0     0 ?        Z    12:50   0:00
> [xymond_hostdata] <defunct>
> hobbit   28435  0.0  0.0      0     0 ?        Z    12:50   0:00
> [xymond_hostdata] <defunct>
> hobbit   28440  0.0  0.0      0     0 ?        Z    12:50   0:00
> [xymond_hostdata] <defunct>
> hobbit   28444  0.0  0.0      0     0 ?        Z    12:50   0:00
> [xymond_hostdata] <defunct>
> hobbit   28449  0.0  0.0      0     0 ?        Z    12:50   0:00
> [xymond_hostdata] <defunct>
> hobbit   28452  0.0  0.0      0     0 ?        Z    12:50   0:00
> [xymond_hostdata] <defunct>
>
> It seemed related to drop messages, so I did a test.
>
>
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps
> auxw |grep xymond_hostdata |wc -l
> 161
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps
> auxw |grep xymond_hostdata |wc -l
> 162
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps
> auxw |grep xymond_hostdata |wc -l
> 163
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps
> auxw |grep xymond_hostdata |wc -l
> 164
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps
> auxw |grep xymond_hostdata |wc -l
> 165
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps
> auxw |grep xymond_hostdata |wc -l
> 166
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps
> auxw |grep xymond_hostdata |wc -l
> 167
>
> So every time I send a drop message I get a defunct process hanging out.
> Bug in Xymon?
>
> This is on RHEL5, xymon 4.3.21.
>
> Thanks!


Scot,


Some background: when doing a full drop on a host, xymond_hostdata (and
xymond_history, IIRC) forks a child to perform the recursive removal of
that host's history files and other stored data in the background; the
child then exits. That's why the defunct processes line up with your drop
messages.
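
As a rough illustration of that fork-and-exit pattern (this is not the
actual Xymon code; remove_host_data, drop_host_in_background and reap_child
are hypothetical names made up for the example): the parent forks, the child
does the cleanup and exits, and the parent carries on immediately. Until the
parent calls waitpid() on it, the exited child stays in the process table in
state Z, which ps reports as <defunct>.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the recursive removal of a dropped host's data. */
static void remove_host_data(const char *hostname)
{
    (void)hostname; /* a real worker would delete the host's files here */
}

/* Fork a child to do the cleanup.
 * Returns the child's PID to the parent, or -1 if fork() failed. */
static pid_t drop_host_in_background(const char *hostname)
{
    pid_t pid = fork();
    if (pid == 0) {
        remove_host_data(hostname);
        _exit(0); /* child is finished; it is a zombie until the parent reaps it */
    }
    return pid;   /* parent returns right away without waiting */
}

/* Reap one specific child.
 * Returns its exit status, or -1 on error or abnormal termination. */
static int reap_child(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If nothing ever calls something like reap_child (or installs a SIGCHLD
handler that does the equivalent), each drop leaves one more entry behind,
which would match the count going up by one per drop message.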


Looks like xymond_hostdata.c is missing a SIGCHLD handler registration, so
the exited children are never reaped and the defunct processes stack up.
Strangely, I haven't observed this behavior on RHEL6 at all, even though
we're dropping hosts all the time. Odd.
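
For reference, the usual shape of a SIGCHLD registration in C looks roughly
like the sketch below. This is a generic illustration under my own
assumptions, not the contents of the patch and not code copied from Xymon;
reap_children, install_sigchld_handler and spawn_worker are hypothetical
names. The handler loops on waitpid() with WNOHANG because several children
can exit before the signal is delivered once.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* SIGCHLD handler: reap every child that has already exited, never block. */
static void reap_children(int signo)
{
    int saved_errno = errno; /* waitpid() may change errno under the main code */
    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ; /* keep going until no exited child is left */
    errno = saved_errno;
}

/* Register the handler. Returns 0 on success, -1 on failure. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    /* Restart interrupted syscalls; ignore children that merely stop. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Fork a short-lived worker, standing in for the background cleanup. */
static pid_t spawn_worker(void)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);
    return pid; /* child PID in the parent, -1 if fork() failed */
}
```

With the handler installed before any worker is forked, each exited child is
collected shortly after it terminates instead of lingering as <defunct>.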


The attached patch should fix the issue for you, I believe.


Regards,

-jc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xymon.hostdata_child.patch
Type: application/octet-stream
Size: 354 bytes
Desc: not available
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20150610/3e102ce9/attachment.obj>

