[Xymon] xymon hostdata module going rogue
J.C. Cleaver
cleaver at terabithia.org
Wed Jun 10 19:20:45 CEST 2015
On Wed, June 10, 2015 10:01 am, Scot Kreienkamp wrote:
> Hi everyone,
>
> I have a xymon server running 4.3.21 that seems to be accumulating
> processes like these:
>
> hobbit 28430 0.0 0.0 0 0 ? Z 12:50 0:00 [xymond_hostdata] <defunct>
> hobbit 28435 0.0 0.0 0 0 ? Z 12:50 0:00 [xymond_hostdata] <defunct>
> hobbit 28440 0.0 0.0 0 0 ? Z 12:50 0:00 [xymond_hostdata] <defunct>
> hobbit 28444 0.0 0.0 0 0 ? Z 12:50 0:00 [xymond_hostdata] <defunct>
> hobbit 28449 0.0 0.0 0 0 ? Z 12:50 0:00 [xymond_hostdata] <defunct>
> hobbit 28452 0.0 0.0 0 0 ? Z 12:50 0:00 [xymond_hostdata] <defunct>
>
> It seemed related to drop messages, so I did a test.
>
>
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
> 161
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
> 162
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
> 163
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
> 164
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
> 165
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
> 166
> [hobbit at retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
> 167
>
> So every time I send a drop message I get a defunct process hanging out.
> Bug in Xymon?
>
> This is on RHEL5, xymon 4.3.21.
>
> Thanks!
Scot,
Some background: when doing a full drop on a host, xymond_hostdata (and
xymond_history, IIRC) forks a child to perform the recursive removal of that
host's history files and related data in the background; the child then
exits. That's why the defunct processes line up with your drop messages.
Looks like xymond_hostdata.c is missing a SIGCHLD handler registration, so
the exited children are never reaped and the defunct processes stack up.
Strangely, I haven't observed this behavior on RHEL6 at all, even though
we're dropping hosts all the time. Odd.
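For illustration only (this is not the attached patch, and none of it is
taken from the Xymon source), here is a minimal sketch of the usual way a
daemon registers a SIGCHLD handler so that forked children get reaped. The
function names install_sigchld_reaper, spawn_cleanup_child and reaped_count
are hypothetical and exist only for this example.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for sigaction() and waitpid() */

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children reaped so far; sig_atomic_t so the handler may update it. */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: collect every exited child without blocking. */
static void reap_children(int signum)
{
    int saved_errno = errno;   /* waitpid() may clobber errno */

    (void)signum;
    /* Several children can exit before the handler runs once, so loop. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

/* Register the handler (hypothetical helper). Returns 0 on success, -1 on error. */
int install_sigchld_reaper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    /* Restart interrupted system calls; ignore stop/continue notifications. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical stand-in for the drop path: fork a child and return at once. */
pid_t spawn_cleanup_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* Child: the recursive directory removal would happen here. */
        _exit(0);
    }
    return pid;   /* Parent: child pid, or -1 if fork() failed. */
}

/* How many children the handler has reaped so far. */
int reaped_count(void)
{
    return (int)reaped_children;
}
```

A caller would run install_sigchld_reaper() once at startup and could then
fork as often as needed. Without that registration, each child that exits
stays in the process table as <defunct> until the parent waits for it or
exits itself. If the exit status is never needed, setting SIGCHLD to SIG_IGN
is a simpler way to avoid zombies on POSIX.1-2001 systems.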
The attached patch should fix the issue for you, I believe.
Regards,
-jc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xymon.hostdata_child.patch
Type: application/octet-stream
Size: 354 bytes
Desc: not available
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20150610/3e102ce9/attachment.obj>