[hobbit] increasing no. of hobbitd zombies

Henrik Stoerner henrik at hswn.dk
Tue Oct 25 12:50:49 CEST 2005


On Tue, Oct 25, 2005 at 11:57:57AM +0200, Heinecke at hansenet.com wrote:
> Hi,
>  
> At every "checkpoint-interval" I get a new hobbitd zombie process.
>  
> #> ps auxwww 
> ----snip ---
> hobbit   25559  0.0  0.0     0    0 ?        Z    11:06   0:00 [hobbitd] <defunct>
> hobbit   25917  0.0  0.0     0    0 ?        Z    11:16   0:00 [hobbitd] <defunct>
> hobbit   26283  0.0  0.0     0    0 ?        Z    11:26   0:00 [hobbitd] <defunct>
> hobbit   26648  0.0  0.0     0    0 ?        Z    11:36   0:00 [hobbitd] <defunct>

You're right that it is related to the checkpointing - hobbitd forks a
child process to save the checkpoint file.

What I don't understand is why the child isn't cleaned up afterwards.
Could you run "ps -lw -u hobbit"? I'm curious to see what the PPID is
for these zombies.

> This happens on Debian Linux 3.1 'Sarge'. An analogous config on various Solaris 8 SPARC boxes shows no problem.
>  
> On the Debian box, I have disabled bbdisplay in hobbitlaunch.cfg, because this box should only act as a kind of LAN probe (only bb-net and forwarding of LAN client statuses to a central bbdisplay).

In that case you don't need hobbitd running at all.

Hmm - perhaps this happens because there are no messages sent to this
hobbitd instance. I think that's the cause - looking over the code it
seems that if no messages arrive, the code to clean up the child
processes is never reached.
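
To illustrate the idea, here is a minimal sketch of a main loop that
reaps children during housekeeping, before it waits for messages. It is
not the hobbitd code: the names reap_children(), spawn_checkpoint_child()
and loop_once() are made up for this example, and it uses waitpid()
where hobbitd uses wait3(). Both calls return immediately when given
WNOHANG.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child that has already exited, without blocking.
 * Returns the number of children collected. */
static int reap_children(void)
{
    int status, reaped = 0;

    while (waitpid(-1, &status, WNOHANG) > 0) reaped++;
    return reaped;
}

/* Hypothetical stand-in for the checkpoint fork: the child exits at once. */
static pid_t spawn_checkpoint_child(void)
{
    pid_t pid = fork();

    if (pid == 0) _exit(0);   /* child: pretend the checkpoint was saved */
    return pid;               /* parent: child pid, or -1 on error */
}

/* One pass of a simplified main loop.
 * Returns the number of children reaped during this pass. */
static int loop_once(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 10000 };   /* wait at most 10 ms for input */
    int reaped;

    assert(fd >= 0 && fd < FD_SETSIZE); /* FD_SET is undefined otherwise */

    reaped = reap_children();           /* housekeeping: runs on every pass */

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return reaped;                  /* idle pass: no message arrived */

    /* ... a real daemon would read and handle the message here ... */
    return reaped;
}
```

Because the reaping happens before the wait for input, an idle pass
still collects finished children. If the reap_children() call sat after
the message handling instead, an idle daemon would skip it every time
and the defunct entries would pile up.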

The attached patch should fix it, although it is of course a non-issue
if you stop hobbitd on this box.

Regards,
Henrik

-------------- next part --------------
--- hobbitd/hobbitd.c	2005/09/13 08:02:50	1.183
+++ hobbitd/hobbitd.c	2005/10/25 10:48:11
@@ -3337,6 +3340,7 @@
 		 *
 		 * First attend to the housekeeping chores:
 		 * - send out our heartbeat signal;
+		 * - pick up children to avoid zombies;
 		 * - rotate logs, if we have been asked to;
 		 * - re-load the bb-hosts configuration if needed;
 		 * - check for stale status-logs that must go purple;
@@ -3358,6 +3362,9 @@
 			kill(parentpid, SIGUSR2);
 		}
 
+		/* Pickup any finished child processes to avoid zombies */
+		while (wait3(&childstat, WNOHANG, NULL) > 0) ;
+
 		if (logfn && dologswitch) {
 			freopen(logfn, "a", stdout);
 			freopen(logfn, "a", stderr);
@@ -3666,9 +3673,6 @@
 				conntail->next = NULL;
 			}
 		}
-
-		/* Pickup any finished child processes to avoid zombies */
-		while (wait3(&childstat, WNOHANG, NULL) > 0) ;
 	} while (running);
 
 	/* Tell the workers we to shutdown also */

