[xymon] Scheduled disable causes crash?

Henrik Størner henrik at hswn.dk
Mon Dec 6 12:41:05 CET 2010


In <B08F3F3D67451844A7A8A029FCC71E4C1589BEFCA3 at WIN01.ad.deltamanagement.se> Johan Sjöberg <johan.sjoberg at deltamanagement.se> writes:

>During the last month, we have had some problems with Xymon when using
>scheduled disabled (added from the web interface).

>The first problem we had was on September 17th, when hobbitd crashed
>while/after running the scheduled disable. We got the following error
>in hobbit.log
>2010-09-17 05:00:00 Fatal error in select: Bad file descriptor
>2010-09-17 05:00:00 Setup complete

There is a bug lurking in the scheduled-task code, but I haven't been
able to quite nail down where it is. I've seen the same problem that
you have a couple of times, where a scheduled "disable" results in
xymond (hobbitd) crashing immediately afterwards.

One potential bug I did catch is fixed with the following patch:

Index: xymond/xymond.c
===================================================================
--- xymond/xymond.c	(revision 6604)
+++ xymond/xymond.c	(working copy)
@@ -3971,7 +3971,7 @@
 	if (msg->doingwhat == RESPONDING) {
 		shutdown(msg->sock, SHUT_RD);
 	}
-	else {
+	else if (msg->sock >= 0) {
 		shutdown(msg->sock, SHUT_RDWR);
 		close(msg->sock);
 		msg->sock = -1;
@@ -5040,6 +5040,8 @@
 					swalk = swalk->next;
 
 					memset(&task, 0, sizeof(task));
+					task.sock = -1;
+					task.doingwhat = NOTALK;
 					inet_aton(runtask->sender, (struct in_addr *) &task.addr.sin_addr.s_addr);
 					task.buf = task.bufp = runtask->command;
 					task.buflen = strlen(runtask->command); task.bufsz = task.buflen+1;
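
To illustrate what the second hunk guards against, here is a minimal
standalone sketch. It is not xymond code: the struct and function names
(conn_sketch, init_zeroed, init_no_socket, cleanup_guarded) are made up
for the example. The point is only that a struct cleared with memset()
ends up with sock == 0, which is a valid descriptor number (normally
standard input), so any cleanup that calls shutdown()/close() on it acts
on a real descriptor. Whether that is what actually bites xymond is an
assumption.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the connection struct; not the real xymond type. */
struct conn_sketch {
    int sock;       /* file descriptor, or -1 when there is no socket */
    int doingwhat;  /* state field, unused in this sketch */
};

/* Zero the struct only, as the task setup did before the patch. */
void init_zeroed(struct conn_sketch *c)
{
    memset(c, 0, sizeof(*c));
    /* c->sock is now 0: a valid descriptor number, not "no socket". */
}

/* Zero the struct, then mark it as having no socket, as the patch does. */
void init_no_socket(struct conn_sketch *c)
{
    memset(c, 0, sizeof(*c));
    c->sock = -1;
}

/* Guarded cleanup, mirroring the "sock >= 0" test in the first hunk.
 * Returns 1 if a descriptor was closed, 0 if there was nothing to close. */
int cleanup_guarded(struct conn_sketch *c)
{
    if (c->sock < 0) return 0;
    close(c->sock);
    c->sock = -1;
    return 1;
}
```

With init_zeroed() alone the guard would not help, since 0 passes the
"sock >= 0" test; the two hunks only work together.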


So it would be interesting to see if this helps in your setup. This patch
is against the current beta-3 code, but it applies to version 4.2.3 as
well if you run patch and explicitly tell it which file to patch:

   patch hobbit-4.2.3/hobbitd/hobbitd.c < task.patch


I am not sure that this fixes the problem, though. If this were what
causes the crash, it ought to happen before the log message saying that
the task ran is written. Unless, that is, the bug doesn't crash the
system right away, but only triggers some memory corruption that
results in a later crash ...


Regards,
Henrik



