[Xymon] Two procs/processes graph issues after server upgrade from 4.3.21 to 4.3.22-rc2

Japheth Cleaver cleaver at terabithia.org
Thu Nov 12 00:53:48 CET 2015


There's a primary and a secondary issue here.

The chief problem was that, as a result of r7683 and r7686, TRACK and
OPTIONAL apparently stopped being recorded as options on a test (on some
platforms). Secondarily, 'nostale' is the default on svcstatus.sh pages,
so a graph whose RRD file has not been updated recently will eventually
no longer be shown on the status page, which is what happened in this
case. I'm not sure how I feel about that behavior, but it's been that
way for a while.

I believe the included patch fixes the main issue; I'm testing now (as 
4.3.22-5 in http://terabithia.org/rpms/xymon/testing/el6/x86_64/).

Once confirmed, this is enough to warrant a 4.3.23 release shortly.

-jc


On 11/11/2015 9:54 AM, Axel Beckert wrote:
> Hi,
>
> [TL;DR: See Summary at the end.]
>
> I'm slowly running out of ideas regarding the following issue, which
> I noticed after rolling out 4.3.22-rc2 on our two monitoring
> servers (the servers are still running 4.3.22-rc2 at the moment):
>
> The graph on
> https://xymon.phys.ethz.ch/xymon-cgi/svcstatus.sh?HOST=zwoelfi&SERVICE=procs
> is no longer there, because
> https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=processes&graph_width=576&graph_height=120&disp=zwoelfi&nostale&color=green&graph_start=1447069296&graph_end=1447242096&graph=hourly&action=view
> returns only a 1x1 pixel PNG. The same happens on the second
> (independent, not slave) server, too.
>
> (No version changes on the affected clients. Those I checked have
> either 4.3.0-beta2 from Debian 7 or 4.3.17 from Debian 8.)
>
> After reloading the above URL, I found the following messages in
> Apache's error.log:
>
> 2015-11-11 12:32:38.839801 Sendto failed: Connection refused
> 2015-11-11 12:32:38.839853 Sendto failed: Connection refused
> 2015-11-11 12:32:38.839871 Sendto failed: Connection refused
>
> I found http://lists.xymon.com/archive/2015-February/041189.html
> mentioning these messages, so I stopped the xymon service, removed all
> leftover rrdctl.* files from /var/lib/xymon/tmp/, and started the xymon
> service again.
>
> Result: I still only get a 1x1 pixel PNG, but the error messages are
> gone, i.e. the two issues are likely unrelated, as they were in the
> mailing list posting above.
>
> Then again on
> https://xymon.phys.ethz.ch/xymon-cgi/svcstatus.sh?HOST=zwoelfi&SERVICE=trends
> the "Process counts" graph is there (but does not seem to work):
>
> https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=processes&graph_width=576&graph_height=120&first=1&count=4&disp=zwoelfi&graph_start=1447069994&graph_end=1447242794&graph=hourly&action=view
>
> The differences between this and the first URL (besides the
> timestamps) are: the first URL has the additional query string
> parameters nostale (without a value) and color=green, while the second
> URL has first=1 and count=4 instead.
>
> As soon as I either remove the valueless "nostale" or give it a value,
> e.g. "nostale=1", the graph is back (but still not working).
>
> So while the URL reduced to the minimum parameters,
> https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=processes&graph=hourly&action=view
> shows an (empty) graph,
> https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=processes&disp=zwoelfi&graph=hourly&action=view&nostale
> gives a 1x1 pixel image.
>
> With regard to the empty graph,
> https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=processes&graph=daily&action=view
> is not empty; it just shows that there has been no data since the 4th
> of November (when I updated the servers from 4.3.21 to 4.3.22-rc2).
>
> And indeed, in /var/lib/xymon/rrd/zwoelfi/, some files have not been
> updated since the 4th of November:
>
> # ls -l *proc*
> -rw-r--r-- 1 xymon xymon 19640 Nov  4 15:40 processes.apache2.rrd
> -rw-r--r-- 1 xymon xymon 19640 Nov  4 15:40 processes.automount.rrd
> -rw-r--r-- 1 xymon xymon 19640 Nov  4 15:40 processes.stress.rrd
> -rw-r--r-- 1 xymon xymon 19640 Nov 11 13:09 procs.rrd
> #
>
>
> Summary
> =======
>
> So there seem to be two issues with 4.3.22:
>
> * The graph in the procs check's page isn't displayed properly.
>
>    Either
>
>    + "nostale" should get a value in the page/template,
>    + or the parsing of the "nostale" parameter without a value in the
>      showgraph CGI
>
>    should be fixed. This sounds rather easy, but I'm not sure which
>    variant is the expected one.
>
> * For some reason the processes.*.rrd files defined by "TRACK=" in
>    analysis.cfg are no longer updated.
>
>    I currently have no good idea where this comes from; maybe from
>    one of the NCV-related changes. At least I found no configuration
>    change (be it local or in the defaults/templates) which could have
>    triggered this issue.
>
> 		Kind regards, Axel Beckert

-------------- next part --------------
--- xymond/client_config.c.chkflags32	2015-11-11 12:47:51.629681735 -0800
+++ xymond/client_config.c	2015-11-11 13:27:06.897682379 -0800
@@ -117,36 +117,36 @@
 } c_paging_t;
 
-#define FCHK_NOEXIST  (1ULL << 0)
-#define FCHK_TYPE     (1ULL << 1)
-#define FCHK_MODE     (1ULL << 2)
-#define FCHK_MINLINKS (1ULL << 3)
-#define FCHK_MAXLINKS (1ULL << 4)
-#define FCHK_EQLLINKS (1ULL << 5)
-#define FCHK_MINSIZE  (1ULL << 6)
-#define FCHK_MAXSIZE  (1ULL << 7)
-#define FCHK_EQLSIZE  (1ULL << 8)
-#define FCHK_OWNERID  (1ULL << 10)
-#define FCHK_OWNERSTR (1ULL << 11)
-#define FCHK_GROUPID  (1ULL << 12)
-#define FCHK_GROUPSTR (1ULL << 13)
-#define FCHK_CTIMEMIN (1ULL << 16)
-#define FCHK_CTIMEMAX (1ULL << 17)
-#define FCHK_CTIMEEQL (1ULL << 18)
-#define FCHK_MTIMEMIN (1ULL << 19)
-#define FCHK_MTIMEMAX (1ULL << 20)
-#define FCHK_MTIMEEQL (1ULL << 21)
-#define FCHK_ATIMEMIN (1ULL << 22)
-#define FCHK_ATIMEMAX (1ULL << 23)
-#define FCHK_ATIMEEQL (1ULL << 24)
-#define FCHK_MD5      (1ULL << 25)
-#define FCHK_SHA1     (1ULL << 26)
-#define FCHK_SHA256   (1ULL << 27)
-#define FCHK_SHA512   (1ULL << 28)
-#define FCHK_SHA224   (1ULL << 29)
-#define FCHK_SHA384   (1ULL << 30)
-#define FCHK_RMD160   (1ULL << 31)
+#define FCHK_NOEXIST  (1 << 0)
+#define FCHK_TYPE     (1 << 1)
+#define FCHK_MODE     (1 << 2)
+#define FCHK_MINLINKS (1 << 3)
+#define FCHK_MAXLINKS (1 << 4)
+#define FCHK_EQLLINKS (1 << 5)
+#define FCHK_MINSIZE  (1 << 6)
+#define FCHK_MAXSIZE  (1 << 7)
+#define FCHK_EQLSIZE  (1 << 8)
+#define FCHK_OWNERID  (1 << 10)
+#define FCHK_OWNERSTR (1 << 11)
+#define FCHK_GROUPID  (1 << 12)
+#define FCHK_GROUPSTR (1 << 13)
+#define FCHK_CTIMEMIN (1 << 16)
+#define FCHK_CTIMEMAX (1 << 17)
+#define FCHK_CTIMEEQL (1 << 18)
+#define FCHK_MTIMEMIN (1 << 19)
+#define FCHK_MTIMEMAX (1 << 20)
+#define FCHK_MTIMEEQL (1 << 21)
+#define FCHK_ATIMEMIN (1 << 22)
+#define FCHK_ATIMEMAX (1 << 23)
+#define FCHK_ATIMEEQL (1 << 24)
+#define FCHK_MD5      (1 << 25)
+#define FCHK_SHA1     (1 << 26)
+#define FCHK_SHA256   (1 << 27)
+#define FCHK_SHA512   (1 << 28)
+#define FCHK_SHA224   (1 << 29)
+#define FCHK_SHA384   (1 << 30)
+#define FCHK_RMD160   (1 << 31)
 
-#define CHK_OPTIONAL  (1ULL << 33)
-#define CHK_TRACKIT   (1ULL << 34)
+#define CHK_OPTIONAL  (1 << 0)
+#define CHK_TRACKIT   (1 << 1)
  
 typedef struct c_file_t {
@@ -253,5 +253,6 @@
 	ruletype_t ruletype;
 	int cfid;
-	unsigned long long flags;
+	uint32_t flags;
+	uint32_t chkflags;
 	struct c_rule_t *next;
 	union {
@@ -979,5 +980,5 @@
 					}
 					else if (strncasecmp(tok, "track", 5) == 0) {
-						currule->flags |= CHK_TRACKIT;
+						currule->chkflags |= CHK_TRACKIT;
 						if (*(tok+5) == '=') currule->rrdidstr = strdup(tok+6);
 					}
@@ -1028,5 +1029,5 @@
 					}
 					else if (strcasecmp(tok, "optional") == 0) {
-						currule->flags |= CHK_OPTIONAL;
+						currule->chkflags |= CHK_OPTIONAL;
 					}
 					else if (idx == 0) {
@@ -1199,9 +1200,9 @@
 					}
 					else if (strncasecmp(tok, "track", 5) == 0) {
-						currule->flags |= CHK_TRACKIT;
+						currule->chkflags |= CHK_TRACKIT;
 						if (*(tok+5) == '=') currule->rrdidstr = strdup(tok+6);
 					}
 					else if (strcasecmp(tok, "optional") == 0) {
-						currule->flags |= CHK_OPTIONAL;
+						currule->chkflags |= CHK_OPTIONAL;
 					}
 					else {
@@ -1230,5 +1231,5 @@
 					}
 					else if (strncasecmp(tok, "track", 5) == 0) {
-						currule->flags |= CHK_TRACKIT;
+						currule->chkflags |= CHK_TRACKIT;
 						if (*(tok+5) == '=') currule->rrdidstr = strdup(tok+6);
 					}
@@ -1292,5 +1293,5 @@
 					}
 					else if (strncasecmp(tok, "track", 5) == 0) {
-						currule->flags |= CHK_TRACKIT;
+						currule->chkflags |= CHK_TRACKIT;
 						if (*(tok+5) == '=') currule->rrdidstr = strdup(tok+6);
 					}
@@ -1543,5 +1544,5 @@
 					}
 					else if (strncasecmp(tok, "track", 5) == 0) {
-						currule->flags |= CHK_TRACKIT;
+						currule->chkflags |= CHK_TRACKIT;
 						if (*(tok+5) == '=') currule->rrdidstr = strdup(tok+6);
 					}
@@ -1906,10 +1907,10 @@
 		}
 
-		if (rwalk->flags & CHK_TRACKIT) {
+		if (rwalk->chkflags & CHK_TRACKIT) {
 			printf(" TRACK");
 			if (rwalk->rrdidstr) printf("=%s", rwalk->rrdidstr);
 		}
 
-		if (rwalk->flags & CHK_OPTIONAL) printf(" OPTIONAL");
+		if (rwalk->chkflags & CHK_OPTIONAL) printf(" OPTIONAL");
 
 		if (rwalk->timespec) printf(" TIME=%s", rwalk->timespec);
@@ -2568,5 +2569,5 @@
 
 		if (nofile) {
-			if (!(rule->flags & CHK_OPTIONAL)) {
+			if (!(rule->chkflags & CHK_OPTIONAL)) {
 				if (COL_YELLOW > result) result = COL_YELLOW;
 				addalertgroup(rule->groups);
@@ -2751,5 +2752,5 @@
 		*anyrules = 1;
 		if (!exists) {
-			if (rwalk->flags & CHK_OPTIONAL) goto nextcheck;
+			if (rwalk->chkflags & CHK_OPTIONAL) goto nextcheck;
 
 			if (!(rwalk->flags & FCHK_NOEXIST)) {
@@ -2984,5 +2985,5 @@
 			}
 		}
-		if (rwalk->flags & CHK_TRACKIT) {
+		if (rwalk->chkflags & CHK_TRACKIT) {
 			*trackit = (trackit || (ftype == S_IFREG));
 			*id = rwalk->rrdidstr;
@@ -3066,5 +3067,5 @@
 			}
 		}
-		if (rwalk->flags & CHK_TRACKIT) {
+		if (rwalk->chkflags & CHK_TRACKIT) {
 			*trackit = 1;
 			*id = rwalk->rrdidstr;
@@ -3238,5 +3239,5 @@
 			*warnage = rule->rule.mqqueue.warnage;
 			*critage = rule->rule.mqqueue.critage;
-			if (rule->flags & CHK_TRACKIT) *trackit = (rule->rrdidstr ? rule->rrdidstr : "");
+			if (rule->chkflags & CHK_TRACKIT) *trackit = (rule->rrdidstr ? rule->rrdidstr : "");
 			return;
 		}
@@ -3471,5 +3472,5 @@
 		if ((*lowlim !=  0) && (*count < *lowlim)) *color = (*walk)->rule->rule.proc.color;
 		if ((*uplim  != -1) && (*count > *uplim)) *color = (*walk)->rule->rule.proc.color;
-		*trackit = ((*walk)->rule->flags & CHK_TRACKIT);
+		*trackit = ((*walk)->rule->chkflags & CHK_TRACKIT);
 		*id = (*walk)->rule->rrdidstr;
 		if (group) *group = (*walk)->rule->groups;
@@ -3540,5 +3541,5 @@
 		if ((*lowlim !=  0) && (*count < *lowlim)) *color = (*walk)->rule->rule.port.color;
 		if ((*uplim  != -1) && (*count > *uplim)) *color = (*walk)->rule->rule.port.color;
-		*trackit = ((*walk)->rule->flags & CHK_TRACKIT);
+		*trackit = ((*walk)->rule->chkflags & CHK_TRACKIT);
 		*id = (*walk)->rule->rrdidstr;
 		if (group) *group = (*walk)->rule->groups;

