[hobbit] Weird disk alert with bad data

Stef Coene stef.coene at docum.org
Tue Oct 28 21:10:17 CET 2008


On Tuesday 28 October 2008, Martha McConaghy wrote:
> We recently got the AIX client working with our Hobbit server.  I then
> had to apply a patch to rrd/do_vmstat.c to fix a problem with rrd crashing
> due to an uninitialized variable coming from the AIX client.  Despite that,
> I'm still seeing a weird problem.  One of the other non-AIX clients will
> have their disk check go to red alert.  When I take a look at it, the
> disks are fine.  However, the data being processed by rrd is off by a few
> characters which seems to be what is causing the red alert to be generated.
>  It will last for an hour or so, then will go green again and the problem
> will move to a different non-AIX client.  When I remove the three AIX
> clients from bb-hosts, the problem disappears.  So, it seems to be pretty
> clearly related to the AIX client, though it is affecting other alerts.
>
> Any thoughts on what to do?  Have we stumbled onto another bug?
What patch did you apply for the rrd?

I have lots of AIX clients talking to lots of Hobbit servers and I have never
had a problem with the rrds.  The only patch I applied regarding vmstat adds
cpu_pc and cpu_ec and strips . and , from the numbers.

My vmstat patch:

--- ./hobbit-4.2.0/hobbitd/rrd/do_vmstat.c   2006-08-09 22:10:06.000000000 +0200
+++ ./hobbit-4.2.0-OK/hobbitd/rrd/do_vmstat.c   2007-03-13 11:40:39.000000000 +0100
@@ -76,6 +76,8 @@
   { 14, "cpu_sys" },
   { 15, "cpu_idl" },
   { 16, "cpu_wait" },
+  { 17, "cpu_pc" },
+  { 18, "cpu_ec" },
   { -1, NULL }
 };

@@ -322,6 +324,17 @@
   p = strchr(datapart, '\n'); if (p) *p = '\0';
   p = strtok(datapart, " "); datacount = 0;
   while (p && (datacount < MAX_VMSTAT_VALUES)) {
+
+      /* Removing . and , from the numbers (memmove: the regions overlap) */
+      char *p1;
+      while ( (p1 = strchr(p,'.')) != NULL ) {
+         memmove (p1, p1+1, strlen(p1+1)+1) ;
+      }
+      char *p2;
+      while ( (p2 = strchr(p,',')) != NULL ) {
+         memmove (p2, p2+1, strlen(p2+1)+1) ;
+      }
+
      values[datacount++] = atoi(p);
      p = strtok(NULL, " ");
   }
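
For anyone who wants to try the stripping logic outside of Hobbit, here is a
minimal standalone sketch of the same idea.  The names strip_char and
parse_vmstat_line are made up for this example and do not exist in the Hobbit
source; the sketch only assumes a space-separated line of numbers, as in the
loop from the patch.  Note that stripping the separators changes the scale of
the value: "2.50" is stored as 250, not 2.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Hobbit): delete every occurrence of
 * the character c from the string s, in place.  It copies one character
 * at a time, so it never calls strcpy() on overlapping regions. */
static void strip_char(char *s, char c)
{
    char *src = s, *dst = s;

    while (*src) {
        if (*src != c) *dst++ = *src;
        src++;
    }
    *dst = '\0';
}

/* Hypothetical helper: split a space-separated line into integers,
 * removing '.' and ',' from each token before atoi() sees it.
 * The line is modified by strtok().  Returns the number of values stored. */
static int parse_vmstat_line(char *line, int *values, int maxvalues)
{
    int count = 0;
    char *p = strtok(line, " ");

    while (p && (count < maxvalues)) {
        strip_char(p, '.');
        strip_char(p, ',');
        values[count++] = atoi(p);
        p = strtok(NULL, " ");
    }
    return count;
}
```

Called on a writable buffer holding "1 0 2.50 1,024 97", this should store
1, 0, 250, 1024 and 97.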


Stef


