
Re: [hobbit] Xymon-4.3.0-beta1: hobbit_rrd data msgs truncated

To: hobbit (at) hswn.dk
Subject: Re: [hobbit] Xymon-4.3.0-beta1: hobbit_rrd data msgs truncated
From: Dominique Frise <dominique.frise (at) unil.ch>
Date: Mon, 20 Apr 2009 19:41:01 +0200
References: <49E5A42E.9000903 (at) unil.ch> <49E89A13.4040109 (at) unil.ch>
User-agent: Thunderbird 2.0.0.14 (X11/20080531)

Dominique Frise wrote:

Dominique Frise wrote:

Hi,
We track "surgemail" processes using the following rule in hobbit-clients.cfg:
HOST=xyz
    PROC ./surgemail min=0 TRACK=surgemail
The ps listing in msg.xyz.txt reports 315 "./surgemail" processes, while the rrd graph only shows ~30 processes.
Here is the last corresponding dataset of the processes.surgemail.rrd file (after flushing the cache by stopping Xymon):
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
<rrd> <version> 0003 </version>
        <step> 300 </step> 
<lastupdate> 1239775972 </lastupdate> 
        <ds>
                <name> count </name>
                <type> GAUGE </type>
                <minimal_heartbeat> 600 </minimal_heartbeat>
                <min> 0.0000000000e+00 </min>
                <max> NaN </max>

                
                <last_ds> 30 </last_ds>
                <value> 5.1600000000e+03 </value>
                <unknown_sec> 0 </unknown_sec>
        </ds>

   <rra>

We tried to let Xymon recreate a fresh rrd, without success.
The same configuration was working with Hobbit-4.2.0/RRDtool 1.2.19 (same version).
The rrd code has changed quite a bit since 4.2.0 and I don't really see which code is involved, so I don't know where to start debugging this.
Any help appreciated!

Dominique


This is a more general problem.
The data messages passed to hobbitd_rrd are truncated.

Debugging showed that messages go out of hobbitd correctly but are read incorrectly by hobbitd_channel.

Below is the debug output of hobbitd and hobbitd_channel, with extra printf lines added to dump the messages.


------ hobbitd.log --------
2009-04-17 16:22:21 <- do_message/1
2009-04-17 16:22:21 -> do_message/1 (86 bytes): data blind.ifstat
2009-04-17 16:22:21 -> update_statistics
2009-04-17 16:22:21 <- update_statistics
2009-04-17 16:22:21 -> oksender
2009-04-17 16:22:21 <- oksender(1-a)
2009-04-17 16:22:21 ->handle_data
2009-04-17 16:22:21 -> posttochannel
2009-04-17 16:22:21 Posting message 2 to 1 readers
2009-04-17 16:22:21 <- posttochannel
2009-04-17 16:22:21 <-handle_data
2009-04-17 16:22:21 msg: data blind.ifstat
solaris
bge:0:bge0:obytes64     267829127
bge:0:bge0:rbytes64     1208836563
2009-04-17 16:22:21 <- do_message/1
2009-04-17 16:22:21 -> do_message/1 (104 bytes): data blind.vmstat
2009-04-17 16:22:21 -> update_statistics
2009-04-17 16:22:21 <- update_statistics
2009-04-17 16:22:21 -> oksender
2009-04-17 16:22:21 <- oksender(1-a)
2009-04-17 16:22:21 ->handle_data
2009-04-17 16:22:21 -> posttochannel
2009-04-17 16:22:21 Posting message 3 to 1 readers
2009-04-17 16:22:21 <- posttochannel
2009-04-17 16:22:21 <-handle_data
2009-04-17 16:22:21 msg: data blind.vmstat
solaris

0 0 0 11938312 10700752 3 19 0 0 0 0 0 2 2 2 0 343 2099 1006 12 97

2009-04-17 16:22:21 <- do_message/1
2009-04-17 16:22:21 -> do_message/1 (1315 bytes): data blind.iostatdisk


------- rrd-data.log --------
2009-04-17 16:22:21 Peer not up, flushing message queue
2009-04-17 16:22:21 Connecting to peer 0.0.0.0:0
2009-04-17 16:22:21 Peer is UP

2009-04-17 16:22:21 inbuf:@@data#2/blind|1239978141.731166|130.223.27.23||blind|ifstat|sunos|intraDevServ,adminSys

data blind.ifstat
solaris
bge:0:bge0:obytes64     267829127
bge:0:bge0:rbytes64     12088365
@@

2009-04-17 16:22:21 inbuf:@@data#3/blind|1239978141.731938|130.223.27.23||blind|vmstat|sunos|intraDevServ,adminSys

data blind.vmstat
solaris
 0 0 0 11938312 10700752 3 19 0 0  0  0  0  2  2  2  0  343 2099 1006  1  2
@@

The last values of ifstat and vmstat (1208836563 and 97) become 12088365 and NULL respectively.

Hope Henrik can help us solve this issue.

Dominique



Finally... found the issue in hobbitd.c.
The patch, hobbitd.patch, is attached.

Installation
------------
Place the patch in the top Xymon install dir. and apply it with:
# patch -p0 < hobbitd.patch
# gmake
Then copy the rebuilt hobbitd to your install bin dir.


Dominique

--- hobbitd/hobbitd.c.dist	Mon Apr 20 15:51:44 2009
+++ hobbitd/hobbitd.c	Mon Apr 20 19:25:06 2009
@@ -1312,7 +1312,7 @@
 	if (msg) buflen += strlen(msg); else dbgprintf("  msg is NULL\n");
 	if (classname) buflen += strlen(classname);
 	if (pagepath) buflen += strlen(pagepath);
-	buflen += 4;
+	buflen += 6;
 
 	chnbuf = (char *)malloc(buflen);
 	snprintf(chnbuf, buflen, "%s|%s|%s|%s|%s\n%s",
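
Why 6 rather than 4: judging only from the hunk shown above, the format string "%s|%s|%s|%s|%s\n%s" adds four '|' separators and one newline to the field lengths, and snprintf() also needs room for the terminating NUL. That is six extra bytes. With only four reserved, snprintf() drops the last two characters, which matches 1208836563 arriving as 12088365. The sketch below reproduces this in isolation. build_channel_msg() and its parameters are hypothetical names made up for the illustration and are not taken from hobbitd.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, NOT the real hobbitd code: joins five header fields
 * and a message body with the same format string as the snprintf() call in
 * the patch. `overhead` is the number of bytes reserved on top of the
 * string lengths (4 before the patch, 6 after).
 * Returns a malloc()ed string that the caller must free, or NULL.
 */
char *build_channel_msg(const char *fields[5], const char *msg, size_t overhead)
{
    size_t buflen = overhead;
    char *buf;
    int i;

    assert(fields != NULL && msg != NULL);

    for (i = 0; i < 5; i++) buflen += strlen(fields[i]);
    buflen += strlen(msg);
    if (buflen == 0) return NULL;

    buf = (char *)malloc(buflen);
    if (buf == NULL) return NULL;

    /* snprintf() writes at most buflen-1 characters plus the NUL, so an
     * undersized buffer silently truncates the tail of the message. */
    snprintf(buf, buflen, "%s|%s|%s|%s|%s\n%s",
             fields[0], fields[1], fields[2], fields[3], fields[4], msg);
    return buf;
}
```

Calling it with overhead 6 returns the complete message, while overhead 4 returns the same message minus its last two characters.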

