[hobbit] Xymon-4.3.0-beta1: hobbit_rrd data msgs truncated

Dominique Frise dominique.frise at unil.ch
Mon Apr 20 19:41:01 CEST 2009


Dominique Frise wrote:
> Dominique Frise wrote:
>> Hi,
>>
>> We track "surgemail" processes using the following rule in 
>> hobbit-clients.cfg:
>>
>> HOST=xyz
>>     PROC ./surgemail min=0 TRACK=surgemail
>>
>> The ps listing in msg.xyz.txt reports 315 "./surgemail" processes, 
>> while the rrd graph only shows ~30 processes.
>>
>> Here is the last corresponding data set from the 
>> processes.surgemail.rrd file (after flushing the cache by stopping Xymon):
>>
>> <?xml version="1.0" encoding="utf-8"?>
>> <!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
>> <!-- Round Robin Database Dump --><rrd> <version> 0003 </version>
>>         <step> 300 </step> <!-- Seconds -->
>>         <lastupdate> 1239775972 </lastupdate> <!-- 2009-04-15 08:12:52 CEST -->
>>
>>         <ds>
>>                 <name> count </name>
>>                 <type> GAUGE </type>
>>                 <minimal_heartbeat> 600 </minimal_heartbeat>
>>                 <min> 0.0000000000e+00 </min>
>>                 <max> NaN </max>
>>
>>                 <!-- PDP Status -->
>>                 <last_ds> 30 </last_ds>
>>                 <value> 5.1600000000e+03 </value>
>>                 <unknown_sec> 0 </unknown_sec>
>>         </ds>
>>
>> <!-- Round Robin Archives -->   <rra>
>>
>> We tried letting Xymon recreate a fresh rrd, without success.
>> The same configuration was working with Hobbit-4.2.0/RRDtool 1.2.19 
>> (same version)
>>
>> The rrd code has changed quite a bit since 4.2.0, and I can't really 
>> tell which code is involved in order to start debugging this.
>> Any help appreciated!
>>
>> Dominique
>>
> 
> This is a more general problem.
> The data messages passed to hobbitd_rrd are truncated.
> 
> Debugging showed that messages leave hobbitd correctly but are 
> read incorrectly by hobbitd_channel.
> 
> Below is the debug output of hobbitd and hobbitd_channel, with extra 
> printf lines added to dump the messages.
> 
> ------ hobbitd.log --------
> 2009-04-17 16:22:21 <- do_message/1
> 2009-04-17 16:22:21 -> do_message/1 (86 bytes): data blind.ifstat
> 2009-04-17 16:22:21 -> update_statistics
> 2009-04-17 16:22:21 <- update_statistics
> 2009-04-17 16:22:21 -> oksender
> 2009-04-17 16:22:21 <- oksender(1-a)
> 2009-04-17 16:22:21 ->handle_data
> 2009-04-17 16:22:21 -> posttochannel
> 2009-04-17 16:22:21 Posting message 2 to 1 readers
> 2009-04-17 16:22:21 <- posttochannel
> 2009-04-17 16:22:21 <-handle_data
> 2009-04-17 16:22:21 msg: data blind.ifstat
> solaris
> bge:0:bge0:obytes64     267829127
> bge:0:bge0:rbytes64     1208836563
> 2009-04-17 16:22:21 <- do_message/1
> 2009-04-17 16:22:21 -> do_message/1 (104 bytes): data blind.vmstat
> 2009-04-17 16:22:21 -> update_statistics
> 2009-04-17 16:22:21 <- update_statistics
> 2009-04-17 16:22:21 -> oksender
> 2009-04-17 16:22:21 <- oksender(1-a)
> 2009-04-17 16:22:21 ->handle_data
> 2009-04-17 16:22:21 -> posttochannel
> 2009-04-17 16:22:21 Posting message 3 to 1 readers
> 2009-04-17 16:22:21 <- posttochannel
> 2009-04-17 16:22:21 <-handle_data
> 2009-04-17 16:22:21 msg: data blind.vmstat
> solaris
>  0 0 0 11938312 10700752 3 19 0 0  0  0  0  2  2  2  0  343 2099 1006 1  2 97
> 2009-04-17 16:22:21 <- do_message/1
> 2009-04-17 16:22:21 -> do_message/1 (1315 bytes): data blind.iostatdisk
> 
> 
> ------- rrd-data.log --------
> 2009-04-17 16:22:21 Peer not up, flushing message queue
> 2009-04-17 16:22:21 Connecting to peer 0.0.0.0:0
> 2009-04-17 16:22:21 Peer is UP
> 2009-04-17 16:22:21 inbuf: 
> @@data#2/blind|1239978141.731166|130.223.27.23||blind|ifstat|sunos|intraDevServ,adminSys 
> 
> data blind.ifstat
> solaris
> bge:0:bge0:obytes64     267829127
> bge:0:bge0:rbytes64     12088365
> @@
> 
> 2009-04-17 16:22:21 inbuf: 
> @@data#3/blind|1239978141.731938|130.223.27.23||blind|vmstat|sunos|intraDevServ,adminSys 
> 
> data blind.vmstat
> solaris
>  0 0 0 11938312 10700752 3 19 0 0  0  0  0  2  2  2  0  343 2099 1006  1  2
> @@
> 
> 
> The last values of ifstat and vmstat (1208836563 and 97) become 12088365 
> and NULL, respectively.
> I hope Henrik can help us solve this issue.
> 
> Dominique

I finally found the issue in hobbitd.c.
The patch hobbitd.patch is attached.

Installation
------------
Place hobbitd.patch in the top Xymon install dir and apply it with:
# patch -p0 < hobbitd.patch
# gmake
Then copy the rebuilt hobbitd to your install bin dir.
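
To check whether data messages are still truncated, the message dumped in hobbitd.log can be compared with the "inbuf" dump in rrd-data.log. Below is a minimal Python sketch of such a comparison, not part of Xymon. It assumes the framing seen in the rrd-data.log excerpt (a header line starting with "@@", then the message, then a closing "@@" line). The function names channel_payload and truncated_by are hypothetical, and the sample values are copied from the ifstat excerpts quoted earlier in this mail.

```python
def channel_payload(inbuf: str) -> str:
    """Return the text between the '@@...' header line and the closing '@@' line."""
    lines = inbuf.strip("\n").split("\n")
    if not lines or not lines[0].startswith("@@"):
        raise ValueError("missing @@ header line")
    if lines[-1] != "@@":
        raise ValueError("missing closing @@ line")
    # Drop the header and the terminator; ignore blank lines around the payload.
    return "\n".join(lines[1:-1]).strip("\n")


def truncated_by(sent: str, received: str) -> int:
    """Characters lost at the end, or -1 if 'received' is not a prefix of 'sent'."""
    if not sent.startswith(received):
        return -1
    return len(sent) - len(received)


# Message as dumped by hobbitd (from the hobbitd.log excerpt).
sent = (
    "data blind.ifstat\n"
    "solaris\n"
    "bge:0:bge0:obytes64     267829127\n"
    "bge:0:bge0:rbytes64     1208836563"
)

# Buffer as read by hobbitd_channel (from the rrd-data.log excerpt).
inbuf = (
    "@@data#2/blind|1239978141.731166|130.223.27.23||blind|ifstat|sunos|intraDevServ,adminSys\n"
    "data blind.ifstat\n"
    "solaris\n"
    "bge:0:bge0:obytes64     267829127\n"
    "bge:0:bge0:rbytes64     12088365\n"
    "@@\n"
)

received = channel_payload(inbuf)
lost = truncated_by(sent, received)
print("characters lost:", lost)
```

A result of 0 means the message arrived intact, a positive number means it was cut short at the end, and -1 means the two dumps differ in some other way.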


Dominique
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hobbitd.patch
Type: text/x-patch
Size: 396 bytes
Desc: not available
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20090420/edbdbc86/attachment.bin>

