Dominique Frise wrote:
Hi,
We track "surgemail" processes using the following rule in
hobbit-clients.cfg:
HOST=xyz
PROC ./surgemail min=0 TRACK=surgemail
The ps listing in msg.xyz.txt reports 315 "./surgemail" processes,
while the rrd graph only shows ~30 processes.
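To double-check the 315 figure independently of Xymon, the ps listing can be counted by hand. The sketch below assumes the PROC rule does a plain substring match on each ps line (as far as I know that is the behaviour when the pattern has no special prefix, but treat it as an assumption). The sample listing is made up; the real one is in msg.xyz.txt.

```python
def count_matching_processes(ps_lines, needle="./surgemail"):
    """Count ps lines that contain the needle as a plain substring."""
    return sum(1 for line in ps_lines if needle in line)


# Hypothetical ps excerpt, for illustration only.
sample_ps = [
    "  PID COMMAND",
    " 1201 ./surgemail",
    " 1202 ./surgemail",
    " 1300 /usr/sbin/sshd",
]

print(count_matching_processes(sample_ps))  # 2
```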
Here is the last corresponding dataset from the processes.surgemail.rrd file
(after flushing the cache by stopping Xymon):
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
<!-- Round Robin Database Dump -->
<rrd>
<version> 0003 </version>
<step> 300 </step> <!-- Seconds -->
<lastupdate> 1239775972 </lastupdate> <!-- 2009-04-15 08:12:52 CEST -->
<ds>
<name> count </name>
<type> GAUGE </type>
<minimal_heartbeat> 600 </minimal_heartbeat>
<min> 0.0000000000e+00 </min>
<max> NaN </max>
<!-- PDP Status -->
<last_ds> 30 </last_ds>
<value> 5.1600000000e+03 </value>
<unknown_sec> 0 </unknown_sec>
</ds>
<!-- Round Robin Archives --> <rra>
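As a sanity check on the dump above, the PDP status can be cross-checked against last_ds. This is a rough sketch, assuming that for a GAUGE RRDtool keeps the pending PDP value as "reading times seconds elapsed in the current step" (my understanding of the PDP scratch area, not something stated in the dump itself). All numbers are taken from the dump.

```python
STEP = 300                 # <step> in seconds
LAST_UPDATE = 1239775972   # <lastupdate>
LAST_DS = 30               # <last_ds>
PDP_VALUE = 5.16e3         # <value> in the PDP status

# Seconds elapsed since the most recent step boundary.
elapsed = LAST_UPDATE % STEP

# Assumption: pending PDP value = reading * elapsed seconds (GAUGE).
implied_reading = PDP_VALUE / elapsed

print(elapsed)          # 172
print(implied_reading)  # 30.0
```

If that assumption holds, the RRD really was fed a value of 30 for the whole interval, so the wrong number arrives before the RRD update rather than being introduced by RRDtool.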
We tried letting Xymon recreate a fresh rrd, without success.
The same configuration worked with Hobbit-4.2.0/RRDtool 1.2.19
(same version)
The rrd code has changed quite a bit since 4.2.0, and I can't really tell
which code is involved in order to debug this.
Any help appreciated!
Dominique
This is a more general problem.
The data messages passed to hobbitd_rrd are truncated.
Debugging showed that messages leave hobbitd correctly but are
read incorrectly by hobbitd_channel.
Below is the debug output of hobbitd and hobbitd_channel, with extra
printf lines added to dump the messages.
------ hobbitd.log --------
2009-04-17 16:22:21 <- do_message/1
2009-04-17 16:22:21 -> do_message/1 (86 bytes): data blind.ifstat
2009-04-17 16:22:21 -> update_statistics
2009-04-17 16:22:21 <- update_statistics
2009-04-17 16:22:21 -> oksender
2009-04-17 16:22:21 <- oksender(1-a)
2009-04-17 16:22:21 ->handle_data
2009-04-17 16:22:21 -> posttochannel
2009-04-17 16:22:21 Posting message 2 to 1 readers
2009-04-17 16:22:21 <- posttochannel
2009-04-17 16:22:21 <-handle_data
2009-04-17 16:22:21 msg: data blind.ifstat
solaris
bge:0:bge0:obytes64 267829127
bge:0:bge0:rbytes64 1208836563
2009-04-17 16:22:21 <- do_message/1
2009-04-17 16:22:21 -> do_message/1 (104 bytes): data blind.vmstat
2009-04-17 16:22:21 -> update_statistics
2009-04-17 16:22:21 <- update_statistics
2009-04-17 16:22:21 -> oksender
2009-04-17 16:22:21 <- oksender(1-a)
2009-04-17 16:22:21 ->handle_data
2009-04-17 16:22:21 -> posttochannel
2009-04-17 16:22:21 Posting message 3 to 1 readers
2009-04-17 16:22:21 <- posttochannel
2009-04-17 16:22:21 <-handle_data
2009-04-17 16:22:21 msg: data blind.vmstat
solaris
0 0 0 11938312 10700752 3 19 0 0 0 0 0 2 2 2 0 343 2099 1006 1 2 97
2009-04-17 16:22:21 <- do_message/1
2009-04-17 16:22:21 -> do_message/1 (1315 bytes): data blind.iostatdisk
------- rrd-data.log --------
2009-04-17 16:22:21 Peer not up, flushing message queue
2009-04-17 16:22:21 Connecting to peer 0.0.0.0:0
2009-04-17 16:22:21 Peer is UP
2009-04-17 16:22:21 inbuf:
@@data#2/blind|1239978141.731166|130.223.27.23||blind|ifstat|sunos|intraDevServ,adminSys
data blind.ifstat
solaris
bge:0:bge0:obytes64 267829127
bge:0:bge0:rbytes64 12088365
@@
2009-04-17 16:22:21 inbuf:
@@data#3/blind|1239978141.731938|130.223.27.23||blind|vmstat|sunos|intraDevServ,adminSys
data blind.vmstat
solaris
0 0 0 11938312 10700752 3 19 0 0 0 0 0 2 2 2 0 343 2099 1006 1 2
@@
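For anyone who wants to pick the inbuf dumps apart in a script, here is a minimal sketch that splits the "@@data#..." header line. The layout (channel, sequence number and host in the first field, then "|"-separated fields) is inferred only from the two samples above, and the dictionary keys are my own labels, not official field names.

```python
def parse_channel_header(line):
    """Split a channel header line such as '@@data#2/blind|...' into parts."""
    if not line.startswith("@@"):
        raise ValueError("not a channel header line")
    fields = line[2:].split("|")
    # First field looks like "data#2/blind": channel, sequence number, host.
    channel, _, rest = fields[0].partition("#")
    seq, _, host = rest.partition("/")
    return {
        "channel": channel,
        "seq": int(seq),
        "host": host,
        "timestamp": float(fields[1]),
        "sender": fields[2],
        "remaining": fields[3:],  # meaning of these fields is not assumed here
    }


header = ("@@data#2/blind|1239978141.731166|130.223.27.23||blind|ifstat|"
          "sunos|intraDevServ,adminSys")
info = parse_channel_header(header)
print(info["channel"], info["seq"], info["host"])  # data 2 blind
```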
The last values of the ifstat and vmstat messages (1208836563 and 97) become
12088365 and NULL, respectively.
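To put a number on the truncation, the last line of each message as sent can be compared with what hobbitd_channel received. This is only a sketch: `missing_tail` is a helper made up for this mail, and the vmstat line is assumed to be a single line in the original message (the hobbitd.log excerpt above appears to be wrapped by the mailer).

```python
def missing_tail(sent, received):
    """Return the part of `sent` that is absent from `received`,
    or None if `received` is not a prefix of `sent`."""
    if not sent.startswith(received):
        return None
    return sent[len(received):]


sent_ifstat = "bge:0:bge0:rbytes64 1208836563"
recv_ifstat = "bge:0:bge0:rbytes64 12088365"
sent_vmstat = "0 0 0 11938312 10700752 3 19 0 0 0 0 0 2 2 2 0 343 2099 1006 1 2 97"
recv_vmstat = "0 0 0 11938312 10700752 3 19 0 0 0 0 0 2 2 2 0 343 2099 1006 1 2"

print(repr(missing_tail(sent_ifstat, recv_ifstat)))  # '63'
print(repr(missing_tail(sent_vmstat, recv_vmstat)))  # ' 97'
```

In both cases the received text is a clean prefix of what was sent, which points to the tail being cut off rather than the data being corrupted. Whether a trailing newline or space was also lost cannot be told from the dumps as pasted here.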
I hope Henrik can help us solve this issue.
Dominique