[hobbit] server fails to receive all of client message
    Rodolfo Pilas 
    rodolfo at pilas.net
       
    Thu May 22 16:37:05 CEST 2008
    
    
  
Adam, take a look at:
http://en.wikibooks.org/wiki/System_Monitoring_with_Hobbit/FAQ#Q._How_do_I_fix_.22Oversize_status_msg_from_192.168.1.31_for_test.my.com:ports_truncated_.28n.3D508634.2C_limit.3D262144.29.22
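
As a rough illustration of the numbers in that FAQ title: the limit=262144 in the log line is 256 * 1024 bytes, which matches a MAXMSG_STATUS of 256. The sketch below assumes (as I understand hobbitserver.cfg) that the MAXMSG_* values are given in kB, and works out the smallest setting that would fit a message of the size logged as n=. The function name min_maxmsg_kb is only made up for this example; it is not part of Hobbit.

```python
def min_maxmsg_kb(message_bytes: int) -> int:
    """Smallest MAXMSG_* value (assumed to be in kB) that holds message_bytes."""
    # Integer ceiling division: round up to the next whole kB.
    return -(-message_bytes // 1024)


# Values taken from the example log line in the FAQ title.
logged_limit_bytes = 262144   # "limit=262144"
logged_size_bytes = 508634    # "n=508634"

# The logged limit corresponds to a 256 kB setting.
print(logged_limit_bytes // 1024)

# Smallest kB value that would not truncate this particular message.
print(min_maxmsg_kb(logged_size_bytes))
```

In practice you would probably round up further (for example to 512 or 1024) to leave headroom, since the message size varies from one report to the next.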
Adam Goryachev wrote:
> Adam Goryachev wrote:
>> Anyway, the problem is that approximately since then, a number of client
>> reports are not completely received. Sometimes some of the ps output is
>> truncated, sometimes the ports section is truncated, etc. This leads to
>> false positive alerts (i.e., procs goes red because some monitored procs
>> appear not to be running, since they were listed after the truncation point).
> 
>> I've increased the timeout on the hobbitd (--timeout=60) but this
>> doesn't seem to have helped. The only common factors among the clients
>> which have this problem are:
> 
>> 1) Most of them are running bbproxy and passing status messages from a
>> number of clients.
>> 2) The rest of them are on very slow connections, or frequently very
>> busy connections.
> 
> 
> I have made some 'progress' of sorts.
> 
> I've increased the MAX values as I was getting some "Oversize ...
> truncated" messages in my log file. I then went home thinking "Great, I
> managed to solve this one thing today at least". Except that I started
> getting the messages again a few hours later.
> 
> So after further investigation, I've decided I really can't work out
> what is happening, and why it isn't working. I've enabled debug output
> from bbproxy, but I don't really know what it all means.
> 
> I can see that if I set bbproxy to only forward messages to 127.0.0.1
> the local hobbit server gets all the data correctly. If I add the remote
> server, then some things don't work properly. Since it is likely all a
> big jumbled mess by now, I'll post a few sections of config files, and
> hopefully someone will notice my stupid mistake (or multiple mistakes)...
> 
> I have a network 10.x.x.x which has a hobbit server at 10.30.10.9, all
> client machines report to 10.30.10.9 as the BBDISPLAY/BBPAGER (most are
> Windows PCs using the BB Windows client), one is a Linux hobbit-client
> and of course 10.30.10.9 is a hobbit client (plus a couple of old ext
> scripts using the old BB env). I think all this is working fine, since
> nothing goes randomly purple/red.
> 
> 10.30.10.9 is behind NAT but has complete access to the internet.
> 
> I have a remote server behind a NAT router which has port 1984
> forwarded to it. It is receiving reports from around 20 other hobbit
> client machines perfectly, so I don't suspect the NAT router/hobbit
> config itself.
> 
> Some config from 10.30.10.9:
> 
> hobbitserver.cfg:
> BBSERVERIP="127.0.0.1"
> BBDISP="127.0.0.1"
> BBDISPLAYS=""
> MAXLINE="32768"
> 
> hobbitclient.cfg
> BBDISP="10.30.10.9"
> BBDISPLAYS=""
> BB="$BBHOME/bin/bb --debug --timeout=60"
> MAXLINE="32768"
> 
> hobbitlaunch.cfg
> [hobbitd]
>         ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
>         CMD hobbitd --pidfile=$BBSERVERLOGS/hobbitd.pid
> --restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk
> --checkpoint-interval=600 --log=$BBSERVERLOGS/hobbitd.log
> --admin-senders=127.0.0.1,$BBSERVERIP --store-clientlogs=!msgs
> --listen=127.0.0.1
> 
> 
> [bbproxy]
>         ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
>         CMD $BBHOME/bin/bbproxy --hobbitd
> --bbdisplay=123.234.456.567,127.0.0.1 --listen=10.30.10.9
> --report=$MACHINE.bbproxy --no-daemon --timeout=30
> --pidfile=$BBSERVERLOGS/bbproxy.pid --debug --log-details
>         CMD $BBHOME/bin/bbproxy --hobbitd --bbdisplay=127.0.0.1
> --listen=10.30.10.9 --report=$MACHINE.bbproxy --no-daemon --timeout=30
> --pidfile=$BBSERVERLOGS/bbproxy.pid --debug --log-details
>         LOGFILE $BBSERVERLOGS/bbproxy.log
> 
> [hobbitclient]
>         ENVFILE /usr/lib/hobbit/client/etc/hobbitclient.cfg
>         NEEDS hobbitd
>         CMD /usr/lib/hobbit/client/bin/hobbitclient.sh
>         LOGFILE $BBSERVERLOGS/hobbitclient.log
>         INTERVAL 5m
> 
> 
> On the remote hobbit server with the public IP I have:
> hobbitserver.cfg
> BBSERVERIP="192.168.2.6"
> BBDISP="192.168.2.6"
> BBDISPLAYS=""
> MAXLINE="32768"
> MAXMSG_STATUS="1024"
> MAXMSG_CLIENT="1024"
> MAXMSG_DATA="512"
> 
> hobbitlaunch.cfg
> [hobbitd]
>         HEARTBEAT
>         ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
>         CMD hobbitd --pidfile=$BBSERVERLOGS/hobbitd.pid
> --restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk
> --checkpoint-interval=600 --log=$BBSERVERLOGS/hobbitd.log
> --admin-senders=127.0.0.1,$BBSERVERIP
> --maint-senders=127.0.0.1,$BBSERVERIP -www-senders=127.0.0.1,$BBSERVERIP
> --store-clientlogs=!msgs --timeout=60
> 
> Any suggestions as to what is going wrong would be really appreciated.
> 
> BTW, bbnet tests from the 10.30.10.9 host are not submitted to the
> bbproxy at all because of the BBDISP setting in the hobbitserver.cfg,
> but if I change this to point to 10.30.10.9 then it seems to break the
> web interface. I'm not really too concerned about this right now though....
> 
> Thanks for any tips/pointers/etc
> 
> Regards,
> Adam