[hobbit] server fails to receive all of client message
Rodolfo Pilas
rodolfo at pilas.net
Thu May 22 16:37:05 CEST 2008
Adam, take a look at:
http://en.wikibooks.org/wiki/System_Monitoring_with_Hobbit/FAQ#Q._How_do_I_fix_.22Oversize_status_msg_from_192.168.1.31_for_test.my.com:ports_truncated_.28n.3D508634.2C_limit.3D262144.29.22
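
As a rough illustration of the FAQ entry above, here is a minimal Python sketch that reads an "Oversize ... truncated" log line and suggests a larger limit. It rests on assumptions: that the MAXMSG_* settings are given in kB (which fits limit=262144 being 256 x 1024), that the word after "Oversize" maps to the setting name (status -> MAXMSG_STATUS), and that the log line looks like the one quoted in the FAQ title. The function name and the headroom factor are made up for this example and are not part of Hobbit.

```python
import math
import re

# Log line format as quoted in the FAQ title, e.g.
# "Oversize status msg from 192.168.1.31 for test.my.com:ports truncated (n=508634, limit=262144)"
OVERSIZE_RE = re.compile(
    r"Oversize (?P<kind>\w+) msg from (?P<sender>\S+) for (?P<target>\S+) "
    r"truncated \(n=(?P<n>\d+), limit=(?P<limit>\d+)\)"
)


def suggest_maxmsg_kb(log_line, headroom=1.5):
    """Return (setting_name, size_in_kb) for an oversize log line, or None.

    Assumption: MAXMSG_* values are in kB and the message kind in the log
    line corresponds to the setting suffix. headroom is an arbitrary safety
    factor chosen for this sketch.
    """
    match = OVERSIZE_RE.search(log_line)
    if match is None:
        return None
    message_bytes = int(match.group("n"))
    # Round up to whole kB after applying the safety factor.
    needed_kb = math.ceil(message_bytes * headroom / 1024)
    return "MAXMSG_" + match.group("kind").upper(), needed_kb


if __name__ == "__main__":
    line = ("Oversize status msg from 192.168.1.31 for test.my.com:ports "
            "truncated (n=508634, limit=262144)")
    print(suggest_maxmsg_kb(line))
```

Whatever value comes out would go into hobbitserver.cfg on the server that logged the message, and hobbitd would need a restart to pick it up.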
Adam Goryachev wrote:
> Adam Goryachev wrote:
>> Anyway, the problem is that approximately since then, a number of client
>> reports are not completely received. Sometimes some of the ps output is
> >> truncated, sometimes the ports section is truncated, etc. This leads to
> >> false positive alerts (i.e., procs goes red because some monitored procs
> >> appear not to be running, since they were listed after the truncation point).
>
>> I've increased the timeout on the hobbitd (--timeout=60) but this
> >> doesn't seem to have helped. The only common factors among the clients
> >> which have this problem are:
>
>> 1) Most of them are running bbproxy and passing status messages from a
>> number of clients.
>> 2) The rest of them are on very slow connections, or frequently very
>> busy connections.
>
>
> I have made some 'progress' of sorts.
>
> I've increased the MAX values as I was getting some "Oversize ...
> truncated" messages in my log file. I then went home thinking "Great, I
> managed to solve this one thing today at least". Except, I started
> getting messages a few hours later.
>
> So after further investigation, I've decided I really can't work out
> what is happening, and why it isn't working. I've enabled debug output
> from bbproxy, but I don't really know what it all means.
>
> I can see that if I set bbproxy to only forward messages to 127.0.0.1
> the local hobbit server gets all the data correctly. If I add the remote
> server, then some things don't work properly. Since it is likely all a
> big jumbled mess by now, I'll post a few sections of config files, and
> hopefully someone will notice my stupid mistake (or multiple mistakes)...
>
> I have a network 10.x.x.x which has a hobbit server at 10.30.10.9, all
> client machines report to 10.30.10.9 as the BBDISPLAY/BBPAGER (most are
> Windows PCs using the BB Windows client), one is a Linux hobbit-client
> and of course 10.30.10.9 is a hobbit client (plus a couple of old ext
> scripts using the old BB env). I think all this is working fine, since
> nothing goes randomly purple/red.
>
> 10.30.10.9 is behind NAT but has complete access to the internet.
>
> I have a remote server behind a NAT router which has port 1984
> forwarded to it. It is receiving reports from around 20 other hobbit
> client machines perfectly, so I don't suspect the NAT router/hobbit
> config itself.
>
> Some config from 10.30.10.9:
>
> hobbitserver.cfg:
> BBSERVERIP="127.0.0.1"
> BBDISP="127.0.0.1"
> BBDISPLAYS=""
> MAXLINE="32768"
>
> hobbitclient.cfg
> BBDISP="10.30.10.9"
> BBDISPLAYS=""
> BB="$BBHOME/bin/bb --debug --timeout=60"
> MAXLINE="32768"
>
> hobbitlaunch.cfg
> [hobbitd]
> ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
> CMD hobbitd --pidfile=$BBSERVERLOGS/hobbitd.pid
> --restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk
> --checkpoint-interval=600 --log=$BBSERVERLOGS/hobbitd.log
> --admin-senders=127.0.0.1,$BBSERVERIP --store-clientlogs=!msgs
> --listen=127.0.0.1
>
>
> [bbproxy]
> ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
> CMD $BBHOME/bin/bbproxy --hobbitd
> --bbdisplay=123.234.456.567,127.0.0.1 --listen=10.30.10.9
> --report=$MACHINE.bbproxy --no-daemon --timeout=30
> --pidfile=$BBSERVERLOGS/bbproxy.pid --debug --log-details
> CMD $BBHOME/bin/bbproxy --hobbitd --bbdisplay=127.0.0.1
> --listen=10.30.10.9 --report=$MACHINE.bbproxy --no-daemon --timeout=30
> --pidfile=$BBSERVERLOGS/bbproxy.pid --debug --log-details
> LOGFILE $BBSERVERLOGS/bbproxy.log
>
> [hobbitclient]
> ENVFILE /usr/lib/hobbit/client/etc/hobbitclient.cfg
> NEEDS hobbitd
> CMD /usr/lib/hobbit/client/bin/hobbitclient.sh
> LOGFILE $BBSERVERLOGS/hobbitclient.log
> INTERVAL 5m
>
>
> On the remote hobbit server with the public IP I have:
> hobbitserver.cfg
> BBSERVERIP="192.168.2.6"
> BBDISP="192.168.2.6"
> BBDISPLAYS=""
> MAXLINE="32768"
> MAXMSG_STATUS="1024"
> MAXMSG_CLIENT="1024"
> MAXMSG_DATA="512"
>
> hobbitlaunch.cfg
> [hobbitd]
> HEARTBEAT
> ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
> CMD hobbitd --pidfile=$BBSERVERLOGS/hobbitd.pid
> --restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk
> --checkpoint-interval=600 --log=$BBSERVERLOGS/hobbitd.log
> --admin-senders=127.0.0.1,$BBSERVERIP
> --maint-senders=127.0.0.1,$BBSERVERIP -www-senders=127.0.0.1,$BBSERVERIP
> --store-clientlogs=!msgs --timeout=60
>
> Any suggestions as to what is going wrong would be really appreciated.
>
> BTW, bbnet tests from the 10.30.10.9 host are not submitted to the
> bbproxy at all because of the BBDISP setting in the hobbitserver.cfg,
> but if I change this to point to 10.30.10.9 then it seems to break the
> web interface. I'm not really too concerned about this right now though....
>
> Thanks for any tips/pointers/etc
>
> Regards,
> Adam