[hobbit] server fails to receive all of client message

Adam Goryachev mailinglists at websitemanagers.com.au
Mon May 5 17:46:36 CEST 2008



Adam Goryachev wrote:
> Anyway, the problem is that approximately since then, a number of client
> reports are not completely received. Sometimes some of the ps output is
> truncated, sometimes the ports section is truncated, etc. This leads to
> false positive alerts (i.e., procs goes red because some monitored procs
> appear not to be running, since they were listed after the truncation point).
> 
> I've increased the timeout on the hobbitd (--timeout=60) but this
> doesn't seem to have helped. The only common factors among the clients
> that have this problem are:
> 
> 1) Most of them are running bbproxy and passing status messages from a
> number of clients.
> 2) The rest of them are on very slow connections, or frequently very
> busy connections.
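
One way to tell from a stored client message whether it was cut short is to list the sections it contains and compare them with the sections you expect. The sketch below assumes that each section of a client message starts with a line of the form [name] (for example [ps] or [ports]); the function names and the sample message are hypothetical and only for illustration.

```python
import re

# Assumption: a section header is a line consisting only of "[name]".
SECTION_RE = re.compile(r"^\[([^\]]+)\]\s*$", re.MULTILINE)


def sections_in(message):
    """Return the section names found in a client message, in order."""
    return SECTION_RE.findall(message)


def missing_sections(message, expected):
    """Return the expected section names that do not appear in the message."""
    present = set(sections_in(message))
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Hypothetical, shortened client message that stops before [ports].
    sample = "client host.linux linux\n[date]\nMon May 5\n[ps]\nPID CMD\n1 init\n"
    print(sections_in(sample))
    print(missing_sections(sample, ["date", "ps", "ports"]))
```

A section that is cut off part-way through still has its header, so this only flags the sections that should have followed the truncation point.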


I have made some 'progress' of sorts.

I've increased the MAX values, as I was getting some "Oversize ...
truncated" messages in my log file. I then went home thinking "Great, I
managed to solve this one thing today at least". Except I started
getting messages again a few hours later.
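
A quick sanity check on size limits of this kind is to compare the size of a captured message with the configured maximum. The sketch below assumes the limits are expressed in kB, which should be verified against the documentation for your version; the helper name and the limit values in the dictionary are illustrative only.

```python
def exceeds_limit(message, limit_kb):
    """Return True if the message is larger than limit_kb kilobytes.

    Assumption: the configured limit is in kB (1 kB = 1024 bytes).
    """
    return len(message) > limit_kb * 1024


if __name__ == "__main__":
    # Illustrative limits per message type, in kB.
    limits_kb = {"status": 1024, "client": 1024, "data": 512}
    # Hypothetical captured client message of 600 kB.
    captured = b"x" * (600 * 1024)
    for kind, limit in limits_kb.items():
        verdict = "too large" if exceeds_limit(captured, limit) else "fits"
        print(kind, limit, verdict)
```

If every hop between client and server applies its own limit, the smallest one along the path is the one that matters.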

So after further investigation, I've decided I really can't work out
what is happening, and why it isn't working. I've enabled debug output
from bbproxy, but I don't really know what it all means.

I can see that if I set bbproxy to only forward messages to 127.0.0.1
the local hobbit server gets all the data correctly. If I add the remote
server, then some things don't work properly. Since it is likely all a
big jumbled mess by now, I'll post a few sections of config files, and
hopefully someone will notice my stupid mistake (or multiple mistakes)...
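
To narrow down where the data is lost, one option is to compare the number of bytes sent with the number of bytes that arrive at each hop. The following is a minimal sketch of that idea over the loopback interface only: a receiver reads until the sender half-closes the connection, which is assumed here to be how the end of a message is signalled. The function names are hypothetical, and a real test would run the receiver on the far side of the proxy or NAT router.

```python
import socket
import threading


def receive_all(server_sock, result):
    """Accept one connection and read until the peer signals end of data."""
    conn, _ = server_sock.accept()
    chunks = []
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # empty read: the sender closed its write side
                break
            chunks.append(chunk)
    result["data"] = b"".join(chunks)


def send_and_count(payload, timeout=5.0):
    """Send payload to a local receiver and return how many bytes it got."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=receive_all, args=(server, result))
    receiver.start()

    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
        client.sendall(payload)
        client.shutdown(socket.SHUT_WR)  # half-close: no more data to send

    receiver.join(timeout)
    server.close()
    return len(result.get("data", b""))


if __name__ == "__main__":
    payload = b"x" * 200000
    print(len(payload), send_and_count(payload))
```

If the two counts differ only when the remote path is involved, the loss is happening in transit or at a timeout rather than in the client itself.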

I have a network 10.x.x.x which has a hobbit server at 10.30.10.9, all
client machines report to 10.30.10.9 as the BBDISPLAY/BBPAGER (most are
Windows PCs using the BB Windows client), one is a Linux hobbit-client
and of course 10.30.10.9 is a hobbit client (plus a couple of old ext
scripts using the old BB env). I think all this is working fine, since
nothing goes randomly purple/red.

10.30.10.9 is behind NAT but has complete access to the internet.

I have a remote server behind a NAT router which has port 1984
forwarded to it. It is receiving reports from around 20 other hobbit
client machines perfectly, so I don't suspect the NAT router/hobbit
config itself.

Some config from 10.30.10.9:

hobbitserver.cfg:
BBSERVERIP="127.0.0.1"
BBDISP="127.0.0.1"
BBDISPLAYS=""
MAXLINE="32768"

hobbitclient.cfg
BBDISP="10.30.10.9"
BBDISPLAYS=""
BB="$BBHOME/bin/bb --debug --timeout=60"
MAXLINE="32768"

hobbitlaunch.cfg
[hobbitd]
        ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
        CMD hobbitd --pidfile=$BBSERVERLOGS/hobbitd.pid
--restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk
--checkpoint-interval=600 --log=$BBSERVERLOGS/hobbitd.log
--admin-senders=127.0.0.1,$BBSERVERIP --store-clientlogs=!msgs
--listen=127.0.0.1


[bbproxy]
        ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
        CMD $BBHOME/bin/bbproxy --hobbitd
--bbdisplay=123.234.456.567,127.0.0.1 --listen=10.30.10.9
--report=$MACHINE.bbproxy --no-daemon --timeout=30
--pidfile=$BBSERVERLOGS/bbproxy.pid --debug --log-details
        CMD $BBHOME/bin/bbproxy --hobbitd --bbdisplay=127.0.0.1
--listen=10.30.10.9 --report=$MACHINE.bbproxy --no-daemon --timeout=30
--pidfile=$BBSERVERLOGS/bbproxy.pid --debug --log-details
        LOGFILE $BBSERVERLOGS/bbproxy.log

[hobbitclient]
        ENVFILE /usr/lib/hobbit/client/etc/hobbitclient.cfg
        NEEDS hobbitd
        CMD /usr/lib/hobbit/client/bin/hobbitclient.sh
        LOGFILE $BBSERVERLOGS/hobbitclient.log
        INTERVAL 5m


On the remote hobbit server with the public IP I have:
hobbitserver.cfg
BBSERVERIP="192.168.2.6"
BBDISP="192.168.2.6"
BBDISPLAYS=""
MAXLINE="32768"
MAXMSG_STATUS="1024"
MAXMSG_CLIENT="1024"
MAXMSG_DATA="512"

hobbitlaunch.cfg
[hobbitd]
        HEARTBEAT
        ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
        CMD hobbitd --pidfile=$BBSERVERLOGS/hobbitd.pid
--restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk
--checkpoint-interval=600 --log=$BBSERVERLOGS/hobbitd.log
--admin-senders=127.0.0.1,$BBSERVERIP
--maint-senders=127.0.0.1,$BBSERVERIP -www-senders=127.0.0.1,$BBSERVERIP
--store-clientlogs=!msgs --timeout=60

Any suggestions as to what is going wrong would be really appreciated.

BTW, bbnet tests from the 10.30.10.9 host are not submitted to the
bbproxy at all because of the BBDISP setting in the hobbitserver.cfg,
but if I change this to point to 10.30.10.9 then it seems to break the
web interface. I'm not really too concerned about this right now though....

Thanks for any tips/pointers/etc

Regards,
Adam