[hobbit] bbtest-net to hobbitd problem

Olivier Beau olivier at qalpit.com
Mon Aug 1 15:41:47 CEST 2005


> what version of Hobbit ? And what OS/hardware are you running on ?

Version 4.1.1, on Red Hat 3.0, running on a fairly good Compaq server (2 x 3 GHz Intel CPUs).



> Is there an equivalent number of "Bogus/Timeout" messages reported in
> the Hobbit servers' "hobbitd" status column ? 

No.
I had 1 hobbitd report with "Bogus/Timeout =1" this morning,
and over 50 bbtest-net reports with 1 or 2 whoops.



> Are there any unusual messages in the hobbitd.log file ?

Nothing in hobbitd.log.



> The timeout that bbtest-net hits is a 5 second timeout which is the
> default one used whenever a message is sent off to the Hobbit daemon.
> The 5 secs was chosen back when bbtest-net was sending to the Big
> Brother daemon, and considering the fact that Hobbit can generate much
> larger messages it might be worth a try to increase that timeout
> somewhat. Unfortunately, that one is set at compile-time and cannot be
> changed easily - so could you try editing the lib/sendmsg.h file and
> change the line
>     #define BBTALK_TIMEOUT 5
> to
>     #define BBTALK_TIMEOUT 15
> Then run "make clean; make" and as root "make install" to build and
> install the tools with the new timeout setting.
> 
> Also, on the Hobbit server it might be necessary to up the timeout on
> the receiver side - so add a "--timeout=30" to the hobbitd command in
> ~hobbit/server/etc/hobbitlaunch.cfg

OK, I've changed those to what you recommended (15 and 30).
So far, bbtest-net no longer reports any whoops.
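
For readers unfamiliar with what a sender-side timeout like BBTALK_TIMEOUT governs, here is a minimal Python sketch. It is not Hobbit's code (the real sender is written in C), and the names send_message, host and port are hypothetical. It assumes a plain TCP exchange in which the client sends one message, closes its sending side, and waits for the peer to close. If the peer accepts the TCP connection but does not service it within the timeout, the send is reported as failed, which resembles the "connected but timed out" symptom described in this thread.

```python
import socket

BBTALK_TIMEOUT = 15  # seconds; same value as suggested for lib/sendmsg.h


def send_message(host, port, message, timeout=BBTALK_TIMEOUT):
    """Send one message; return True on success, False if the timeout expires."""
    try:
        # The timeout applies to the connect and to every later socket call.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(message.encode("utf-8"))
            sock.shutdown(socket.SHUT_WR)  # tell the peer the message is complete
            while sock.recv(4096):  # wait until the peer closes its side
                pass
        return True
    except socket.timeout:
        return False
```

A larger timeout only gives a busy receiver more time to respond; it does not explain why the receiver was slow in the first place.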



> > it looks like bbtest-net actually connected to hobbitd !
> > -> could bbtest-net re-open a connection and resend the affected statuses
> > when an oops happens ?
> 
> It's tricky. Basically these timeouts should not happen (especially not
> when we're connecting to "localhost"), so I'd rather try and figure out 
> why they happen.

Yes, I understand and agree with you.
Let me know if I can do anything on this.



One thing that seems to take pretty long in my bbtest-net report is "Test results
transmitted":



Statistics:
 Hosts total           :     1629
 Hosts with no tests   :        0
 Total test count      :     4511
 Status messages       :     4851
 Alert status msgs     :        0
 Transmissions         :      522

TIME SPENT
Event                                            Starttime          Duration
bbtest-net startup                       1122897713.037280                 -
Service definitions loaded               1122897713.040386          0.003106 
Tests loaded                             1122897713.568623          0.528237 
DNS lookups completed                    1122897723.673199         10.104576 
Test engine setup completed              1122897723.737976          0.064777 
TCP tests completed                      1122897747.000639         23.262663 
PING test completed (1569 hosts)         1122897792.655792         45.655153 
PING test results sent                   1122897795.920521          3.264729 
Test result collection completed         1122897795.921481          0.000960 
LDAP test engine setup completed         1122897795.921485          0.000004 
LDAP tests executed                      1122897795.921487          0.000002 
LDAP tests result collection completed   1122897795.921488          0.000001 
NTP tests executed                       1122897796.143392          0.221904 
DIG tests executed                       1122897796.399747          0.256355 
NSLOOKUP tests executed                  1122897796.534172          0.134425 
Test results transmitted                 1122897824.069917         27.535745 
bbtest-net completed                     1122897824.074708          0.004791 
TIME TOTAL                                                        111.037428 
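
To put the transmission time in proportion, the following Python sketch computes each of the four longest phases as a share of the total run. The durations and the total are copied from the report above; the helper name share_of_total is hypothetical and exists only for this illustration.

```python
# Durations in seconds, copied from the TIME SPENT report above.
DURATIONS = {
    "DNS lookups completed": 10.104576,
    "TCP tests completed": 23.262663,
    "PING test completed": 45.655153,
    "Test results transmitted": 27.535745,
}
TIME_TOTAL = 111.037428  # the "TIME TOTAL" line of the same report


def share_of_total(duration, total=TIME_TOTAL):
    """Return a phase's duration as a percentage of the whole run."""
    return 100.0 * duration / total


if __name__ == "__main__":
    # List the phases from longest to shortest with their share of the run.
    for event, seconds in sorted(DURATIONS.items(), key=lambda kv: -kv[1]):
        print(f"{event:<28} {seconds:>10.3f}s {share_of_total(seconds):5.1f}%")
```

By this measure, transmitting the results takes roughly a quarter of the run, second only to the PING tests.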




--
Olivier Beau


