[hobbit] False alerts in hobbit

Adam Goryachev mailinglists at websitemanagers.com.au
Thu Nov 13 06:40:53 CET 2008


Anna Jonna Armannsdottir wrote:
> On mán, 2008-11-10 at 18:11 +1100, Adam Goryachev wrote:
>>
>> I have been battling false alerts with hobbit for quite some time
>> (months or more), and am really starting to get quite frustrated,
>> mostly because I tend to ignore my SMS messages when there are so
>> many false positives.
>>
>> Anyway, the fault is that the hobbit client reports get truncated, yet
>> the hobbit server uses the portion that it gets. This usually results in
>> the procs, ports, or both columns going red due to non-running
>> procs/non-open ports. In reality, the proc/port is fine, just the data
>> was truncated so the hobbit server couldn't find it.
>>
>> Initially I discovered my hobbit server was truncating some of this
>> data, so I increased the relevant variables:
>> MAXLINE="65535"
>> MAXMSG_STATUS="2048"
>> MAXMSG_CLIENT="2048"
>> MAXMSG_DATA="4096"
> 
> # Anna added 2008-09-08 because of lots of truncations. 
> MAXLINE="32768"
> MAXMSG_STATUS="1024"
> MAXMSG_DATA="1024"
> MAXMSG_CLIENT="2048"
> MAXMSG_NOTES="1024"

I'm missing the MAXMSG_NOTES but I don't use notes anyway.

> After I added this, the problem was solved. I found the 
> sizes from the truncations reported in the logs. 

I used to get log messages about truncations, but I don't anymore. After
I added my above settings the problem went away for a while, but now it
seems to have come back.

>> However, I still get many red alerts, and when I check, the log files do
>> not report any truncated or oversized messages. Also, when I examine the
>> "Client data available" from the red hobbit report, I find the size of
>> the message is nowhere near any of the values above, and in fact it is
>> different every time. Some reports that work are longer than reports
>> that don't.
> 
> Your logs - do they not report truncated or oversized messages like 
> in the following message: 
> http://www.hswn.dk/hobbiton/2006/05/msg00176.html 

No, they don't.

>> It isn't 100%, but generally (more than 98%) the clients with the
>> problem are on bandwidth limited networks.
>>
>> I would appreciate any tips on how to make things more reliable.
>>
>> Options I have considered:
>> 1) Get hobbit to compress its data, which reduces network load, and
>> hence should improve reliability.
>> 2) Add an "END" tag to the hobbit client data, and if the server doesn't
>> get the END tag then ignore the whole file (or re-request it).
>> 3) Switch to polling mode (which effectively does 1 and 2, I suppose).
>> 4) Try to track down what is causing this, and fix it.
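
Options 1 and 2 could be combined roughly as follows. This is only a sketch: the END-OF-REPORT marker and the frame_report/unframe_report helpers are hypothetical names, not part of the Hobbit protocol, and zlib is just one compression choice that happens to detect a stream that was cut short.

```python
import zlib

# Hypothetical end marker; Hobbit itself does not define one.
END_MARKER = b"\nEND-OF-REPORT\n"


def frame_report(report: bytes) -> bytes:
    """Append the end marker, then compress the whole message."""
    return zlib.compress(report + END_MARKER)


def unframe_report(payload: bytes):
    """Return the report, or None if the payload is truncated or corrupt."""
    try:
        data = zlib.decompress(payload)
    except zlib.error:
        # A compressed stream that was cut short fails to decompress.
        return None
    if not data.endswith(END_MARKER):
        # Decompressed, but the marker is missing: treat as incomplete.
        return None
    return data[: -len(END_MARKER)]
```

A receiver that gets None back would skip the update instead of evaluating procs/ports against half a report.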
>>
>> My hobbit server is behind a NAT router, so one possibility I have
>> considered is that the NAT router is dropping the NAT mapping before
>> the end of the TCP connection due to too many other connections or similar.
> 
> Have you considered setting up a Hobbit proxy? See:
> http://www.hswn.dk/hobbiton/2007/06/msg00080.html

One remote site is running a hobbit proxy, and this site also has
problems when forwarding the results to my hobbit server.

Here is the output from the bbproxy report:
bbproxy for Hobbit version 4.2.0

Proxy statistics

Incoming messages        :     623771 (0 msgs/second)
Outbound messages        :     385373

Incoming message distribution
- Combo messages         :       4944
- Status messages        :     607511
  Messages merged        :     518963
  Resulting combos       :     215412
- Page messages          :          0
- Other messages         :      14528

Proxy ressources
- Connection table size  :          1
- Buffer space           :          8 kByte

Timeout/failure details
- reading from client    :        100
- connecting to server   :       7012
- sending to server      :          0
- recovered              :          0
- reading from server    :          0
- sending to client      :          0

Average queue time       :          1.139
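
To put the failure counters in proportion, here is a quick calculation using only the figures from the report above. Treating "Outbound messages" as the denominator for the "connecting to server" failures is my assumption; I have not checked how bbproxy counts them internally.

```python
# Figures copied from the bbproxy report above.
outbound_messages = 385373
connect_failures = 7012
client_read_failures = 100

# Assumption: outbound messages is a fair denominator for connect failures.
connect_failure_rate = connect_failures / outbound_messages
connect_vs_read = connect_failures / client_read_failures

print(f"connect failures per outbound message: {connect_failure_rate:.2%}")
print(f"connect failures per client read failure: {connect_vs_read:.0f}")
```

On those assumptions, a bit under 2% of outbound attempts fail at connect time, and failures talking to the server outnumber failures reading from clients by about 70 to 1.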

I really do think it looks like the TCP connection is being lost when
the client (or proxy) is attempting to send to the server. However, when
this happens the client doesn't retry, and the server attempts to use
the partially received message...
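
What I would like the client to do is something like the following. It is a minimal sketch, not how the Hobbit client works today: send_with_retry and the send callable are hypothetical, and it assumes the sender actually sees an OSError when the connection drops, which may not hold if the connection is cut silently in the middle of a message.

```python
import time


def send_with_retry(send, message, attempts=3, delay=1.0):
    """Call send(message) until it succeeds or the attempts run out.

    `send` is a hypothetical callable that raises OSError when the
    connection fails or drops before the whole message is written.
    Returns True on success, False if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            send(message)
            return True
        except OSError:
            if attempt + 1 < attempts:
                # Back off a little longer before each new attempt.
                time.sleep(delay * (2 ** attempt))
    return False
```

This only covers the sending side. The server would still need some way to recognise and discard an incomplete message.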

Further suggestions/comments?

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au