F5 BigIP [Devmon] Response to Large or cant Fragment or configuration?

Bill Richardson wrichardson at llbean.com
Fri Mar 28 12:56:23 CET 2008


I sent this to the devmon mailing list and thought it might be worth
trying the hobbit list to see if anyone is using Hobbit and/or Devmon
to monitor F5s.

Any thoughts/help?

-----Original Message-----
From: Bill Richardson 
Sent: Wednesday, March 26, 2008 7:46 AM
To: 'Buchan Milne'
Cc: devmon-support at lists.sourceforge.net
Subject: RE: [Devmon] Response to Large or cant Fragment or
configuration?

 Thanks for your thoughts...

Just for fun I added a few of our fully populated Cisco 4500 switches,
and Devmon is working with them without any issues.

I then took another set of BIG-IP F5s that was working with Devmon and
increased the number of pool members one at a time. (The F5s that
worked had 80 pool members.) When I got to 97 pool members, that's when
I started seeing "No SNMP data found for PoolMemberStatusReason" in the
Devmon log. If I delete one of the pool members, Devmon starts to work
again. (The F5s that Devmon doesn't work with have 128 pool members.)

So a test for anyone who is using Devmon to monitor BIG-IPs is to see
how many pool members they have. The breaking point seems to be around
the 97 mark.
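
As a rough sanity check on that breaking point, here is a
back-of-the-envelope estimate in Python. It uses only the figures from
the error output quoted further down (expected length 9855, PDU bufsize
8000, 129 items) and assumes, purely for illustration, that every row
takes about the same number of bytes and that header overhead can be
ignored. The function name is made up for this sketch.

```python
import math

# Figures taken from the error output and message quoted below.
EXPECTED_PDU_BYTES = 9855    # "Expected length 9855"
PDU_BUFSIZE_BYTES = 8000     # "PDU bufsize: 8000 bytes"
ROWS_IN_FAILING_POLL = 129   # "there are really 129 items here"


def max_rows_per_response(buffer_bytes: int, bytes_per_row: float) -> int:
    """Estimate how many rows fit in one response buffer.

    Assumption: rows are roughly equal in size and PDU header overhead
    is ignored, so this is only a ballpark figure.
    """
    return math.floor(buffer_bytes / bytes_per_row)


# Average size of one row in the response that failed to decode.
bytes_per_row = EXPECTED_PDU_BYTES / ROWS_IN_FAILING_POLL
row_limit = max_rows_per_response(PDU_BUFSIZE_BYTES, bytes_per_row)

print(f"about {bytes_per_row:.1f} bytes per row")
print(f"about {row_limit} rows fit in {PDU_BUFSIZE_BYTES} bytes")
```

The estimate lands in the same ballpark as the 96/97 boundary seen on
the test pair. An exact match is not expected, since pool and member
names differ in length from box to box.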

I have tcpdumps of a good working poll (96 members) and of the failure
(97 members) that I could send you, but I would like to sanitize them
first (somehow?).



-----Original Message-----
From: Buchan Milne [mailto:bgmilne at staff.telkomsa.net]
Sent: Tuesday, March 25, 2008 9:24 AM
To: devmon-support at lists.sourceforge.net
Cc: Bill Richardson
Subject: Re: [Devmon] Response to Large or cant Fragment or
configuration?

On Tuesday 25 March 2008 14:24:03 Bill Richardson wrote:
> I'm using devmon to monitor some BIG-IP F5s. When devmon is first
> started (the first poll), everything works. On every poll after that,
> on one of my F5s, "PoolMemberStatusReason" under the pool is not being
> populated with data. (Again, on the first poll it is populated.)
>
> Looking in devmon.log, I see this message over and over, every
> minute:
>  "No SNMP data found for PoolMemberStatusReason on bigip1500"
>
> Starting devmon with "-f", I see this beginning after the 1st poll
> (the first poll is fine):
> ----------------------------------------------------------------------
>  ./devmon -f
>
> SNMP Error:
> Error decoding response PDU:
>   Expected length 9855, got 7996
>     %{%i%s%*{%i%i%i%{%@
>       ^
> SNMPv2c_Session (remote host: "192.168.1.10" [192.168.1.10].161)
>                    community: "MyPublic"
>                   request ID: -1368482924
>                  PDU bufsize: 8000 bytes
>                      timeout: 5s
>                      retries: 3
>                      backoff: 1)
>  at /usr/local/devmon-0.3.0-rc1/modules/dm_snmp.pm line 540
> Use of uninitialized value in string ne at
> /usr/lib/perl5/site_perl/5.8.8/SNMP_Session.pm line 871, <$__ANONIO__>
> line 16.
> ----------------------------------------------------------------------
>
> Based on this I upgraded from SNMP_Session 1.08 to 1.12 and still see 
> the same issue.
>
> I did some tcpdumps and was thinking it may have been the size of the 
> response....
>
> Looking at the dumps I noticed that the first poll response in the 
> trace has a total length of only 988, while the second poll response 
> has a length of 1514.
>
> Additionally, I noticed that the second poll packet has the fragment
> bit set, meaning there should be more packets to follow.  I wonder if
> the Perl script is having trouble handling a fragmented packet.  This
> might have something to do with the message I'm seeing that complains
> about the expected length of the packet.
>
> So, thinking it has something to do with the size of the response, as
> a test I went in to the bigip box (the standby box) and deleted a
> bunch of the pools - like half of them or more.  Once I did that, I
> went in and checked devmon again.  Once it refreshed, it was able to
> load the pool info for this box.  This shows that the error is
> definitely related to the size of the response that the SNMP bulkget
> is receiving.
>
> My guess is that the first request is working because of that
> MAX REPETITION setting.  If you look at that first response packet, it
> returns 12 SNMP items.  At some point between that first request and
> the second one, devmon must be learning that there are really 129
> items here, not 12.  Then when it tries on the second attempt to pull
> in that many items, the response causes the error.  Whether it is the
> packet fragmenting that is the cause or something else related to the
> size of the response, I'm not sure.
>
> Any help with this one?  I have looked and looked and don't see this
> reported on the list in the past.

I'm currently running perl-SNMP_Session 1.08, and monitoring some Cisco
devices with more than 180 items per test. It might be best if you could
send capture files from tcpdump or wireshark.

However, this looks more like an issue with SNMP_Session than with
devmon.
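
For what it's worth, one plausible reading of the "Expected length
9855, got 7996" error quoted above is that the decoder found a BER
header declaring a larger PDU than the bytes it actually had in hand.
A minimal Python sketch of that kind of check follows. It is not
SNMP_Session's actual code; the function names are made up, and the
sample header bytes are constructed to match the numbers in the error
rather than taken from a capture.

```python
def declared_pdu_length(data: bytes) -> int:
    """Return the total length (header + content) that the BER header
    at the start of `data` declares for the outermost element."""
    if len(data) < 2:
        raise ValueError("need at least a tag byte and a length byte")
    first_len_byte = data[1]
    if first_len_byte < 0x80:
        # Short form: the byte itself is the content length.
        return 2 + first_len_byte
    # Long form: low 7 bits give the number of length bytes that follow.
    num_len_bytes = first_len_byte & 0x7F
    if num_len_bytes == 0:
        raise ValueError("indefinite length is not handled in this sketch")
    if len(data) < 2 + num_len_bytes:
        raise ValueError("length field itself is truncated")
    content_len = int.from_bytes(data[2:2 + num_len_bytes], "big")
    return 2 + num_len_bytes + content_len


def check_complete(data: bytes) -> tuple[int, int, bool]:
    """Return (expected length, received length, is_complete)."""
    expected = declared_pdu_length(data)
    return expected, len(data), len(data) >= expected


# Hypothetical PDU: SEQUENCE tag 0x30, long-form length 0x267B (9851)
# content bytes, so 4 header bytes + 9851 = 9855 bytes in total.
full_pdu = bytes([0x30, 0x82, 0x26, 0x7B]) + bytes(9851)
truncated = full_pdu[:7996]  # what is left if only 7996 bytes arrive

expected, got, complete = check_complete(truncated)
print(f"Expected length {expected}, got {got}, complete={complete}")
```

If that reading is right, the thing to look at is where the response
is being cut short, not the decoding itself.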

But, I'll try and investigate.

Regards,
Buchan


