[hobbit] fping tuning

Schwimmer, Eric E *HS EES2Y at hscmail.mcc.virginia.edu
Thu Apr 27 15:17:24 CEST 2006



> If it takes twice as long to ping 600 things as it does 300 things,
> isn't that to be expected?  After all, you are pinging twice as many
> items.  

If you ping all the hosts in parallel (i.e. all ICMP echo requests are
sent near-simultaneously), then your test latency should really only be
limited by the speed of your CPU and of your network, not
by how many hosts you are pinging.
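
As a rough illustration of that argument, here is a minimal Python sketch
comparing the two timing models. It sends no packets; the function names
and the round-trip times are made up for the example.

```python
# Toy model only: no ICMP traffic is generated, the RTT values are invented.

def sequential_duration(rtts_ms):
    """Total time if each host is pinged only after the previous one answers."""
    return sum(rtts_ms)

def parallel_duration(rtts_ms):
    """Total time if every request goes out at once: bounded by the slowest reply."""
    return max(rtts_ms) if rtts_ms else 0

rtts_300 = [5] * 300   # 300 hosts answering in 5 ms each (hypothetical)
rtts_600 = [5] * 600   # twice as many hosts, same per-host latency

print(sequential_duration(rtts_300), sequential_duration(rtts_600))
print(parallel_duration(rtts_300), parallel_duration(rtts_600))
```

In the sequential model the total doubles with the host count; in the
idealised parallel model it stays flat. A real pinger probably sits
somewhere in between, since sending the requests is not free.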

> I don't think this is "geometric" but linear.

My bad.  I was never good at math :)

> But it is hard to calculate, too, because some of the IP addresses do
> not respond.  If a host responds, then the pinger is free to 
> move to the next one in the list.  Otherwise it has to go through the 
> "time out and retry" dance.  So a non-responsive host could cause 5 or
> 6 seconds in delay until the pinger decides it is down and moves on.  
> Since Fping runs things in parallel, it is the luck of the draw 
> regarding which stream might get bogged down.

I originally thought this might be part of the problem, so I wrote a
script that went through the fping output and only included IP addresses
that had < 10ms response time, which was over 95% of the addresses from
the original list.  Running fping again on this new list didn't change
the behaviour much.  It was still taking ~35 seconds to poll 1300
devices.
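
The script itself isn't shown here, but a filter along these lines would
do the job. This is a sketch only: it assumes the input lines look like
"192.0.2.10 is alive (0.52 ms)" (the form fping prints with -e), and the
function name and sample addresses are hypothetical.

```python
import re

# Assumed line format: "<host> is alive (<time> ms)"
ALIVE_RE = re.compile(r"^(\S+) is alive \(([\d.]+) ms\)")

def fast_hosts(lines, max_ms=10.0):
    """Return the hosts whose reported response time is below max_ms."""
    keep = []
    for line in lines:
        m = ALIVE_RE.match(line.strip())
        if m and float(m.group(2)) < max_ms:
            keep.append(m.group(1))
    return keep

# Invented sample output (documentation address range)
sample = [
    "192.0.2.10 is alive (0.52 ms)",
    "192.0.2.11 is alive (48.3 ms)",
    "192.0.2.12 is unreachable",
]
print(fast_hosts(sample))
```

Hosts that are slow or unreachable are dropped, so the resulting list
contains only addresses that should never hit the timeout/retry path.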

> What is your ping interval?  Pinging 1400 IP's in 40 seconds sounds
> pretty good to me -- you have a lot of room for growth.  (Big Brother
> can't do this without some help)  That is about 35 IPs per second.
> Round down to 30 per second, multiply by 300, and you could possibly
> monitor 9,000 IP's with this one server in a five minute 
> window!  After all, you only need to get to them all in the cycle 
> before circling around and hitting them again.  Some of the pay ware 
> management systems actually try to space out their activity through 
> the polling cycle so they don't hog the network themselves.

Our poll interval is 60 seconds; it's true that we do have room for
growth, but at the rate that we are adding devices, we won't be able
to grow much longer without exceeding the poll interval :)  I'm just
hoping for an 'easy' fix to fping that will make everything better.
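
To put a number on the headroom, here is a back-of-the-envelope sketch.
It assumes poll time keeps scaling linearly with device count, which is
only an assumption; the function name is made up.

```python
def max_devices(devices, poll_seconds, interval_seconds):
    """Devices that fit in one interval if poll time scales linearly."""
    rate = devices / poll_seconds          # devices polled per second
    return int(rate * interval_seconds)    # round down to whole devices

# Figures from this thread: ~35 s for 1300 devices, 60 s poll interval
print(max_devices(1300, 35, 60))    # 2228

# The quoted estimate: 1400 IPs in 40 s, five-minute (300 s) window
print(max_devices(1400, 40, 300))   # 10500 (9000 after rounding 35/s down to 30/s)
```

So under that assumption the 60-second interval tops out at a bit over
2200 devices, well short of the 9,000 that a five-minute window allows.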

-Eric


