[hobbit] Client interval question
Scott Walters
scott at PacketPushers.com
Fri Dec 23 18:16:49 CET 2005
This helps *immensely*. Now we'll be able to justify shiny new gear
to management, so we can reliably provide an IT infrastructure capable
of meeting the long-term growth of trade volumes.
On Dec 23, 2005, at 10:19 AM, Jeff Newman wrote:
> servers. a graph would have little data until the stock market
> opens, then the floodgates open :-)
> The graph then fluctuates with another surge at market close.
>
gotcha
>
> The interval being at 1 minute for specifically CPU and network is
> important to us
> for capacity planning purposes because during, say, market open,
> there are huge peaks
> that a 5m interval doesn't catch. We need to plan capacity based
> around those spikes, as those are indicative of future market
> trends in stock volume. It's not that the 5m interval does nothing,
> indeed it is helpful, but from a business perspective, a 1m
> interval allows us to plan capacity because it helps us catch the
> spikes that we want to see.
Absolutely. I am glad to hear you are aware that you must plan for
the peaks. Busy doesn't mean slow.
The server stats are generally only half the equation. They are the
impact on the machine. Ideally, for these types of situations, you
are also able to measure the load, e.g. trade volumes and their
average execution times.
Knowing the RPMs of your motor doesn't tell you your MPH. If you
could see/prove that when CPU is at 100% execution times can grow
outside of SLAs, it's easier to convince management you need a bigger/
better environment and/or testing/QA/integration.
I hear there's a few nickels on Wall Street ;)
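For illustration only, a minimal sketch of that kind of correlation,
assuming you already collect paired samples of CPU% and average
execution time (the numbers, names, and the 2-second SLA below are
all hypothetical):

    # Hypothetical SLA: average trade execution time must stay under 2 seconds.
    SLA_SECONDS = 2.0

    # (cpu_percent, avg_execution_seconds) pairs, one per sampling interval.
    samples = [(45, 0.4), (100, 2.7), (100, 3.1), (70, 0.9)]

    # Intervals where CPU saturation coincided with an SLA breach.
    breaches = [s for s in samples if s[0] >= 100 and s[1] > SLA_SECONDS]
    print("saturated intervals breaching the SLA: %d" % len(breaches))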
>
> So something like a low-interval cpu/network column would be
> beneficial. Those tests could
> use separate rrd files etc...
I am still going to argue this isn't the right way to measure the
data in your environment to provide the information you are looking for.
1) RRD makes the presumption that the older data gets, the less
important it is. In your case that is *not true*. Each 'peak' is a
set of data where the granularity needs to be preserved. So even if
the RRA gets configured to keep 1m samples, which might help 'see'
the last 2 days or so of peaks, it won't help when you want to review
the data set of Black Monday. Those peaks will have been averaged
down (see the short sketch after this list).
2) One cannot assume causation between UNIX statistics and performance
in the business environment. If you need to know your servers will
handle 5 million trades in 5 minutes, you need to throw 5 million
trades at the boxes and see what happens. "If it ain't tested, it
doesn't work."
3) When environments reach bottlenecks, it's impossible to say what
the real peak is. If your CPU is at 100%, one cannot know (without
testing) what the real demand for CPU is . . . .
4) It's always the code/SQL/CICS anyway ;)
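To make the point in 1) concrete, here is a minimal sketch in plain
Python (nothing Hobbit- or RRD-specific, and the CPU numbers are made
up) of how consolidating 1m samples into a 5m average erases exactly
the spike you want to keep:

    # Hypothetical 1-minute CPU% samples around a market-open spike.
    one_minute = [20, 25, 95, 30, 22]

    # What a 5-minute AVERAGE consolidation keeps as the data ages.
    five_minute = sum(one_minute) / float(len(one_minute))

    print(max(one_minute))   # 95   -- the peak that matters for capacity planning
    print(five_minute)       # 38.4 -- what survives once the peak is averaged down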
>
> I recently integrated mrtg into hobbit. I assume that the 5m
> interval "issue" (not really an issue I know) exists with it as
> well since it utilizes the same rrd structure? Or can I set the
> interval of mrtg to be 1 minute? That would solve my networking
> interval problem.
But that is only one sample per minute. For your application, you
need something *much* more granular.
>
> Anyway, I hope I have explained the business reason well enough,
> feel free to ask any questions. I feel that while not all
> circumstances are ideal for a 1m polling sample, there
> are some situations where this is ideal.
You have legitimate business needs for sure, and an idea for a
feature which would be very *useful*:
high-frequency sampling (a short interval) for 'stress testing'
impact, with the data being preserved.
That would be a great addition to Hobbit. I am not sure if RRD is the
right backend, but it might work if the solution is clever.
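Just to make the idea concrete, here is a hypothetical sketch, not an
existing Hobbit/Xymon feature (the interval and output path are
assumptions): a tiny collector that samples every few seconds during
a stress window and appends raw, timestamped values to a flat file,
so nothing is ever averaged away:

    #!/usr/bin/env python
    # Hypothetical high-frequency collector -- not part of Hobbit/Xymon.
    # Appends raw timestamped load averages to a flat file so the full
    # resolution is preserved, unlike an RRD consolidation.
    import os, time

    INTERVAL = 5                               # seconds between samples (assumed)
    OUTFILE = "/var/tmp/peak-samples.log"      # hypothetical path

    while True:
        load1, load5, load15 = os.getloadavg()
        with open(OUTFILE, "a") as f:
            f.write("%d %.2f %.2f %.2f\n" % (time.time(), load1, load5, load15))
        time.sleep(INTERVAL)

Whether that raw data later gets fed into RRD at full resolution or
kept as flat files is the part that would need to be clever.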
I'll let it rattle around . . . .
scott