Re: [hobbit] Client interval question



This helps *immensely*. Now we'll be able to justify shiny new gear to management to reliably provide an IT infrastructure capable of meeting the long-term growth of trade volumes.
On Dec 23, 2005, at 10:19 AM, Jeff Newman wrote:



servers. A graph would have little data until the stock market opens, then the floodgates open :-)
The graph then fluctuates with another surge at market close.



gotcha


A 1-minute interval, specifically for CPU and network, is important to us for capacity planning, because during market open, say, there are huge peaks that a 5m interval doesn't catch. We need to plan capacity around those spikes, as they are indicative of future market trends in stock volume. It's not that the 5m interval is useless; indeed it is helpful. But from a business perspective, a 1m interval lets us plan capacity because it catches the spikes we want to see.
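A minimal sketch of why a 5-minute average can hide a spike that 1-minute data shows. It assumes the coarser interval reports the average over its window; the CPU numbers and the `consolidate` helper are hypothetical, for illustration only.

```python
# Hypothetical per-minute CPU utilisation (%) around market open.
# Illustrative numbers only, not measured data.
one_minute_cpu = [12, 14, 13, 15, 98, 97, 40, 22, 18, 16]

def consolidate(samples, step):
    """Average consecutive groups of `step` samples, as a coarser
    interval that reports a window average would (hypothetical helper)."""
    usable = len(samples) - len(samples) % step
    return [sum(samples[i:i + step]) / step for i in range(0, usable, step)]

five_minute_cpu = consolidate(one_minute_cpu, 5)

print("1m peak:", max(one_minute_cpu))
print("5m values:", five_minute_cpu)
print("5m peak:", max(five_minute_cpu))
```

The two-minute burst near 98% shows up at 1-minute resolution, but because it straddles a window boundary, neither 5-minute value gets anywhere near it.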

Absolutely. I am glad to hear you are aware you must plan for the peaks.

Busy doesn't mean slow.

The server stats are generally only half the equation. They show the impact on the machine. Ideally, for these types of situations, you are also able to measure the load, e.g. trade volumes and their average execution times.

Knowing the RPMs of your motor doesn't tell you your MPH. If you could see/prove that when CPU is at 100% execution times can grow outside of SLAs, it's easier to convince management you need a bigger/better environment and/or testing/QA/integration.
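One way to put the server stats next to the business load is to pair each CPU sample with the average trade execution time for the same interval, then compare how often the SLA is breached when the CPU is pegged versus when it isn't. This is a minimal sketch. The samples, `SLA_MS`, and the 95% "saturated" threshold are all hypothetical values chosen for illustration.

```python
# Hypothetical paired samples: (cpu_percent, avg_trade_execution_ms).
# Illustrative values only, not real measurements.
samples = [
    (35, 42), (50, 45), (62, 47), (78, 55),
    (95, 130), (99, 210), (100, 260), (97, 180),
]
SLA_MS = 100         # hypothetical SLA on average execution time
SATURATED_PCT = 95   # treat CPU at or above this as "pegged"

def breach_rate(rows):
    """Fraction of samples whose execution time exceeds the SLA."""
    if not rows:
        return 0.0
    return sum(1 for _, ms in rows if ms > SLA_MS) / len(rows)

busy = [s for s in samples if s[0] >= SATURATED_PCT]
normal = [s for s in samples if s[0] < SATURATED_PCT]

print(f"SLA breach rate, CPU >= {SATURATED_PCT}%: {breach_rate(busy):.0%}")
print(f"SLA breach rate, CPU <  {SATURATED_PCT}%: {breach_rate(normal):.0%}")
```

A result like "every breach happens while CPU is pegged" makes a much stronger case to management than a CPU graph alone.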

I hear there's a few nickels on Wall Street ;)


So something like a low-interval CPU/network column would be beneficial. Those tests could use separate RRD files, etc.

I am still going to argue this isn't the right way to measure the data in your environment to provide the information you are looking for.


1) RRD makes the presumption that the older data gets, the less important it is. In your case that is *not true*. Each 'peak' is a set of data whose granularity needs to be preserved. So even if the RRA gets configured to keep 1m samples, which might help you 'see' the last 2 days or so of peaks, it won't help when you want to review the data set of Black Monday. Those peaks will have been averaged down.

2) One cannot assume a causal link between UNIX statistics and performance in the business environment. If you need to know your servers will handle 5 million trades in 5 minutes, you need to throw 5 million trades at the boxes and see what happens. "If it ain't tested, it doesn't work."

3) When environments reach bottlenecks, it's impossible to say what the real peak is. If your CPU is at 100%, one cannot know (without testing) what the real demand for CPU is . . . .

4) It's always the code/SQL/CICS anyway ;)
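To put numbers on point 1: keeping 1-minute samples for 2 days means 2 × 1440 = 2880 rows in that RRA, and anything older survives only in coarser, consolidated rows. The sketch below assumes a hypothetical hour with a 10-minute burst at 100% CPU and 20% otherwise, rolled into a single hourly average. rrdtool also offers a MAX consolidation function, which keeps the height of a peak but still loses its duration and shape.

```python
MINUTES_PER_DAY = 24 * 60
rows_for_two_days = 2 * MINUTES_PER_DAY  # rows needed for 2 days of 1-minute data

# Hypothetical hour: a 10-minute burst at 100% CPU, 50 minutes at 20%.
hour = [100] * 10 + [20] * 50

hourly_average = sum(hour) / len(hour)  # what an AVERAGE row would keep
hourly_max = max(hour)                  # what a MAX row would keep

print("1m rows for 2 days:", rows_for_two_days)
print(f"hourly AVERAGE: {hourly_average:.1f}%")
print("hourly MAX:", hourly_max)
```

After consolidation, the burst reads as roughly 33% for the hour; the fact that it lasted 10 minutes at 100% is gone either way.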


I recently integrated MRTG into hobbit. I assume the 5m interval "issue" (not really an issue, I know) exists with it as well, since it uses the same RRD structure? Or can I set the MRTG interval to 1 minute? That would solve my networking interval problem.

But that is only one sample per minute. For your application, you need something *much* more granular.



Anyway, I hope I have explained the business reason well enough; feel free to ask any questions. I feel that while not all circumstances call for a 1m polling interval, there are some situations where it is ideal.

You have legitimate business needs for sure, and an idea for a feature which would be very *useful*.


High-frequency sampling for measuring 'stress testing' impact, with the data preserved at full resolution.

That would be a great addition to hobbit. I am not sure RRD is the right backend, but it might work if the solution is clever.

I'll let it rattle around . . . .

scott