[hobbit] Client interval question
scott at PacketPushers.com
Fri Dec 23 18:16:49 CET 2005
This helps *immensely*. Now we'll be able to justify shiny new gear
to management to reliably provide an IT infrastructure capable of
meeting the long term growth of trade volumes.
On Dec 23, 2005, at 10:19 AM, Jeff Newman wrote:
> servers. a graph would have little data until the stock market
> opens, then the floodgates open :-)
> The graph then fluctuates with another surge at market close.
> The interval being at 1 minute for specifically CPU and network is
> important to us
> for capacity planning purposes because during, say, market open,
> there are huge peaks
> that a 5m interval doesn't catch. We need to plan capacity based
> around those spikes, as those are indicative of future market
> trends in stock volume. It's not that the 5m interval does nothing,
> indeed it is helpful, but from a business perspective, a 1m
> interval allows us to plan capacity because it helps us catch the
> spikes that we want to see.
Absolutely. I am glad to hear you are aware that you must plan for the peaks.
Busy doesn't mean slow.
The server stats are generally only half of the equation: they are the
impact on the machine. Ideally, for these types of situations, you
are also able to measure the load itself, e.g. trade volumes and their
average execution times.
Knowing the RPMs of your motor doesn't tell you your MPH. If you
could see/prove that when CPU is at 100% execution times can grow
outside of SLAs, it's easier to convince management you need a bigger/
better environment and/or testing/QA/integration.
I hear there's a few nickels on Wall Street ;)
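To make the "prove it to management" point concrete, here is a minimal
sketch of the kind of correlation I mean. Everything here is invented for
illustration (the SLA_MS threshold and the sample pairs are assumptions,
not Hobbit data): pair each trade's execution time with the CPU reading
taken at the same moment, then show that the SLA breaches cluster at
saturated CPU.

```python
# Hypothetical sketch; SLA_MS and the sample values are made up.
SLA_MS = 250  # assumed service-level target for one execution, in ms

# (cpu_percent, execution_ms) pairs collected at the same timestamps
samples = [(35, 80), (60, 120), (95, 210), (100, 480), (100, 620), (40, 95)]

breaches = [(cpu, ms) for cpu, ms in samples if ms > SLA_MS]
breach_cpu = min(cpu for cpu, _ in breaches)

print(f"{len(breaches)} of {len(samples)} executions breached the {SLA_MS} ms SLA")
print(f"every breach occurred at CPU >= {breach_cpu}%")
```

A one-line table like that, shown next to the budget request, argues the
case far better than a CPU graph alone.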
> So something like a low-interval cpu/network column would be
> beneficial. Those tests could
> use separate rrd files etc...
I am still going to argue this isn't the right way to measure the
data in your environment to provide the information you are looking for.
1) RRD makes the presumption that the older data gets, the less important
it is. In your case that is *not true*. Each 'peak' is a set of
data where the granularity needs to be preserved. So even if the RRA
gets configured to keep 1m samples, which might help 'see' the last
2 days or so of peaks, it won't help when you want to review the
data set of Black Monday. Those peaks will have been averaged down.
2) One cannot assume a causal link between UNIX statistics and performance in
the business environment. If you need to know your servers will
handle 5 million trades in 5 minutes, you need to throw 5 million trades
at the boxes and see what happens. "If it ain't tested, it doesn't work."
3) When environments reach bottlenecks, it's impossible to say what
the real peak is. If your CPU is at 100%, one cannot know (without
testing) what the real demand for CPU is . . . .
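A toy illustration of that clipping effect (all numbers invented): once
the box saturates, the monitored value flat-tops at 100%, so however much
demand piles up above that line, the graph looks the same.

```python
# Invented demand series, in % of one box's CPU actually wanted.
true_demand = [60, 90, 140, 180, 130, 70]

# What a CPU graph can ever show: the value clips at 100%.
observed = [min(d, 100) for d in true_demand]

print("observed:", observed)
print("observed peak:", max(observed), "vs true peak:", max(true_demand))
```

The graph's "peak" is 100 whether the real demand was 101% or 400%, which
is why only a load test tells you the real headroom you need.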
4) It's always the code/SQL/CICS anyway ;)
> I recently integrated mrtg into hobbit. I assume that the 5m
> interval "issue" (not really an issue I know) exists with it as
> well since it utilizes the same rrd structure? Or can I set the
> interval of mrtg to be 1 minute? That would solve my networking
> interval problem.
But that is only one sample per minute. For your application, you
need something *much* more granular.
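One sketch of why a single poll per minute isn't enough for this workload
(timestamps invented for illustration): a burst lasting ten seconds can
fall entirely between two polls, so the poller never records it at all.

```python
# Burst occupies seconds 75..84 of a 5-minute window (invented numbers).
burst = range(75, 85)

# One poll per minute: t = 0, 60, 120, 180, 240.
poll_times = range(0, 300, 60)

seen = [t for t in poll_times if t in burst]
print("polls that landed inside the burst:", seen)  # empty: burst missed
```

Catching sub-minute spikes reliably means sampling several times per
spike duration, not once per interval.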
> Anyway, I hope I have explained the business reason well enough,
> feel free to ask any questions. I feel that while not all
> circumstances are ideal for a 1m polling sample, there
> are some situations where this is ideal.
You have legitimate business needs for sure, and an idea for a
feature which would be very *useful*.
A high-interval sampling mode for 'stress testing' impact, with the data
preserved, would be a great addition to hobbit. I am not sure if RRD is the
right backend, but it might work if the solution is clever.
I'll let it rattle around . . . .
More information about the Xymon mailing list