Re: [hobbit] Client interval question

To: hobbit (at) hswn.dk
Subject: Re: [hobbit] Client interval question
From: Scott Walters <scott (at) PacketPushers.com>
Date: Thu, 15 Dec 2005 03:16:22 -0500
References: <941506840512121112u6c23b68cy1f1f59cd436af7b9@mail.gmail.com> <20051212215636.GB31187@hswn.dk> <139E0D6D-28B0-4576-A033-3525AD2970CA@PacketPushers.com> <941506840512141531n24e3e843pf5efc0e8b2d58800@mail.gmail.com>

First off, I know I can come off as terse in e-mail, but my replies are not personal attacks.

It can be a bad idea sometimes, others not (for example, the reply from the person catching intermittent problems with BB running every minute).

Who ended up stating the anomaly *was* detected in 5m intervals, but only once every 13h instead of every hour. But I still don't understand how it will help *you*.

A smaller sampling period can show things at a finer granularity. For example, a process kicks off and 5 minutes later you see 100 errors (I'm keeping things generic for illustrative purposes). Were those 100 errors in the first minute? The last? Spread constantly throughout the 5 minutes?
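To make the granularity point concrete, here is a minimal sketch with invented numbers (the three error patterns and the function name are illustrative assumptions, not measurements from any real system). Three different one-minute patterns collapse into the same five-minute total of 100 errors:

```python
# Hypothetical per-minute error counts over one 5-minute window.
# All numbers are invented for illustration.
patterns = {
    "burst_at_start": [100, 0, 0, 0, 0],
    "burst_at_end": [0, 0, 0, 0, 100],
    "steady": [20, 20, 20, 20, 20],
}


def five_minute_total(per_minute):
    """Collapse five 1-minute counts into the single value a 5m sample reports."""
    return sum(per_minute)


totals = {name: five_minute_total(counts) for name, counts in patterns.items()}

for name, counts in patterns.items():
    print(f"{name:15s} 1m={counts} 5m total={totals[name]}")
```

A 5m sample cannot tell these three cases apart; whether that distinction is worth the extra data is the question being argued in this thread.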

The 5m averages over a week would be quite low compared to a single 5m plot. From that, one could infer that in the last 5m things have not been 'normal'.

I'm not saying you're wrong, simply pointing out that it's not as black and white as you're making it.

And I am disagreeing with you ;) I've been watching the data in these graphs for many many years now, and I have yet to come across a situation where having a 1m sampling/graphing period would have helped me fix/improve something . . .

It's like a story problem with too much information: it makes coming up with the real answer harder in the end. Most people don't have the time/energy/brains to sift all the data correctly. And if they do, the 5m samples are good enough.

Most people (including really smart people that are forgetful) can't deal with an auto-scaling y-axis.

Something being just interesting initially can sometimes uncover problems that you didn't see before.

Like I said, if you have a job where interesting is worthwhile, wonderful. In my experience, most folks that are running the BB/hobbit tools are involved in the operational aspects of infrastructure, not R&D.


> With the stock larrd/hobbit RRD definitions you are correct.  He'll
> only use one of the five, and whine about the timestamp of the other
> four.

Firstly, can you explain your comment in more detail?

RRD interpolates time series data to put a value at a fixed interval. That is why you hardly ever see integers in the data. If your sample comes in at 299s, RRD interpolates that value to what it would have been at 300s. How this is done can be tuned. The default settings with the RRAs expect data to arrive every 300s. RRD will only insert data one time within that interval.
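As a rough illustration of why stored values are rarely integers, here is a simplified model of fitting off-boundary samples onto 300s steps. It is a sketch under stated assumptions, not rrdtool's code: each sample is assumed to hold for the time since the previous one, each step stores the time-weighted average, and heartbeat, unknown data and consolidation are ignored. The function name and the sample values are hypothetical.

```python
STEP = 300  # seconds per step, matching the 300s interval discussed above


def normalize(start, updates, step=STEP):
    """Return [(boundary_time, value)] for each complete step.

    start   -- time of the previous update; assumed to sit on a step boundary
    updates -- list of (timestamp, value) with strictly increasing timestamps
    Each value is assumed to apply to the whole interval since the
    previous timestamp (a simplification for illustration).
    """
    if start % step != 0:
        raise ValueError("this sketch assumes start is on a step boundary")
    points = []
    last_t = start
    acc = 0.0  # value * seconds accumulated inside the current step
    for t, value in updates:
        while last_t < t:
            boundary = (last_t // step + 1) * step
            seg_end = min(t, boundary)
            acc += value * (seg_end - last_t)
            last_t = seg_end
            if last_t == boundary:
                # Step is complete: store its time-weighted average.
                points.append((boundary, acc / step))
                acc = 0.0
    return points


# Samples arrive at 299s and 601s instead of exactly 300s and 600s.
points = normalize(0, [(299, 10.0), (601, 20.0), (900, 20.0)])
for boundary, value in points:
    print(boundary, round(value, 4))
```

In this model the first step gets 299 seconds of the value 10 and one second of the value 20, so the stored value is slightly above 10 rather than the integer that was submitted.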

Secondly, I'm confused as to why you would state that I would "whine" about anything when you have no basis for a conclusion to that effect. It seems to be a rather pointed comment in a discussion that hasn't involved the use of language that would dictate a response like that.


"He'll whine" meant rrdtool, not you:

ERROR: illegal attempt to update using time 1042731000 when last update time is 1043099100 (minimum one second step)

That's whining in my book. Sorry you thought I was speaking about you.
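The rule implied by that message can be modelled in a few lines. This is a toy stand-in, not rrdtool itself: the class name is hypothetical, and the check (a new timestamp must be at least one second later than the last update) is inferred from the wording of the error above.

```python
class ToyRRD:
    """Toy stand-in for an RRD file; it only tracks the last update time."""

    def __init__(self, last_update):
        self.last_update = last_update

    def update(self, timestamp):
        # Reject any timestamp that is not at least one second newer
        # than the last accepted update.
        if timestamp <= self.last_update:
            raise ValueError(
                f"illegal attempt to update using time {timestamp} "
                f"when last update time is {self.last_update} "
                "(minimum one second step)"
            )
        self.last_update = timestamp


rrd = ToyRRD(last_update=1043099100)

try:
    rrd.update(1042731000)  # older than the last update: rejected
    rejected = False
except ValueError as err:
    rejected = True
    print("ERROR:", err)

rrd.update(1043099400)  # newer than the last update: accepted
```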

That is a very good point you make. There is a difference between real-time analysis and capacity planning/trending. I don't however think that it is that far outside of hobbit's scope to try and leverage it for a more pointed analysis.

From a software development standpoint there is a lot to be said for: "Do one thing and do it well". If architecting the RRD framework for RTA breaks trending, bad idea.

My goal isn't to take every machine in my environment and make them into 1 minute sampling period machines. Having the ability to do so on a machine-by-machine basis could be useful.


Which is why I proposed another client collector for this activity.


> That's my design you inherited and because of the complexity of the
> parts, I think it is a very solid design.

I don't think anyone is really questioning that.

You are questioning that. And that is fine. I don't take it personally you think there may be a better way. I know my way may not be the best, but I sure know exactly *why* I chose it.

Honestly, I don't claim to know anything about the way larrd and hobbit are coded in the slightest. There are difficulties to be sure, but part of having a community such as this is to foster ideas and innovation. Just because you don't think it's useful or that it's hard doesn't mean the same is true for everyone out there.

Ahhhhh, to the heart of the matter. Don't suggest ideas in a public forum if you are not prepared to defend them. Fostering ideas comes from intelligent discussions. I merely wanted to understand why you felt you needed a higher sampling rate from a business perspective.


scott

References:
- Client interval question
  - From: Jeff Newman
- Re: [hobbit] Client interval question
  - From: Henrik Stoerner
- Re: [hobbit] Client interval question
  - From: Scott Walters
- Re: [hobbit] Client interval question
  - From: Jeff Newman
