[hobbit] Client interval question
Scott Walters
scott at PacketPushers.com
Tue Dec 13 18:24:54 CET 2005
> On Mon, Dec 12, 2005 at 01:12:20PM -0600, Jeff Newman wrote:
>>
>> I wanted to move from a 5 minute interval on all my clients to a
>> 1 minute interval.
>>
In all my years of Systems Administration, things that run every
minute all the time usually end up being a "Bad Idea".
How will a smaller sampling period improve the service you provide?
>> the script is "vmstat 300 2". So do I need to update that to reflect
>> 1 minute as well (i.e. vmstat 60 2)?
>> Or is this by design? Are there others that might need to change that
>> I don't know about? Is the way I am going about this wrong?
>
> That's an interesting question :-)
>
My job requires data to be useful, not just interesting. That is not
to say there aren't jobs where interesting is good enough.
> The graph DBs that vmstat feeds data into (the RRD files) are
> constructed in such a way that a 5-minute interval is what makes
> sense. So running them with anything else is really just a waste of
> resources.
With the stock larrd/hobbit RRD definitions you are correct. Hobbit
will only use one of the five samples, and whine about the timestamps
of the other four.
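To illustrate with a stripped-down definition (the DS and RRA values
here are illustrative, not the exact stock ones):

    rrdtool create vmstat.rrd --step 300 \
        DS:cpu_idle:GAUGE:600:0:U \
        RRA:AVERAGE:0.5:1:576 \
        RRA:AVERAGE:0.5:288:576

    # --step 300 makes rrdtool consolidate every update inside a
    # 300-second window into a single primary data point, so five
    # 1-minute samples still come out as one stored value. The
    # second RRA keeps daily averages for roughly 1.6 years.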
>
> (I do have a patch here from a user that would allow you to configure
> the RRD files for different data-collection frequencies, but that has
> not been merged yet - primarily due to me being overloaded).
The design goal of larrd (I can't speak for Henrik and hobbit/RRD)
was capacity planning and trending. 5m samples are more than
adequate for that activity.
IMO, sampling at a high frequency implies real-time performance
analysis, and I've always felt that was outside the scope of capacity
planning and trending. E.g., we don't run sendmail in debug mode all
the time...
All that being said, those long-term trends are very helpful for
problem resolution. One can compare a single 5m sample against an
aggregate of 5m samples and determine if things are 'normal'. But
judging whether all the activity within a single 5m sample is normal
is very, very difficult.
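With rrdtool that comparison is cheap (reusing the illustrative
vmstat.rrd from above):

    # the single most recent 5-minute sample
    rrdtool fetch vmstat.rrd AVERAGE --start -600

    # daily averages over the past month -- the baseline to judge
    # that one sample against
    rrdtool fetch vmstat.rrd AVERAGE --resolution 86400 --start -30d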
>
> So no - you shouldn't change that vmstat command. But it is bad design
> on my part to assume that the client polling period would always be
> 5 minutes - it's perfectly valid to run the client checks differently.
That's my design you inherited, and given the complexity of the
parts, I think it is a very solid one. To become flexible enough
to handle different sampling rates, the server would need to know the
frequency of every test. And then changing the RRDs in the future is
'almost' impossible (very difficult at the least). And I've never
seen what happens to 1.5 years of data when you start messing with
the RRD.
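The only route I know of is a dump/restore round trip, and the
commands are the easy part (filenames are just examples):

    rrdtool dump vmstat.rrd > vmstat.xml
    # ...then hand-edit <step>, the <pdp_per_row> of every RRA, and
    # resample every stored row so the old 5m data lines up with the
    # new interval, before:
    rrdtool restore vmstat.xml vmstat-new.rrd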
In the end, I think you'd get the worst of both worlds.
> I'll think about what's the most sensible solution. It probably would
> be to only start the vmstat command if one isn't running; that does
> assume that you will run the client scripts *at least once* every 5
> minutes.
>
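What Henrik describes could be as small as this sketch (the output
path and match pattern are placeholders, not hobbit's actual ones):

    # start a new 5-minute vmstat only if one isn't already running;
    # assumes the client fires at least once per 300 seconds
    if ! pgrep -f "vmstat 300" > /dev/null; then
        vmstat 300 2 > /var/tmp/vmstat.out &
    fi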
I disagree. If real-time performance analysis is needed, I would
pick other tools -- "vmstat 5" works for me ;) Or construct/fork
a client agent specifically designed for such a task, and run it on
an as-needed basis.
Then try to decide whether real-time perf analysis calls for a
sampling rate of 5s or 1m ;)
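E.g. a throwaway capture, started by hand and killed when you're
done (paths are just examples):

    vmstat 5 > /tmp/vmstat.capture &    # high-frequency, on demand only
    echo $! > /tmp/vmstat.pid
    # ...investigate...
    kill $(cat /tmp/vmstat.pid)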
scott