[hobbit] Client interval question
Scott Walters
scott at PacketPushers.com
Tue Dec 13 18:24:54 CET 2005
> On Mon, Dec 12, 2005 at 01:12:20PM -0600, Jeff Newman wrote:
>>
>> I wanted to move from a 5 minute interval on all my clients to a
>> 1 minute interval.
>>
In all my years of Systems Administration, things that run every
minute all the time usually end up being a "Bad Idea".
How will a smaller sampling period improve the service you provide?
>> the script is "vmstat 300 2". So do I need to update that to reflect
>> 1 minute as well (i.e. vmstat 60 2)?
>> Or is this by design? Are there others that might need to change that
>> I don't know about? Is the way I am going about this wrong?
>
> That's an interesting question :-)
>
My job requires data to be useful, not just interesting. That is not
to say there aren't jobs where interesting is good enough.
> The graph DBs that vmstat feeds data into (the RRD files) are
> constructed in such a way that a 5-minute interval is what makes
> sense. So running them with anything else is really just a waste of
> resources.
With the stock larrd/hobbit RRD definitions you are correct. Hobbit
will only use one of the five samples, and whine about the timestamps
of the other four.
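To illustrate with a stripped-down definition (the DS and RRA values
here are illustrative, not the exact stock ones):

    rrdtool create vmstat.rrd --step 300 \
        DS:cpu_idle:GAUGE:600:0:U \
        RRA:AVERAGE:0.5:1:576 \
        RRA:AVERAGE:0.5:288:576

    # --step 300 makes rrdtool consolidate every update inside a
    # 300-second window into a single primary data point, so five
    # 1-minute samples still come out as one stored value. The
    # second RRA keeps daily averages for roughly 1.6 years.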
>
> (I do have a patch here from a user that would allow you to configure
> the RRD files for different data-collection frequencies, but that has
> not been merged yet - primarily due to me being overloaded).
The design goal of larrd (I can't speak for Henrik and hobbit/RRD)
was capacity planning and trending. 5m samples are more than
adequate for that activity.
IMO, sampling at a high frequency implies real-time performance
analysis, and I've always felt that was outside the scope of capacity
planning and trending. E.g., we don't run sendmail in debug mode all
the time...
All that being said, those long-term trends are very helpful for
problem resolution. One can compare a single 5m sample against an
aggregate of 5m samples and determine if things are 'normal'. But
judging whether all the activity within a single 5m sample is normal
is very, very difficult.
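With rrdtool that comparison is cheap (reusing the illustrative
vmstat.rrd from above):

    # the single most recent 5-minute sample
    rrdtool fetch vmstat.rrd AVERAGE --start -600

    # daily averages over the past month -- the baseline to judge
    # that one sample against
    rrdtool fetch vmstat.rrd AVERAGE --resolution 86400 --start -30d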
>
> So no - you shouldn't change that vmstat command. But it is bad design
> on my part to assume that the client polling period would always be
> 5 minutes - it's perfectly valid to run the client checks differently.
That's my design you inherited, and given the complexity of the
parts, I think it is a very solid one. To become flexible enough
to handle different sampling rates, the server would need to know the
frequency of every test. And then changing the RRDs in the future is
'almost' impossible (very difficult at the least). And I've never
seen what happens to 1.5 years of data when you start messing with
the RRD.
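The only route I know of is a dump/restore round trip, and the
commands are the easy part (filenames are just examples):

    rrdtool dump vmstat.rrd > vmstat.xml
    # ...then hand-edit <step>, the <pdp_per_row> of every RRA, and
    # resample every stored row so the old 5m data lines up with the
    # new interval, before:
    rrdtool restore vmstat.xml vmstat-new.rrd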
In the end, I think you'd get the worst of both worlds.
> I'll think about what's the most sensible solution. It probably would
> be to only start the vmstat command if one isn't running; that does
> assume that you will run the client scripts *at least once* every 5
> minutes.
>
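What Henrik describes could be as small as this sketch (the output
path and match pattern are placeholders, not hobbit's actual ones):

    # start a new 5-minute vmstat only if one isn't already running;
    # assumes the client fires at least once per 300 seconds
    if ! pgrep -f "vmstat 300" > /dev/null; then
        vmstat 300 2 > /var/tmp/vmstat.out &
    fi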
I disagree. If real-time performance analysis is needed, I would
pick other tools -- "vmstat 5" works for me ;) Or construct/fork
a client agent specifically designed for such a task, and run it on
an as-needed basis.
Then try to decide whether real-time perf analysis calls for a
sampling rate of 5s or 1m ;)
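E.g. a throwaway capture, started by hand and killed when you're
done (paths are just examples):

    vmstat 5 > /tmp/vmstat.capture &    # high-frequency, on demand only
    echo $! > /tmp/vmstat.pid
    # ...investigate...
    kill $(cat /tmp/vmstat.pid)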
scott