[hobbit] Client interval question

Thu Dec 15 00:31:28 CET 2005

On 12/13/05, Scott Walters <scott at packetpushers.com> wrote:
>
>
> > In all my years of Systems Administration, things that run every
> > minute all the time usually end up being a "Bad Idea".
>
> > How will a smaller sampling period improve the service you provide?

It can be a bad idea sometimes, but not always (for example, see the reply
from the person catching intermittent problems with BB running every minute).

A smaller sampling period can show things at a finer granularity. For
example, a process kicks off and 5 minutes later you see 100 errors (I'm
keeping things generic for illustrative purposes). Were those 100 errors in
the first minute? The last? Spread evenly across the 5 minutes?
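
To make that concrete, here is a small Python sketch with made-up numbers
(the three per-minute patterns are hypothetical, not taken from any real
host). All three add up to the same 100 errors, so a single 5-minute sample
cannot tell them apart, while the per-minute view can:

```python
# Hypothetical per-minute error counts for one 5-minute window.
patterns = {
    "burst at start": [100, 0, 0, 0, 0],
    "burst at end": [0, 0, 0, 0, 100],
    "steady": [20, 20, 20, 20, 20],
}


def five_minute_total(per_minute):
    """What a single 5-minute sample would report for the window."""
    return sum(per_minute)


def peak_minute(per_minute):
    """1-based index of the first minute with the highest error count."""
    return per_minute.index(max(per_minute)) + 1


for name, counts in patterns.items():
    print(name, five_minute_total(counts), peak_minute(counts))
```

The totals are identical in every case; only the finer-grained samples show
where in the window the errors actually happened.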

I'm not saying you're wrong, simply pointing out that it's not as black and
white as you're making it.

> > My job requires data be useful, not just interesting.  That is not to
> > say there aren't jobs where useful is good enough.

Something that is merely interesting at first can sometimes uncover
problems that you didn't see before.

>> The graph DB's that vmstat feeds data into (the RRD files) are
> >> constructed in such a way that a 5-minute interval is what makes
> >> sense. So running them with anything else is really just a waste of
> >> resources.
>
> > With the stock larrd/hobbit RRD definitions you are correct.  He'll
> > only use one of the five, and whine about the timestamp of the other
> > four.

Firstly, can you explain your comment in more detail? Secondly, I'm
confused as to why you would state that I would "whine" about anything
when you have no basis for a conclusion to that effect. It seems to be a
rather pointed comment in a discussion that hasn't involved the kind of
language that would warrant a response like that.

> > The design goal of larrd (I can't speak for Henrik and hobbit/RRD)
> > was capacity planning and trending.  5m samples are more than
> > adequate for that activity.
>
> > IMO, sampling at a high frequency implies real-time performance
> > analysis, and I've always felt that outside the scope of capacity
> > planning and trending.  EG. We don't run sendmail in debug all the
> > time . . . .
>
> > All that being said, those long term trends are very helpful for
> > problem resolution.  One can compare a single 5m sample against an
> > aggregate of 5m samples and determine if things are 'normal'.  But
> > the art of comparing all the activity within a single 5m sample for
> > normal is very very difficult.

That is a very good point you make. There is a difference between
real-time analysis and capacity planning/trending. I don't, however, think
that it is that far outside of hobbit's scope to try to leverage it for
a more pointed analysis. My goal isn't to take every machine in my
environment and make it into a 1-minute sampling period machine. Having
the ability to do so on a machine-by-machine basis could be useful.

> > That's my design you inherited and because of the complexity of the
> > parts, I think it is a very solid design.

I don't think anyone is really questioning that.

> > To become flexible enough
> > to handle different sampling rates, the server would need to know the
> > frequency of the tests.  And then changing the RRD in the future is
> > 'almost' impossible (very difficult at the least).  And I've never
> > seen what happens to 1.5 years of data when you start messing with
> > the RRD.
>
> > In the end, I think you'd get the worst of both worlds.

Honestly, I don't claim to know anything about the way larrd and hobbit
are coded. There are difficulties to be sure, but part of having a
community such as this is to foster ideas and innovation. Just because you
don't think it's useful, or think that it's hard, doesn't mean the same is
true for everyone out there. What if you could add a high-frequency tag to
a server and it generated a separate high-frequency graph for that server,
as well as updating the normal trend graph for whatever resource you
wanted? That way you could choose to look at a graph for resource x every
minute for a day, then turn it off. There are lots of ideas and I don't
know if mine would even work, but you shouldn't just kill the idea.
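
A rough Python sketch of that idea follows. It is only an illustration: it
says nothing about how hobbit, larrd, or the RRD files actually store data,
and the function name and sample values are made up. The 1-minute samples
are kept as the high-frequency series, and each full group of five is
averaged into one sample for the normal 5-minute trend series:

```python
def to_trend_samples(one_minute_samples, factor=5):
    """Average each full group of `factor` 1-minute samples into one
    trend sample. A trailing partial group is ignored."""
    full = len(one_minute_samples) - len(one_minute_samples) % factor
    return [
        sum(one_minute_samples[i:i + factor]) / factor
        for i in range(0, full, factor)
    ]


# Hypothetical 1-minute readings for ten minutes (e.g. CPU busy percent).
high_freq = [10, 12, 90, 11, 12, 10, 10, 11, 9, 10]

# The high-frequency graph would plot high_freq as-is;
# the normal trend graph would plot the averaged series.
trend = to_trend_samples(high_freq)
print(trend)
```

In this made-up data the spike to 90 in the third minute is obvious in the
1-minute series but is flattened into an unremarkable average in the
5-minute series, which is why having both views for a tagged host could be
handy.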

> > I disagree.  If real-time performance analysis is needed, I would
> > pick other tools --  "vmstat 5"  works for me;)  Or construct/fork
> > the client agent specifically designed for such a task, and run it on
> > an as-needed basis.

There are other tools, yes. I am trying to leverage hobbit. If it's not
possible and nobody wants to do it, then yes, I'll look into other tools.
By the same token, I don't want to kill my performance by running lots of
different monitoring tools on the server. Hobbit is extraordinarily
lightweight on the client (as opposed to other solutions out there), so I
think something like this is possible without overloading a client.

Just my two cents.