[hobbit] Highlights of the 4.3.0 version

Gary Baluha gumby3203 at gmail.com
Mon Aug 6 15:28:31 CEST 2007


On 8/3/07, Haertig, David F (Dave) <haertig at avaya.com> wrote:
>
> Most everything I do in Hobbit is a custom script.  Restarting crashed
> processes is one of the least of my worries.  Although in some rare
> cases I do just that (short term), with appropriate logging and email to
> the app development team.  The corporate expense of having the app down
> is too great to let Utopian ideas prevail.


Agreed, though sometimes it's worth an extra few minutes of downtime to do
*some* analysis.

> Most of the automated Hobbit stuff I do is not restarting dead apps
> (luckily, that is very infrequent around here).  It's more mundane.  One
> example is disk space.  A full filesystem would shut many things down.
> Apps should not fill a filesystem, but sometimes they do.  So my custom
> Hobbit scripts first scream and scream about low disk space, even
> analysing things down to specific subdirectories and fast growing files
> and doing trend analysis.  But if their call is not answered, they start
> freeing up space from a "private reserve" I have set aside to deal with
> emergencies.  So if we experience a sudden unexpected blowup in a
> filesystem at 3am, Hobbit keeps things running in production until the
> appropriate people can look into and diagnose the problem.  This may not
> be Utopian behavior, but it sure is practical at 3am in the morning!
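
The "private reserve" idea can be sketched roughly as follows, in Python. This is only an illustration under assumptions, not the script described above: it assumes the reserve is a directory of pre-allocated filler files, and the name release_reserve and its parameters are hypothetical.

```python
import os


def release_reserve(reserve_dir, bytes_needed):
    """Delete filler files from reserve_dir until bytes_needed have been freed.

    Assumes reserve_dir holds nothing but pre-allocated filler files.
    Returns the number of bytes actually released.
    """
    freed = 0
    for name in sorted(os.listdir(reserve_dir)):
        if freed >= bytes_needed:
            break  # enough space released, keep the rest of the reserve
        path = os.path.join(reserve_dir, name)
        if os.path.isfile(path):
            size = os.path.getsize(path)
            os.remove(path)
            freed += size
    return freed
```

A monitoring script would call this only after its low-space warnings have gone unanswered, and would report how much of the reserve is left so it can be refilled later.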


What sort of trend analysis do your scripts perform?  We have a few boxes
that are notorious for filling up their disk space, and I haven't yet come
up with an idea of how to neatly track exactly what it is that keeps filling
up the disk.
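
One simple approach, sketched here in Python under assumptions: take two snapshots of per-directory sizes (for instance parsed from du output, which is not shown) and rank the directories by how fast they grew. The function names and the dictionary format are hypothetical, and the time-to-full figure is a plain linear estimate.

```python
def growth_report(before, after, elapsed_seconds):
    """Rank directories by growth rate between two size snapshots.

    before and after map directory path -> size in bytes.
    Returns a list of (path, bytes_per_hour), fastest grower first.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = []
    for path, size_now in after.items():
        delta = size_now - before.get(path, 0)  # new directories count from zero
        if delta > 0:
            rates.append((path, delta * 3600 / elapsed_seconds))
    rates.sort(key=lambda item: item[1], reverse=True)
    return rates


def hours_until_full(free_bytes, bytes_per_hour):
    """Linear estimate of the time left before the filesystem fills."""
    if bytes_per_hour <= 0:
        return float("inf")
    return free_bytes / bytes_per_hour
```

Running the snapshots from cron and keeping the top few entries of the report in the status message would at least show which directory to look at first.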

> But my vote would be for Hobbit out-of-the-box to NOT attempt automated
> repair actions.  That should be left to the Hobbit administrator.  We
> can write custom monitor scripts or custom alert scripts to add this
> functionality if it's appropriate for our environments.  It's trivial to
> integrate your own scripting into Hobbit.


Due to the demands of some of the other admins, I have implemented a script
that does some rudimentary restarting. It also checks the status of the
specific Hobbit alert in question, so that it doesn't try to restart
something whose alert has been disabled (such as for a planned downtime).
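
A guard of that kind might look like the Python sketch below. It is an illustration, not the actual script: it assumes the current status line has already been fetched from the Hobbit server by whatever means your setup provides, that the line starts with the color word, and that a disabled test shows up as blue. The name should_restart is hypothetical.

```python
def should_restart(status_line):
    """Decide whether an automatic restart is allowed for this status.

    status_line is assumed to start with the color word,
    e.g. "red httpd not running".
    """
    words = status_line.strip().split()
    if not words:
        return False  # unknown status: do nothing rather than guess
    color = words[0].lower()
    # Blue marks a disabled test (planned downtime), so leave the service alone.
    # Only a red status is treated as a reason to restart.
    return color == "red"
```

Treating everything except red as "hands off" keeps the script from acting on stale, purple, or disabled statuses.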

It wasn't all that hard to write, and I also would prefer Hobbit NOT have
auto-restart logic out of the box.

> I sure wish I worked in Utopia though.  The job would be a helluva lot
> less stressful!  :-)


Working in the real world isn't so bad compared to working in the real world
where management _thinks_ you actually work in Utopia, and yet still can't
spare an extra second of downtime for real-time root cause analysis. ;-)