[hobbit] Latest Snapshot

Henrik Stoerner henrik at hswn.dk
Thu Jan 26 21:45:46 CET 2006


On Thu, Jan 26, 2006 at 12:04:16PM -0500, Michael Dunne wrote:

>  I downloaded the latest snapshot and am curious if there is any
> documentation for the new "Critical Systems" enhancements. The new features
> look perfect for our organization and I'd love to start testing.

Docs are usually the last thing that gets done :-)

Fortunately, this is one of those things that doesn't need much 
documentation just to use.

There are two new CGIs; both install automatically when
you run "make install". 

The "hobbit-nkview.sh" CGI provides a view of the current critical
systems, as defined in a new configuration file, hobbit-nkview.cfg.
The old "NK" tags in the bb-hosts file have been deprecated since
they cannot hold all the info that is needed, so this is a
replacement for the bbnk.html page that Hobbit inherited from
the bbgen tool. The new view is dynamic - so it's always up-to-date;
it allows you to prioritize systems so the most important ones
are listed first; it will automatically hide alerts that are
older than some threshold so a "noisy" system doesn't clutter the
view of the operators; and it allows the operator to acknowledge
alerts - both to get them off his display, and to inform others
that the alert has been recognized and some action has been taken.
The acknowledgment feature is going to be used a lot more in
the future; it is designed to allow acknowledgments to be sent
by different groups, so that an alert can still escalate to
e.g. higher-level management regardless of any acks from a
technician.
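
To illustrate the ordering and age filtering described above, here
is a minimal shell sketch. It is not how hobbit-nkview.sh is
implemented. The input format (host, test, priority, age in seconds)
and the function name filter_alerts are made up for the example, and
it assumes that a lower priority number means "more important".

```shell
# Hypothetical input: one alert per line as
#   "host test priority age_seconds"
# This is NOT the hobbit-nkview.cfg format; it only illustrates the idea.
filter_alerts() {
    max_age="$1"
    # Drop alerts older than the threshold, then list the most
    # important first (assumed: lower number = higher priority);
    # ties are broken by age, newest first.
    awk -v max="$max_age" '($4 + 0) <= (max + 0)' | sort -k3,3n -k4,4n
}
```

Feeding it three alerts with a one-hour threshold, e.g.
printf '%s\n' 'db1 conn 2 120' 'web1 http 1 300' 'old1 disk 1 99999' | filter_alerts 3600
would hide the stale "old1" alert and list "web1" ahead of "db1".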

In my production setup, the configuration of what systems and
alerts are "critical" is handled by a separate group who wants
some say about what can and cannot go on their monitor (e.g. 
they won't accept responsibility for monitoring stuff until the
operational procedures have been documented etc.); this is a
group of people who are very much "point-and-click" types. So
hand-editing text configuration files is out of the question.
Hence, a web interface for configuring the critical systems
view was built; this is the "hobbit-nkedit.sh" CGI. So by using 
this tool, you can configure the "critical systems" view with
what status messages show up there; what their priority should be;
what times of the day/week they are monitored; any special
instructions the operators should see when there is an alert;
and for systems that are going into or out of service you can 
even define the date when this happens and the monitoring will
automatically start or stop at the right time. For groups of 
hosts with identical monitoring setups, you can define a template
for how they are configured on the "critical systems" view, and
then "clone" this template to all of the hosts that should use
the same template. So you only need to define the monitoring
once.

To use this, you basically build the current snapshot, then do
a "make install" on top of your current setup. You will need to
add a couple of menu items to your ~hobbit/server/www/menu/menu_items.js
file, to get the links to these two new CGIs; see the default file
that gets built in hobbitd/wwwfiles/menu/menu_items.js when you 
build Hobbit. In the "Views" menu, it is the "Critical systems"
item, and in the "Administration" menu it is the "Edit critical systems"
item. Note that with the current snapshot, the hobbit-nkedit.sh
wrapper for the CGI gets installed in the public CGI directory; it
should go in the secured (password-protected) CGI directory. This
has been fixed for the next snapshot.
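
A rough sketch of those two manual steps, as shell functions. The
function names are made up for illustration, and the CGI directory
names in the usage comments (cgi-bin and cgi-secure under ~hobbit)
are assumptions; substitute whatever your installation uses.

```shell
# Print lines that are in the freshly built default menu file but not
# in the installed one; these are candidates for the new menu entries.
# grep exits non-zero when nothing is missing.
missing_menu_lines() {
    default_file="$1"
    installed_file="$2"
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -F -x -v -f "$installed_file" "$default_file"
}

# Move the hobbit-nkedit.sh wrapper from the public CGI directory to
# the password-protected one, if it is present in the public one.
move_wrapper() {
    public_dir="$1"
    secure_dir="$2"
    if [ -f "$public_dir/hobbit-nkedit.sh" ]; then
        mv "$public_dir/hobbit-nkedit.sh" "$secure_dir/"
    fi
}

# Example usage (directory names are assumptions, adjust to your setup):
#   missing_menu_lines hobbitd/wwwfiles/menu/menu_items.js \
#       ~hobbit/server/www/menu/menu_items.js
#   move_wrapper ~hobbit/cgi-bin ~hobbit/cgi-secure
```

The comparison is line-based, so it only points you at the lines to
look at; the menu entries still have to be merged by hand.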

Another thing you must do is to arrange for some way of allowing
the hobbit-nkedit CGI to update the hobbit-nkview.cfg file. As
currently implemented, this CGI (which runs with your web server's
userid) must be able to write to the ~hobbit/server/etc/ directory.
The easiest way of doing that is to make the program "suid hobbit". So: 

   chown hobbit ~hobbit/server/bin/hobbit-nkedit.cgi 
   chmod u+s ~hobbit/server/bin/hobbit-nkedit.cgi 

I haven't yet decided if that is the best way of doing it; I have
a bad feeling about making CGI programs suid, but this may be the
one occasion where it is needed. Feedback on this is welcome.
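
If you do go the suid route, a quick check that the ownership and
the suid bit actually took effect can be sketched like this. The
function name is_suid_owned_by is made up for the example; it uses
only standard find(1) tests.

```shell
# Succeeds only if FILE is owned by OWNER and has the setuid bit set.
is_suid_owned_by() {
    file="$1"
    owner="$2"
    # find prints the path only when both tests match;
    # -prune stops it from descending if a directory is given.
    [ -n "$(find "$file" -prune -user "$owner" -perm -4000 2>/dev/null)" ]
}

# Example usage:
#   is_suid_owned_by ~hobbit/server/bin/hobbit-nkedit.cgi hobbit \
#       && echo "suid hobbit is in place"
```

Note that this only checks the file's mode and owner; a filesystem
mounted with "nosuid" would still ignore the bit at run time.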


I'm actually doing a presentation tomorrow for our 24x7 monitoring
group, showing them how these new features work. It has been very
much designed based on input from them, so hopefully it should work
in a way that is useful to the operators. Once I get some confirmation
that there are no serious bugs in the new tools, I'll put it into
a proper release version - probably sometime in February.


Regards,
Henrik



