
Re: [hobbit] Alerting on log file entries



In Hobbit 4.2, you can associate each rule in the hobbit-clients.cfg
file with a "group". E.g.

HOST=db1.foo.com
   DISK %^/oracle 95 98 GROUP=dba
   DISK / 90 95 GROUP=admins
   PROC sshd GROUP=admins
   PROC httpd GROUP=webmasters

When the client message is analyzed and the status messages are
generated, the group-names of any rules that result in a yellow
or red status are combined into a group list, and the status message
is then "tagged" with this group-list.

So using the example above, if the /oracle/db1 filesystem is at 96%,
then the "disk" status is tagged with a "dba" group. If the root
filesystem is at 99%, then the "disk" status is tagged with an "admins"
group. If both happen, the "disk" status is tagged with a group-list
"admins,dba".

Likewise, if the "sshd" process is missing, the "procs" status is tagged
with the "admins" group; if there is no "httpd" process, then it is
tagged with the "webmasters" group.
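
To make the tagging step concrete, here is a rough Python sketch of how
the "disk" group-list from the example could be computed. This is only an
illustration, not the actual Hobbit code: the names DISK_RULES,
rule_matches and disk_groups are made up, and the sketch assumes that a
leading "%" marks a regular expression, that a plain name must match
exactly, that a rule fires when usage reaches its yellow threshold, and
that the group-list is sorted alphabetically.

```python
import re

# Hypothetical model of the DISK rules for db1.foo.com shown above:
# (filesystem spec, yellow threshold, red threshold, group)
DISK_RULES = [
    ("%^/oracle", 95, 98, "dba"),
    ("/", 90, 95, "admins"),
]

def rule_matches(spec, mount):
    """Assumption: a leading '%' marks a regex, anything else is an exact name."""
    if spec.startswith("%"):
        return re.search(spec[1:], mount) is not None
    return spec == mount

def disk_groups(usage):
    """Build the group-list for a {mountpoint: percent used} report."""
    groups = set()
    for mount, percent in usage.items():
        for spec, yellow, _red, group in DISK_RULES:
            if rule_matches(spec, mount):
                # Yellow or red both tag the status, so the lower
                # (yellow) threshold decides whether the group is added.
                if percent >= yellow:
                    groups.add(group)
                break  # assumption: the first matching rule wins
    # Assumption: groups are joined in sorted order, as in "admins,dba".
    return ",".join(sorted(groups))

print(disk_groups({"/oracle/db1": 96, "/": 99}))
```

With /oracle/db1 at 96% and / at 99%, both rules fire and the sketch
produces the combined list from the example.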

These groups can then be referenced in the hobbit-alerts.cfg file.
E.g. if "john" takes care of the DB problems, "sue" is the webmaster,
and "bob" handles the normal admin problems, then hobbit-alerts.cfg 
might have this:

   HOST=db1.foo.com
      MAIL john@foo.com GROUP=dba
      MAIL bob@foo.com  GROUP=admins
      MAIL sue@foo.com  GROUP=webmasters
 
Or perhaps you'll just base the alerts on the groups, and have

   GROUP=dba
      MAIL john@foo.com
   GROUP=admins
      MAIL bob@foo.com
   GROUP=webmasters
      MAIL sue@foo.com
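
The effect of the group-based rules can be sketched in a few lines of
Python. Again, this is just an illustration under assumptions, not how
Hobbit itself implements alerting: RECIPIENTS and recipients_for are
hypothetical names, and the sketch assumes the group-list arrives as a
comma-separated string such as "admins,dba".

```python
# Hypothetical mapping that mirrors the GROUP= rules above.
RECIPIENTS = {
    "dba": ["john@foo.com"],
    "admins": ["bob@foo.com"],
    "webmasters": ["sue@foo.com"],
}

def recipients_for(group_list):
    """Return the mail recipients for a comma-separated group-list."""
    result = []
    for group in group_list.split(","):
        # Unknown or empty group names simply match no rule.
        for address in RECIPIENTS.get(group.strip(), []):
            if address not in result:  # do not mail anyone twice
                result.append(address)
    return result

print(recipients_for("admins,dba"))
```

A "disk" status tagged "admins,dba" would then go to both bob and john,
while a "procs" status tagged "webmasters" would go only to sue.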


Note that this "group-thing" will NOT work with the old BB clients; you
must use a real Hobbit client. But I gotta get you guys upgrading, so this
is my cunning scheme to make all of you stop using the BB client :-)

Also, currently this is only for client-side stuff - not for network
tests (e.g. it might be relevant to direct "http" alerts to different
people, depending on which of the 5 URLs you check is down). That is
for a later release.

You can grab the current snapshot and play with it, but be warned that I
added this code yesterday and haven't had time to test it much - will do
that over the week-end while I have on-call duty (hopefully nothing will
happen).


Regards,
Henrik