[hobbit] Logfile monitoring - I'd like some comments

Vernon Everett v.everett at afgonline.com.au
Wed Feb 15 06:59:22 CET 2006


Hi all

Henrik, I have given this a bit of thought, and think it's great.
(I refer to your proposal here, not my capacity for thought.)

Would it be possible to add custom strings and status?

A perfect example would be this. (from /var/adm/messages)
---snip---
Feb 10 13:31:15 afgdev tldd[649]: [ID 138416 daemon.error] TLD(0) drive
2 (device 1) is being DOWNED, status: Unable to open drive
---snip---
Anywhere else, this would not be a major issue, but on my backup server
where my tape library is attached, this is a major red alert.
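
To make the request concrete, here is a rough Python sketch of what I mean by
custom strings with a per-host status. Everything in it is made up for
illustration: the HostRule fields, the host name "backup-server", and the
choice of yellow for other hosts are my assumptions, not anything Hobbit
provides today.

```python
import re
from typing import NamedTuple, Sequence


class HostRule(NamedTuple):
    # Hypothetical rule layout, not an existing Hobbit config format.
    host: str     # regex matched against the host name
    pattern: str  # regex matched against a single log line
    color: str    # status to report when both regexes match


# "backup-server" is a placeholder host name; yellow elsewhere is an assumption.
RULES = [
    HostRule(r"^backup-server$", r"tldd\[\d+\]: .*is being DOWNED", "red"),
    HostRule(r".*", r"is being DOWNED", "yellow"),
]


def color_for(host: str, line: str, rules: Sequence[HostRule] = RULES) -> str:
    """Return the color of the first rule matching both host and line."""
    for rule in rules:
        if re.search(rule.host, host) and re.search(rule.pattern, line):
            return rule.color
    return "green"
```

Rules are checked in order and the first match wins, so the host-specific
entry has to come before the catch-all one.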

Regards
    Vernon

 

-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk] 
Sent: Wednesday, 15 February 2006 5:40 AM
To: hobbit at hswn.dk
Subject: [hobbit] Logfile monitoring - I'd like some comments

A few days ago, I mentioned that I would like to do logfile monitoring
for the next Hobbit release.

I've worked a bit on this and have a prototype solution for it, which
you can test with the current snapshots. I'd like some comments on how
it works to make sure I haven't overlooked something before committing
myself.

There are several objectives:
- As far as is possible, logfile monitoring must be configured
  centrally, on the Hobbit server. Having to go to each server
  to (re)configure what logfiles to check and what to look for
  simply doesn't work.
- The amount of data sent from each client to Hobbit should be
  small, but it must catch the "important" stuff.
- You rarely know in advance what will be in the logs when you
  need them the most. So the monitor should give you as much
  of the log entries as possible, not just those lines that
  match some pre-defined strings or regex'es.
- Some systems log messages on multiple lines. The system must
  be able to show all parts of a log entry.
- Logfile entries must appear on the monitor for some time after
  they show up in the logs, but should also disappear after a
  while.

In other words: The ideal solution would let you have the entire logfile
available on the Hobbit server - but that obviously won't work. So the
client should - after weeding out the really irrelevant stuff - send us
as much of each logfile as possible.

My proposed solution is this:
- On the Hobbit server, there's a log-monitoring configuration
  file for the Hobbit clients. This defines which logfiles are
  monitored for a single client installation, or you can define
  it for a group of clients. (The idea is to define at least
  one group for each operating system, since the standard
  system logs are OS dependent). This configuration lists the
  log filename, the maximum amount of data to send from this
  logfile, a regex "noise" filter (i.e. lines that are stripped
  from the logfile), and *optionally* a regex identifying really
  interesting stuff in the logfile that should always be
  reported.
- When a client connects to the Hobbit server and sends the
  normal client message, the Hobbit server will respond with
  the logfile configuration for this client. So the client
  has a copy of the central configuration file, but only the
  part that it needs for itself. The reason for sending this
  as a response to the client message is to avoid an extra
  round-trip from client to server; piggy-backing the config
  push on the client message means that it is almost without
  any performance cost on the server side.
- When the client runs, it uses the local copy of the configuration
  file to determine what logs to look at. For each logfile, it
  maintains a "where-was-I-the-last-time" status, so it only
  looks at the entries made to the logfile during the past 30
  minutes. First, the client strips off any "noise" messages.
  Then, if all of the entries fit into the maximum size that
  can be reported, it sends all of the log to the Hobbit server.
  If there is more than will fit, it first checks to see if the
  regex defining the really interesting stuff is present in the
  log - if it is, then it drops anything before the interesting
  text. If there is still more than will fit, it keeps the
  interesting text + a few lines after that (to allow for
  multi-line log-entries which some OS'es have), and then
  sends that together with as much of the last part of the log as
  will fit inside the max. message size.
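
The selection steps in the last item can be sketched roughly as follows.
This is an illustrative Python sketch, not the logfetch code: the LogRule
fields, the function names and the default of 3 context lines are
assumptions, and it starts from the lines already read since the last run
(the "where-was-I-the-last-time" bookkeeping is left out). Sending only
the tail when no interesting line is found is also an assumption, since
that case is not spelled out above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogRule:
    # Hypothetical field names; the real configuration format is not shown here.
    filename: str
    max_bytes: int                  # maximum amount of data to send
    noise: Optional[str] = None     # regex: lines stripped before anything else
    trigger: Optional[str] = None   # regex: "really interesting" lines


def size(lines: List[str]) -> int:
    """Bytes needed to send the lines, counting one newline per line."""
    return sum(len(line) + 1 for line in lines)


def tail_that_fits(lines: List[str], budget: int) -> List[str]:
    """Return the longest run of trailing lines that fits in the budget."""
    kept: List[str] = []
    for line in reversed(lines):
        cost = len(line) + 1
        if cost > budget:
            break
        kept.append(line)
        budget -= cost
    kept.reverse()
    return kept


def select_lines(new_lines: List[str], rule: LogRule, context: int = 3) -> List[str]:
    lines = new_lines
    # Step 1: strip the noise.
    if rule.noise:
        noise = re.compile(rule.noise)
        lines = [line for line in lines if not noise.search(line)]
    # Step 2: if everything fits, send it all.
    if size(lines) <= rule.max_bytes:
        return lines
    # Step 3: look for the first interesting line and drop what precedes it.
    first = None
    if rule.trigger:
        trigger = re.compile(rule.trigger)
        first = next((i for i, line in enumerate(lines) if trigger.search(line)), None)
    if first is None:
        # Assumption: with nothing interesting, send the newest lines only.
        return tail_that_fits(lines, rule.max_bytes)
    lines = lines[first:]
    if size(lines) <= rule.max_bytes:
        return lines
    # Step 4: keep the interesting line plus a few lines of context,
    # then fill the remaining space from the end of the log.
    head = lines[:1 + context]
    while head and size(head) > rule.max_bytes:
        head.pop()  # the context itself must not exceed the limit
    rest = lines[1 + context:]
    return head + tail_that_fits(rest, rule.max_bytes - size(head))
```

Only the first interesting line is used as the anchor here; later
matches survive only if they fall inside the tail that still fits.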

This part has been implemented in the Hobbit daemon (hobbitd), and in
the clients via a new "logfetch" utility. This utility uses standard
regular expressions - not the Perl-compatible ones, because that would
require you to install the PCRE library on all of your clients. The
standard regex routines are included in all (I think) system libraries
used today.

The last part is what happens when the log data arrives on the Hobbit
server. Currently, there's a simple processing of this data to just dump
it into an always-green "msgs" column. What should happen once I get it
coded is:
- Data from each logfile is matched against a set of strings
  (regex'es) defined in the hobbit-clients.cfg file. Each string
  determines the color (red, yellow, green) and sets the color
  of the msgs column accordingly.
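
A minimal sketch of that matching step, assuming the rules are simply
(regex, color) pairs and that the most severe matching color wins. The
rule representation and the function name are illustrative only; the
actual hobbit-clients.cfg syntax has not been decided in this post, and
Python's re module stands in for whatever regex library the server uses.

```python
import re
from typing import Iterable, Tuple

# Severity order used to pick the most serious color.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def msgs_color(log_text: str, rules: Iterable[Tuple[str, str]]) -> str:
    """Return the most severe color of any rule matching any log line."""
    compiled = [(re.compile(pattern), color) for pattern, color in rules]
    worst = "green"
    for line in log_text.splitlines():
        for regex, color in compiled:
            if regex.search(line) and SEVERITY[color] > SEVERITY[worst]:
                worst = color
    return worst
```

With no matching rule the column stays green, which matches the current
always-green behaviour for logs that contain nothing of interest.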

When the color has been decided, all of the normal alerting happens
automatically. I do plan on making a more fine-grained alert mechanism
(for the msgs, procs and disk statuses) so you can direct alerts to
different groups depending on exactly which log-message triggered the
alert, but that will not be part of this release.


So - how does that sound? Anything I've missed?


Regards,
Henrik








