
Re: [hobbit] Logfile monitoring - I'd like some comments



On Wed, Feb 15, 2006 at 09:23:22AM +0100, Thomas wrote:
> Hi Henrik,
> 
> So on the central server there will be 2 configuration files. One for
> the log retrieval defining interesting items (I guess this is what today
> is yellow and red strings) and then a hobbit-client configuration file
> where you define the strings again? I am not clear on why you would want
> two separate files with some of the same information in them.

No, you wouldn't define the same strings in both files - that would be
silly. You define the strings that can trigger a red or yellow status in
the hobbit-clients config - that's all. 

What you *can* put into the other config are some hints about how to minimize
the amount of log data that Hobbit needs to process. So you can set up a
regexp of stuff in the logfile that you *never* want to see, and a
regexp of stuff that you *always* want to report - regardless of how
much the log grows. The latter may be identical to some of what you
have in the hobbit-clients config, but it could be different - or you
could go without any definition in the second file at all.

An example: you're monitoring an application that logs some data to a
logfile, and you've limited the amount of log data to report to
200 bytes (that is probably too small for anything, but it will do for
this example). You know that the application crashes occasionally,
but it usually recovers automatically - so you've just configured
the hobbit-clients.cfg file to send a warning for the application's
"Startup complete" message, and an alert for "Startup failed" or
"Error".

The log now looks like this:

  10:41:03 myapp: Startup complete
  10:41:03 myapp: -- MARK --
  10:44:03 myapp: -- MARK --
  10:47:03 myapp: -- MARK --
  10:48:32 myapp: Error reading data, retrying
  10:49:19 myapp: Error reading data, retrying
  10:49:20 myapp: Error reading data, retrying
  10:49:21 myapp: Error reading data, retrying
  10:49:22 myapp: Error reading data, retrying
  10:49:23 myapp: Error reading data, retrying
  10:49:24 myapp: Error reading data, retrying
  10:49:37 myapp: Unhandled exception at myapp_service.c:312: I/O error
  10:49:37 myapp: Instruction dump follows:
  0000000 030460 027060 070155 005147 060504 064556 060543 042012
  0000020 071545 072153 070157 042012 060551 071147 066541 027061
  0000040 064544 005141 067504 072543 062555 072156 005163 041105
  0000060 045512 027123 061145 065552 074545 072163 071157 005145
  0000100 052110 046115 052137 050157 043104 031455 031456 072056
  0000120 071141 063456 005172 060515 066151 046412 064541 062154
  0000140 071151 046412 075157 071141 057564 074563 063155 032137
  0000160 057460 064550 064147 066456 031560 046412 071565 065551
  0000200 047012 073545 005163 064520 071543 005062 051522 067171
  0000220 041143 061541 072553 005160 051523 005114 042530 064160
  0000240 066545 060412 061141 060412 071544 005154 063141 163154
  0000260 027163 074164 005164 067141 064564 064566 072562 005163
  0000300 071141 064143 073151 005145 072541 067564 060563 062566
  0000320 060412 064170 066557 005145 075141 071165 072545 005163
  0000340 033142 005064 033142 027064 005143 060542 065543 070165
  0000360 061012 071541 061551 057563 071550 067167 066056 064544
  0000400 005146 060542 064563 071543 071537 061165 062556 027164
  0000420 062154 063151 061012 026542 067550 072163 005163 061142
  10:49:37 myapp: Initiating recovery restart procedure
  10:49:38 myapp: Startup complete

The "-- MARK --" lines are just noise - they only tell us the
application is running. So you put those into the "ignore" regexp
that is pushed to the client, and the client will filter out those
lines before reporting data to Hobbit.
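
As a rough illustration of what that filtering amounts to (this is not
the Hobbit client's code; the IGNORE_RE variable and the filter_ignored
function are names made up for this sketch):

```shell
# Hypothetical sketch: drop every line matching an "ignore" regexp
# before the log data is reported.
IGNORE_RE='-- MARK --'

filter_ignored() {
    # -v inverts the match; -e is needed because the pattern starts with "-"
    grep -v -e "$IGNORE_RE"
}
```

Feeding the example log through it, e.g. "filter_ignored < logfile.txt",
would pass everything except the three MARK lines.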

Hobbit would normally report the last 200 bytes of the logfile.
But in this case, that would only include the dump data and the
"Startup complete" - so you would miss both the fact that the dump
was due to an unhandled exception, and the fact that it may have
been triggered by a disk error which caused the application to
retry I/O operations several times. And the "msgs" status would
only be yellow (from "Startup complete"), because none of the
"Error" lines that should turn it red are in the reported data.

To catch that, you can tell the Hobbit client to always include
certain log entries in the data it sends - e.g. here you could 
configure it to always include lines containing the word "Error".
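
A minimal sketch of that idea, assuming nothing about how the Hobbit
client will actually do it: TRIGGER_RE, MAXBYTES, the report_log
function and the "..." separator are all invented for the example.

```shell
# Hypothetical sketch: always report lines matching a "trigger" regexp,
# then the last MAXBYTES bytes of the logfile.
TRIGGER_RE='Error'
MAXBYTES=200

report_log() {
    logfile=$1
    # Lines that must always be reported, wherever they are in the file
    grep -e "$TRIGGER_RE" "$logfile"
    # Made-up separator between the trigger lines and the tail
    echo "..."
    # The normal size-limited tail of the log
    tail -c "$MAXBYTES" "$logfile"
}
```

A trigger line that also falls inside the last 200 bytes shows up twice
with this naive version; a real implementation would want to avoid that.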


> Will this new logfile retrieval also be able to look for logfiles with
> variable file names, i.e. logfile.txt-20060215 for today and then a new
> filename logfile.txt-20060216 tomorrow? I know it's stupid but that's
> how the vendor creates it.

That's one variant I haven't seen yet. It would be tricky to implement;
couldn't you just run something like this on the client daily via cron:

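   cd /var/log/myapp || exit 1
   # Newest dated logfile, by modification time
   CURRENTLOG=`ls -t logfile.txt-* | head -1`
   # -f replaces the link left over from the previous run
   ln -sf "$CURRENTLOG" logfile.txt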

and then Hobbit can look at logfile.txt ?


Regards,
Henrik