[hobbit] Thoughts

Wed May 2 23:24:20 CEST 2007

Actually, you just indirectly mentioned something that feels like a fairly elegant solution.  What would be nice in this particular case is being able to attach a service label to the PROCS tests for groups of processes.  The service could then be monitored without a custom test being created for each one.  New columns could be created from the service tag without really cluttering the lines.
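
To make the service-label idea concrete, here is a rough sketch in Python. It is not Hobbit code: the rule tuples, the service names, and the service_colors helper are all made up for illustration, and the only assumption is that each process rule carries a label and that each label becomes its own column.

```python
# Hypothetical rules: (process name, minimum count, service label).
# The service label is an assumed extension, not an existing Hobbit setting.
RULES = [
    ("tnslistener", 1, "oracle"),
    ("ora_pmon", 1, "oracle"),
    ("httpd", 4, "web"),
]


def service_colors(rules, running):
    """Return one color per service label instead of one color for all procs.

    `running` maps a process name to the number of instances found.
    A service goes red as soon as one of its processes is below its minimum.
    """
    colors = {}
    for name, min_count, service in rules:
        if running.get(name, 0) < min_count:
            colors[service] = "red"
        else:
            # Only mark green if no earlier rule already turned it red.
            colors.setdefault(service, "green")
    return colors


if __name__ == "__main__":
    # httpd is short two instances; the Oracle processes are fine.
    print(service_colors(RULES, {"tnslistener": 1, "ora_pmon": 1, "httpd": 2}))
```

With one column per label, a red "web" column would no longer hide the state of the "oracle" column, so alerts for each could go to different people.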

I'll have to think about how the log files are processed to see if something like that works or not.

Jason

________________________________

From: Dan Vande More [mailto:bigdan at gmail.com]
Sent: Wed 5/2/2007 4:09 PM
To: hobbit at hswn.dk
Subject: Re: [hobbit] Thoughts

Indeed, it seems to me that the whole group concept is a good way to work with us humans but breaks down wildly when dealing with computers. This is fine because most of us use the groups to save space on the screens and to cut down on configuration in the conf files.

If you want tests for each process and ultimately different behaviours for each process, you need to be prepared to do the work and make the tests for each process.

Please don't overcomplicate hobbit for this - it's a corner case and will ultimately make the program more unwieldy. 

On 5/2/07, Henrik Stoerner <henrik at hswn.dk> wrote: 

	On Wed, May 02, 2007 at 02:06:34PM -0500, Kruse, Jason K. wrote:
	> Grouped items, such as the process check and log monitors, are issues.
	> A single process down causes the whole check to go red.  A process
	> listed as alerting only operators can then mask another process on the
	> same system from notifying the DBA's.  Setting the alert repeat interval
	> to 0 shows the other problem, a recovery message is not generated for 
	> each process that recovers, only when the whole group of processes
	> recovers.

	This will be difficult to handle - it's a very basic thing in the Hobbit
	design that it only tracks the color of each status, not the details of 
	which rule (out of many) causes e.g. the "procs" column to go red.

	To do that, you would need to associate some "event ID" with each of the
	settings that can cause a red/yellow status; e.g. you'd have

	   HOST=myhost
	       PROC tnslistener 1 ID=100
	       PROC httpd 4 ID=200

	The "procs" status would then store the set of ID's that had been triggered
	for a status, and whenever there was a change in the set of triggered 
	rules it would pass this information to some process.
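
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not how Hobbit works: the ProcRule class, its field names, and the two helper functions are hypothetical, and the ID= values are the ones from the example rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcRule:
    rule_id: int      # the proposed ID= value (hypothetical setting)
    name: str         # process name to look for
    min_count: int    # minimum number of running instances
    group: str = ""   # alert group; several rules may share one


def triggered_ids(rules, running):
    """Return the set of rule IDs whose process count is below the minimum."""
    return {r.rule_id for r in rules if running.get(r.name, 0) < r.min_count}


def diff_events(previous, current):
    """Compare two sets of triggered IDs; return (newly_failed, recovered)."""
    return sorted(current - previous), sorted(previous - current)


rules = [
    ProcRule(100, "tnslistener", 1, group="dba"),
    ProcRule(200, "httpd", 4, group="operators"),
]

# First poll: both rules are violated. Second poll: tnslistener is back.
before = triggered_ids(rules, {"tnslistener": 0, "httpd": 2})
after = triggered_ids(rules, {"tnslistener": 1, "httpd": 2})
newly_failed, recovered = diff_events(before, after)

if __name__ == "__main__":
    print("still triggered:", sorted(after))
    print("newly failed:", newly_failed, "recovered:", recovered)
```

The "procs" column stays red after the second poll because rule 200 is still triggered, but the change in the set is enough to generate a recovery message for rule 100 alone.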

	It can be done, but I am not particularly happy with it; it seems a bit too
	complex for my taste. If anyone has a better idea, please speak up.

	(And just in case you wonder why I've used a new "event ID" instead of 
	re-using the existing "group" definition: I can easily imagine a
	scenario where you have e.g. multiple processes monitored with alerts
	going to one group of people (i.e. several PROC rules have the same 
	GROUP setting), but you still want to track exactly which processes are
	up or down - and then you need a unique ID for each PROC rule).

	Regards,
	Henrik
