
RE: [hobbit] Need help determining why alerts didn't come



You can always look at the page behind the "info" button for applesauce
to see how the alert rules were interpreted.  You can also run an event
configuration report.
 
Personally, I would not try to be too clever in any of the Hobbit
configuration files unless the documentation provides a specific example
of "cleverness."  I would explicitly list what I want for each host, and
not assume that I can set up a hierarchy of parameters using multiple
definitions.  Over the past year or so there have been a number of posts
from people who are misled by their own assumptions that "Hobbit works
this way because I want/need it to work this way."
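 
As a rough illustration of what I mean by listing everything explicitly,
here is a minimal sketch that uses only the rules from your message.  It
is untested.  It assumes the individual DISK, PROC, FILE and LOG rules
are themselves correct, and it says nothing about how Hobbit actually
treats two HOST sections that name the same host:

	# applesauce: generic WebLogic checks plus its own checks, in one section
	HOST=applesauce
	        DISK    *       95 97
	        PROC dsmcad 1 -1 yellow
	        FILE "%/wls_domains/.*/jrockit..*.dump" NOEXIST red
	        LOG  /var/log/messages "%(?-i)SERIOUS_CRITICAL" COLOR=yellow
	        PROC "weblogic.Name=" 3 3 red TEXT=TOTAL_WEBLOGIC_PROCESSES
	        PROC "weblogic.Name=prod_alsb_01" 1 1 red TEXT=PROD_ALSB_01
	        PROC "weblogic.Name=prod_ccs_wli_01" 1 1 red TEXT=PROD_CCS_WLI_01
	        PROC "weblogic.Name=prod_ccs_aldsp_01" 1 1 red TEXT=PROD_CCS_ALDSP_01

	# remaining WebLogic servers: generic checks only
	HOST=gravy,enchilada,chips
	        DISK    *       95 97
	        PROC dsmcad 1 -1 yellow
	        FILE "%/wls_domains/.*/jrockit..*.dump" NOEXIST red

With each host named in exactly one section, the "info" page for
applesauce should show every rule above, which makes it easier to tell a
rule that was never applied from a rule that was applied but did not
trigger.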
 
GLH


________________________________

	From: Bouchard, Brian [mailto:Brian-Bouchard (at) idexx.com] 
	Sent: Friday, November 07, 2008 8:52 AM
	To: hobbit (at) hswn.dk
	Subject: [hobbit] Need help determining why alerts didn't come
	
	

	Hello Hobbit Gurus,

	 

	I am seeking help determining why we recently received only some
alerts that were configured on a given server.

	 

	 

	 

	In my hobbit-clients.cfg file I have multiple sections of
relevance:

	 

	#######################################################

	# generic checks for all WebLogic Servers

	#######################################################

	HOST= applesauce,gravy,enchilada,chips

	        DISK    *       95 97

	        PROC dsmcad 1 -1 yellow

	        FILE "%/wls_domains/.*/jrockit..*.dump" NOEXIST red

	#######################################################

	# specific checks for applesauce

	#######################################################

	HOST=applesauce

	       LOG  /var/log/messages "%(?-i)SERIOUS_CRITICAL" COLOR=yellow

	       PROC "weblogic.Name=" 3 3 red TEXT=TOTAL_WEBLOGIC_PROCESSES

	       PROC "weblogic.Name=prod_alsb_01" 1 1 red TEXT=PROD_ALSB_01

	       PROC "weblogic.Name=prod_ccs_wli_01" 1 1 red TEXT=PROD_CCS_WLI_01

	       PROC "weblogic.Name=prod_ccs_aldsp_01" 1 1 red TEXT=PROD_CCS_ALDSP_01

	 

	 

	So, a couple of questions:

	 

	1)       Is it valid to have different alerts for the same HOST
in the hobbit-clients.cfg like this?  It seemed to work in some
instances, but I should ask before moving forward...
	
	

	2)       Yesterday, I received the alerts with TEXT=
"TOTAL_WEBLOGIC_PROCESSES" and "PROD_ALSB_01".  When I logged onto the
server, I found that the filesystem this process was running on was 100%
used, which caused the process to die.  I cleaned up a bunch of log
files and restarted the process, and all was good.  BUT why didn't
I receive the alert that the DISK was more than 97% full?  I checked the
history for the disk usage, and it had been over 95% for at least 6
hours before the process went down.  Also, the check for the
"jrockit" file did not fire when that file was created (after the
filesystem reached 100%).  I need to determine why we weren't warned
about the disk space issue before our production application came down.
	
	

	3)       One other thing I noticed was that the IP address for
this server was incorrect in the bb-hosts file.  I assume that's an
issue, but I'm not sure why we got some expected alerts and not others.
Also, I updated this entry in the bb-hosts file to the correct IP, and
cycled the hobbit server, but I am still not receiving the alert on the
jrockit file, which is still out there.

	 

	Any help is appreciated.  I'm relatively new to Hobbit, so it's
completely within the realm of possibility that I don't have any of this
set up correctly. Please feel free to correct me on anything that looks
out of whack.

	 

	- Brian