
RE: [hobbit] Need help determining why alerts didn't come



Please look at the "debugging" section of the following URL:

http://en.wikibooks.org/wiki/System_Monitoring_with_Hobbit/Other_Docs/FAQ#Q._How_do_I_configure_.22GROUP.22_alerts_.3F

It lets you trace which alert rules were matched and which were not.

Hope this helps.
T.J. Yang



Date: Fri, 7 Nov 2008 13:11:35 -0500
From: Brian-Bouchard (at) idexx.com
To: hobbit (at) hswn.dk
Subject: RE: [hobbit] Need help determining why alerts didn't come







OK, I removed the hierarchy as you suggested, Greg.

Then I added a DISK line to the applesauce entry, so hobbit-clients.cfg now has the following:
 
 
HOST=applesauce
       LOG  /var/log/messages "%(?-i)SERIOUS_CRITICAL" COLOR=yellow
       PROC "weblogic.Name=" 3 3 red TEXT=TOTAL_WEBLOGIC_PROCESSES
       PROC "weblogic.Name=prod_alsb_01" 1 1 red TEXT=PROD_ALSB_01
       PROC "weblogic.Name=prod_ccs_wli_01" 1 1 red TEXT=PROD_CCS_WLI_01
       PROC "weblogic.Name=prod_ccs_aldsp_01" 1 1 red TEXT=PROD_CCS_ALDSP_01
       DISK /wls_domains 40 97
 
 
The disk page for this server in Hobbit is still green, and it shows the following:
 
/dev/mapper/vg00-lvol10   9289080 5718512   3098712  65% /wls_domains
 
When I run the config report for this server, I see the following for disk:
 
disk   No   -/-/-   Default limits: Yellow 90% full, Red 95% full
                    /wls_appl
                    /var
                    /boot
                    /wls_logs
                    /wls_domains
                    /opt
                    /usr
                    /root
                    /dev
                    /shm
                    /home
                    /tmp
 
I assume this means all of these filesystems will only go yellow at 90% full and red at 95% full? If so, we clearly have something set up incorrectly. If I am misreading the report, please let me know.
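To make the expected result concrete, the sketch below compares the df line above against both the host-specific rule (DISK /wls_domains 40 97) and the default limits shown in the report (90/95). It is a minimal illustration, not Hobbit's own code: it assumes a filesystem goes yellow when its capacity is at or above the warning level and red at or above the panic level, and `disk_color` is a hypothetical helper.

```python
import re


def disk_color(df_line: str, warn: int, panic: int) -> str:
    """Return the color a DISK rule would be expected to give one df line.

    Assumption: yellow when usage is at or above `warn`, red at or above `panic`.
    """
    # The capacity column is the only field ending in '%', e.g. "65%".
    match = re.search(r"(\d+)%", df_line)
    if match is None:
        raise ValueError("no capacity percentage found in df line")
    used = int(match.group(1))
    if used >= panic:
        return "red"
    if used >= warn:
        return "yellow"
    return "green"


line = "/dev/mapper/vg00-lvol10   9289080 5718512   3098712  65% /wls_domains"
print(disk_color(line, 40, 97))  # yellow: the host-specific 40/97 rule
print(disk_color(line, 90, 95))  # green: the 90/95 default limits
```

At 65% the 40/97 rule should produce yellow, while the 90/95 defaults leave the page green, which matches what the disk page shows if the host-specific rule is not being applied.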
 
 




From: Hubbard, Greg L [mailto:greg.hubbard (at) eds.com]
Sent: Friday, November 07, 2008 10:14 AM
To: hobbit (at) hswn.dk
Subject: RE: [hobbit] Need help determining why alerts didn't come
 
You can always look at the page behind the "info" button for applesauce to see how the alert rules were interpreted.  You can also run an event configuration report.
 
Personally, I would not try to be too clever in any of the Hobbit configuration files unless the documentation provides a specific example of "cleverness."  I would explicitly list what I want for each host, and not assume that I can set up a hierarchy of parameters using multiple definitions.  Over the past year or so there have been a number of posts from people who were misled by their own assumptions that "Hobbit works this way because I want/need it to work this way."
 
GLH

 



From: Bouchard, Brian [mailto:Brian-Bouchard (at) idexx.com]
Sent: Friday, November 07, 2008 8:52 AM
To: hobbit (at) hswn.dk
Subject: [hobbit] Need help determining why alerts didn't come

Hello Hobbit Gurus,
 
I am looking for help determining why we recently received only some of the alerts configured for a given server.
 
 
 
My hobbit-clients.cfg file has several relevant sections:
 
#######################################################
# generic checks for all WebLogic Servers
#######################################################
HOST= applesauce,gravy,enchilada,chips
        DISK    *       95 97
        PROC dsmcad 1 -1 yellow
        FILE "%/wls_domains/.*/jrockit..*.dump" NOEXIST red
#######################################################
# specific checks for applesauce
#######################################################
HOST=applesauce
       LOG  /var/log/messages "%(?-i)SERIOUS_CRITICAL" COLOR=yellow
       PROC "weblogic.Name=" 3 3 red TEXT=TOTAL_WEBLOGIC_PROCESSES
       PROC "weblogic.Name=prod_alsb_01" 1 1 red TEXT=PROD_ALSB_01
       PROC "weblogic.Name=prod_ccs_wli_01" 1 1 red TEXT=PROD_CCS_WLI_01
       PROC "weblogic.Name=prod_ccs_aldsp_01" 1 1 red TEXT=PROD_CCS_ALDSP_01
 
 
So, a couple of questions:
 
1) Is it valid to have different alerts for the same HOST in hobbit-clients.cfg like this? It seemed to work in some cases, but I wanted to ask before moving forward.
2) Yesterday, I received the alerts with TEXT=“TOTAL_WEBLOGIC_PROCESSES” and “PROD_ALSB_01”. When I logged onto the server, I found that the filesystem this process runs on was 100% full, which caused the process to die. I cleaned up a bunch of log files and restarted the process, and all was good. BUT... why didn’t I receive the alert that the DISK was more than 97% full? I checked the disk usage history, and it had been over 95% for at least 6 hours before the process went down. Also, the check for the “jrockit” file did not fire when that file was created (after the filesystem reached 100%). I need to determine why we weren’t warned about the disk space issue before our production application came down.
3) One other thing I noticed was that the IP address for this server was incorrect in the bb-hosts file. I assume that’s an issue, but I’m not sure why we got some of the expected alerts and not others. I have since corrected this entry in bb-hosts and restarted the Hobbit server, but I am still not receiving the alert for the jrockit file, which is still there.
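One way to rule out the pattern itself for the jrockit check is to test the FILE rule's regular expression outside Hobbit. This is a minimal sketch, assuming Hobbit treats the text after the leading `%` as a regular expression and searches it against each full file path the client reports; Python's `re` stands in for Hobbit's own regex engine, and `is_jrockit_dump` and the file names are hypothetical.

```python
import re

# Pattern from the FILE rule, with Hobbit's leading "%" (regex marker) removed.
# Note that the unescaped dots match any character, not just a literal ".".
JROCKIT_DUMP = re.compile(r"/wls_domains/.*/jrockit..*.dump")


def is_jrockit_dump(path: str) -> bool:
    """True if `path` contains a match for the FILE rule's regular expression."""
    return JROCKIT_DUMP.search(path) is not None


# Hypothetical file names, for illustration only.
for p in ["/wls_domains/prod/jrockit.12345.dump",
          "/wls_logs/prod/jrockit.12345.dump",
          "/wls_domains/jrockit.12345.dump"]:
    print(p, is_jrockit_dump(p))
```

Of these names only the first matches: the pattern requires at least one subdirectory between /wls_domains/ and the jrockit file, so a dump written directly under /wls_domains would not be caught.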
 
Any help is appreciated. I’m relatively new to Hobbit, so it’s entirely possible that I don’t have any of this set up correctly. Please feel free to correct anything that looks out of whack.
 
- Brian