[hobbit] Need help determining why alerts didn't come

Bouchard, Brian Brian-Bouchard at idexx.com
Fri Nov 7 17:18:36 CET 2008


Tom - 

 

I don't have a bbsys.local file, but I did find a similar command in
hobbitserver.cfg.  The setting there is similar to what you suggested:

 

DF="/bin/df -Pk"

 

Also, to Greg's points earlier, I took the hierarchy out of the picture
for now.  I'm looking at the info and config reports now.

 

- Brian

________________________________

From: Tom Callahan [mailto:CallahanT at tessco.com] 
Sent: Friday, November 07, 2008 10:27 AM
To: hobbit at hswn.dk
Subject: Re: [hobbit] Need help determining why alerts didn't come

 

I've noticed that "df" output is not parsed correctly if you have long
device names (think device-mapper).

My solution was to change DF="df -k" in bbsys.local to DF="df -k -P" for
POSIX mode, which keeps each filesystem on a single output line.

Try that and see if it helps?
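
To illustrate the wrapping problem described above, here is a minimal Python sketch. It is not how Hobbit itself parses "df"; the sample output, the device name, and the parse_df helper are all made up for illustration. It only shows why a parser that expects one line per filesystem can silently miss an entry when an older "df" wraps a long device name onto its own line, and why "-P" avoids that.

```python
# Illustrative only: the sample output below is invented, and parse_df is a
# hypothetical helper, not part of Hobbit/Xymon.

# Older GNU df without -P: a long device name pushes the numbers to the next line.
WRAPPED = """\
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      10157368  10157368         0 100% /wls_domains
"""

# df -Pk: POSIX format, always one line per filesystem.
POSIX = """\
Filesystem                      1024-blocks     Used Available Capacity Mounted on
/dev/mapper/VolGroup00-LogVol00    10157368 10157368         0     100% /wls_domains
"""


def parse_df(text):
    """Naive parser: expects six fields per line, returns {mount_point: percent_used}."""
    usage = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 6:
            # A wrapped entry never has six fields on one line, so it is skipped.
            continue
        usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage


print(parse_df(WRAPPED))
print(parse_df(POSIX))
```

With the wrapped sample, the naive parser finds no filesystems at all, so a full disk would go unreported; with the POSIX sample, the same parser sees /wls_domains at 100%.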


On 11/7/08 9:52 AM, "Bouchard, Brian" <Brian-Bouchard at idexx.com> wrote:

Hello Hobbit Gurus,
 
I am seeking help determining why we recently received only some of the
alerts that were configured for a given server.
 
 
 
In my hobbit-clients.cfg file I have multiple sections of relevance:
 
#######################################################
# generic checks for all WebLogic Servers
#######################################################
HOST= applesauce,gravy,enchilada,chips
        DISK    *       95 97
        PROC dsmcad 1 -1 yellow
        FILE "%/wls_domains/.*/jrockit..*.dump" NOEXIST red
#######################################################
# specific checks for applesauce
#######################################################
HOST=applesauce
       LOG  /var/log/messages "%(?-i)SERIOUS_CRITICAL" COLOR=yellow
       PROC "weblogic.Name=" 3 3 red TEXT=TOTAL_WEBLOGIC_PROCESSES
       PROC "weblogic.Name=prod_alsb_01" 1 1 red TEXT=PROD_ALSB_01
       PROC "weblogic.Name=prod_ccs_wli_01" 1 1 red TEXT=PROD_CCS_WLI_01
       PROC "weblogic.Name=prod_ccs_aldsp_01" 1 1 red TEXT=PROD_CCS_ALDSP_01
 
 
So, a couple of questions:
 
1)       Is it valid to have different alerts for the same HOST in the
hobbit-clients.cfg like this?  It seemed to work in some instances, but
I should ask before moving forward...


2)       Yesterday, I received the alerts with TEXT=
"TOTAL_WEBLOGIC_PROCESSES" and "PROD_ALSB_01".  When I logged onto the
server, I found the filesystem this process was running on was 100%
used, which caused the process to die.  I cleaned up a bunch of log
files and restarted the process, and all was good...  BUT... why didn't
I receive the alert that the DISK was more than 97% full?  I checked the
history for the disk usage, and it had been over 95% for at least 6
hours prior to the process going down.  Also, the check for the
"jrockit" file did not kick off when that file was created (after the
filesystem was at 100%).  I need to determine why we weren't warned about
the disk space issue before our production application came down.


3)       One other thing I noticed was that the IP address for this
server was incorrect in the bb-hosts file.  I assume that's an issue,
but I'm not sure why we got some expected alerts and not others.  Also,
I updated this entry in the bb-hosts file to the correct IP, and cycled
the hobbit server, but I am still not receiving the alert on the jrockit
file, which is still out there.
 
Any help is appreciated.  I'm relatively new to Hobbit, so it's
completely within the realm of possibility that I don't have any of this
set up correctly. Please feel free to correct me on anything that looks
out of whack.
 
- Brian
