
Re: [hobbit] Need help determining why alerts didn't come



I've noticed that Hobbit can fail to parse "df" output correctly if you have
long device names (think device-mapper).

My solution was to change DF="df -k" in bbsys.local to DF="df -k -P" for
POSIX mode.

Try that and see if it helps?
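
To illustrate why POSIX mode can matter, here is a rough Python sketch. It is
not Hobbit's actual parser; the sample df output, the device names, and the
parse_df helper are all made up for illustration. It assumes a reader that
expects one filesystem per line, which is the kind of reader that trips over
the wrapped lines df prints for long device names when -P is not given.

```python
# Hypothetical df output without -P: the long device-mapper name
# pushes the numeric columns onto a continuation line.
WRAPPED = """\
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      10157368   9649500         0 100% /
/dev/sda1               101086     12345     83522  13% /boot
"""

# Hypothetical output for the same system with -P: one record per line.
POSIX = """\
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/mapper/VolGroup00-LogVol00 10157368 9649500 0 100% /
/dev/sda1               101086     12345     83522      13% /boot
"""


def parse_df(text):
    """Naive one-record-per-line parser: returns {mount point: percent used}."""
    usage = {}
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) != 6:
            # Either half of a wrapped record has too few fields,
            # so the whole filesystem is silently dropped.
            continue
        usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage


print(parse_df(WRAPPED))  # the full root filesystem is missing
print(parse_df(POSIX))    # both filesystems are seen
```

With the wrapped input, the 100% full root filesystem never reaches the
threshold check at all, which would look just like a DISK rule that never
fires.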


On 11/7/08 9:52 AM, "Bouchard, Brian" <Brian-Bouchard (at) idexx.com> wrote:

> Hello Hobbit Gurus,
>  
> I am seeking help determining why we recently received only some alerts that
> were configured on a given server.
>  
>  
>  
> In my hobbit-clients.cfg file I have multiple sections of relevance:
>  
> #######################################################
> # generic checks for all WebLogic Servers
> #######################################################
> HOST= applesauce,gravy,enchilada,chips
>         DISK    *       95 97
>         PROC dsmcad 1 -1 yellow
>         FILE "%/wls_domains/.*/jrockit..*.dump" NOEXIST red
> #######################################################
> # specific checks for applesauce
> #######################################################
> HOST=applesauce
>        LOG  /var/log/messages "%(?-i)SERIOUS_CRITICAL" COLOR=yellow
>        PROC "weblogic.Name=" 3 3 red TEXT=TOTAL_WEBLOGIC_PROCESSES
>        PROC "weblogic.Name=prod_alsb_01" 1 1 red TEXT=PROD_ALSB_01
>        PROC "weblogic.Name=prod_ccs_wli_01" 1 1 red TEXT=PROD_CCS_WLI_01
>        PROC "weblogic.Name=prod_ccs_aldsp_01" 1 1 red TEXT=PROD_CCS_ALDSP_01
>  
>  
> So, a couple of questions:
>  
> 1)       Is it valid to have different alerts for the same HOST in the
> hobbit-clients.cfg like this?  It seemed to work in some instances, but I
> should ask before moving forward?
> 
> 
> 2)       Yesterday, I received the alerts with TEXT=
> "TOTAL_WEBLOGIC_PROCESSES" and "PROD_ALSB_01". When I logged onto the server, I
> found the filesystem this process was running on was 100% used, which caused
> this process to die.  I cleaned up a bunch of log files, and restarted the
> process and all was good... BUT... Why didn't I receive the alert that the DISK
> was more than 97% full?  I checked the history for the disk usage, and it had
> been over 95% for at least 6 hours prior to the process going down.  Also, the
> check for the "jrockit" file did not kick off when that file was created
> (after the filesystem was at 100%).  I need to determine why we weren't warned
> on the disk space issue before our production application came down.
> 
> 
> 3)       One other thing I noticed was that the IP address for this server was
> incorrect in the bb-hosts file.  I assume that's an issue, but I'm not sure
> why we got some expected alerts and not others.  Also, I updated this entry in
> the bb-hosts file to the correct IP, and cycled the hobbit server, but I am
> still not receiving the alert on the jrockit file, which is still out there.
>  
> Any help is appreciated.  I'm relatively new to Hobbit, so it's completely
> within the realm of possibility that I don't have any of this set up
> correctly. Please feel free to correct me on anything that looks out of whack.
>  
> - Brian
>