Re: [hobbit] Need help determining why alerts didn't come
- To: <hobbit (at) hswn.dk>
- Subject: Re: [hobbit] Need help determining why alerts didn't come
- From: Tom Callahan <CallahanT (at) tessco.com>
- Date: Fri, 07 Nov 2008 10:26:36 -0500
- Thread-index: AclA6HCq9GBZra/EQoirEjC1+j+VggABMgRX
- Thread-topic: [hobbit] Need help determining why alerts didn't come
- User-agent: Microsoft-Entourage/12.10.0.080409
I've noticed that Hobbit can fail to parse "df" output correctly if you have
long device names (think device-mapper).
My solution was to change DF="df -k" in bbsys.local to DF="df -k -P" for
POSIX mode.
Try that and see if it helps.
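To illustrate why POSIX mode can matter: with a long device name, some versions of df split a filesystem entry across two lines, while "df -P" keeps each entry on one line. The sketch below is only an illustration, not Hobbit's actual parser. The sample output, the device name, and the parse_df helper are all made up for the example.

```python
# Hypothetical df output; the device name and numbers are invented.
# Without -P, a long device name pushes the figures onto a second line.
WRAPPED = """\
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      10157368   9649500         0 100% /wls_domains
"""

# With -P (POSIX output format), each filesystem stays on a single line.
POSIX = """\
Filesystem                      1024-blocks    Used Available Capacity Mounted on
/dev/mapper/VolGroup00-LogVol00    10157368 9649500         0     100% /wls_domains
"""


def parse_df(text):
    """Naive one-line-per-filesystem parser: returns {mount point: percent used}."""
    usage = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 6:  # a wrapped entry never has six fields on one line
            continue
        usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage


print(parse_df(WRAPPED))  # {}
print(parse_df(POSIX))    # {'/wls_domains': 100}
```

A line-oriented parser like this silently drops the wrapped entry, so a full filesystem would never reach a DISK threshold check. Whether that is what happened on applesauce is a guess; comparing the raw df section of the client data against the disk status page should confirm or rule it out.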
On 11/7/08 9:52 AM, "Bouchard, Brian" <Brian-Bouchard (at) idexx.com> wrote:
> Hello Hobbit Gurus,
>
> I am seeking help determining why we recently received only some alerts that
> were configured on a given server.
>
>
>
> In my hobbit-clients.cfg file I have multiple sections of relevance:
>
> #######################################################
> # generic checks for all WebLogic Servers
> #######################################################
> HOST= applesauce,gravy,enchilada,chips
> DISK * 95 97
> PROC dsmcad 1 -1 yellow
> FILE "%/wls_domains/.*/jrockit..*.dump" NOEXIST red
> #######################################################
> # specific checks for applesauce
> #######################################################
> HOST=applesauce
> LOG /var/log/messages "%(?-i)SERIOUS_CRITICAL" COLOR=yellow
> PROC "weblogic.Name=" 3 3 red TEXT=TOTAL_WEBLOGIC_PROCESSES
> PROC "weblogic.Name=prod_alsb_01" 1 1 red TEXT=PROD_ALSB_01
> PROC "weblogic.Name=prod_ccs_wli_01" 1 1 red TEXT=PROD_CCS_WLI_01
> PROC "weblogic.Name=prod_ccs_aldsp_01" 1 1 red TEXT=PROD_CCS_ALDSP_01
>
>
> So, a couple of questions:
>
> 1) Is it valid to have different alerts for the same HOST in the
> hobbit-clients.cfg like this? It seemed to work in some instances, but I
> should ask before moving forward?
>
>
> 2) Yesterday, I received the alerts with TEXT=
> "TOTAL_WEBLOGIC_PROCESSES" and "PROD_ALSB_01". When I logged onto the server, I
> found the filesystem this process was running on was 100% used, which caused
> this process to die. I cleaned up a bunch of log files and restarted the
> process, and all was good... BUT... why didn't I receive the alert that the DISK
> was more than 97% full? I checked the history for the disk usage, and it had
> been over 95% for at least 6 hours prior to the process going down. Also, the
> check for the "jrockit" file did not kick off when that file was created
> (after the filesystem was at 100%). I need to determine why we weren't warned
> on the disk space issue before our production application came down.
>
>
> 3) One other thing I noticed was that the IP address for this server was
> incorrect in the bb-hosts file. I assume that's an issue, but I'm not sure
> why we got some expected alerts and not others. Also, I updated this entry in
> the bb-hosts file to the correct IP, and cycled the hobbit server, but I am
> still not receiving the alert on the jrockit file, which is still out there.
>
> Any help is appreciated. I'm relatively new to Hobbit, so it's completely
> within the realm of possibility that I don't have any of this set up
> correctly. Please feel free to correct me on anything that looks out of whack.
>
> - Brian
>