RE: [hobbit] Need help determining why alerts didn't come
- To: <hobbit (at) hswn.dk>
- Subject: RE: [hobbit] Need help determining why alerts didn't come
- From: "Hubbard, Greg L" <greg.hubbard (at) eds.com>
- Date: Fri, 7 Nov 2008 12:47:54 -0600
- References: <79FBAE1FB074294FB54941CEF4AA5D8102E75A8C (at) Verona.namerica.idexxi.com> <58EF0861D3A1A04182720B3A5231C7C2038FA952 (at) usplm205.amer.corp.eds.com> <79FBAE1FB074294FB54941CEF4AA5D8102E75BB2 (at) Verona.namerica.idexxi.com>
- Thread-topic: [hobbit] Need help determining why alerts didn't come
Couple of things to try:
a) make sure that your DEFAULT section is at the very bottom of the
hobbit-clients.cfg file.
Also, from the man page for the hobbit-clients.cfg file:
RULES: APPLYING SETTINGS TO SELECTED HOSTS
Rules must be placed after the settings, e.g.
LOAD 8.0 12.0 HOST=db.foo.com TIME=*:0800:1600
If you have multiple settings that you want to apply the same rules to,
you can write the rules *only* on one line, followed by the settings.
E.g.
HOST=%db.*.foo.com TIME=W:0800:1600
LOAD 8.0 12.0
DISK /db 98 100
PROC mysqld 1
will apply the three settings to all of the "db" hosts on week-days
between 8AM and 4PM. This can be combined with per-settings rule, in
which case the per-settings rule overrides the general rule; e.g.
HOST=%.*.foo.com
LOAD 7.0 12.0 HOST=bax.foo.com
LOAD 3.0 8.0
will result in the load-limits being 7.0/12.0 for the "bax.foo.com"
host, and 3.0/8.0 for all other foo.com hosts.
The entire file is evaluated from the top to bottom, and the first match
found is used. So you should put the specific settings first, and the
generic ones last.
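The "first match wins" ordering quoted above can be illustrated with a toy model. This is only a sketch of the behaviour the man page describes, not Hobbit's actual code: the function names and the SETTINGS list are made up for illustration, and it assumes that a leading "%" on a HOST rule marks a regular expression while anything else is a comma-separated list of host names.

```python
import re

def host_matches(rule, hostname):
    """Return True if a HOST rule matches (assumed: leading '%' means regex)."""
    if rule.startswith("%"):
        return re.search(rule[1:], hostname) is not None
    return hostname in [h.strip() for h in rule.split(",")]

# (setting, limits, effective HOST rule), listed in file order.
# Mirrors the man page example: the per-setting rule comes first.
SETTINGS = [
    ("LOAD", (7.0, 12.0), "bax.foo.com"),  # per-setting rule overrides the general one
    ("LOAD", (3.0, 8.0), "%.*.foo.com"),   # falls back to the general HOST rule
]

def first_match(settings, name, hostname):
    """Walk the settings top to bottom and return the first one that applies."""
    for setting, limits, rule in settings:
        if setting == name and host_matches(rule, hostname):
            return limits
    return None

print(first_match(SETTINGS, "LOAD", "bax.foo.com"))  # (7.0, 12.0)
print(first_match(SETTINGS, "LOAD", "web.foo.com"))  # (3.0, 8.0)
# With the generic line first, the specific one is never reached:
print(first_match(list(reversed(SETTINGS)), "LOAD", "bax.foo.com"))  # (3.0, 8.0)
```

The last call shows why the specific settings belong above the generic ones: once the generic line matches, evaluation stops.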
________________________________
From: Bouchard, Brian [mailto:Brian-Bouchard (at) idexx.com]
Sent: Friday, November 07, 2008 12:12 PM
To: hobbit (at) hswn.dk
Subject: RE: [hobbit] Need help determining why alerts didn't come
Ok, I removed the hierarchy as suggested, Greg.
Then I added a line to my applesauce server so the
hobbit-clients.cfg now has the following:
HOST=applesauce
LOG /var/log/messages "%(?-i)SERIOUS_CRITICAL" COLOR=yellow
PROC "weblogic.Name=" 3 3 red TEXT=TOTAL_WEBLOGIC_PROCESSES
PROC "weblogic.Name=prod_alsb_01" 1 1 red TEXT=PROD_ALSB_01
PROC "weblogic.Name=prod_ccs_wli_01" 1 1 red TEXT=PROD_CCS_WLI_01
PROC "weblogic.Name=prod_ccs_aldsp_01" 1 1 red TEXT=PROD_CCS_ALDSP_01
DISK /wls_domains 40 97
Looking at the disk page for this server on hobbit, the page is
still green, and I see the following:
/dev/mapper/vg00-lvol10 9289080 5718512 3098712 65% /wls_domains
When I run the config report for this server I see the following
for disk:
disk
No
-/-/-
Default limits: Yellow 90% full, Red 95% full
/wls_appl
/var
/boot
/wls_logs
/wls_domains
/opt
/usr
/root
/dev
/shm
/home
/tmp
I assume this is saying all of these disks are only going to go
yellow on 90% full, and red on 95% full? If this is the case, we
clearly have something set up incorrectly. If I am misunderstanding the
report, please let me know.
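To make the mismatch concrete: /wls_domains is at 65%, so a "DISK /wls_domains 40 97" setting should already put it in yellow, while the default 90/95 limits from the config report would leave it green. A rough sketch of that comparison follows. The function name is hypothetical, and treating a limit as "at or above" is an assumption rather than something confirmed from the Hobbit source; at 65% the result is the same either way.

```python
def disk_color(used_pct, warn_pct, panic_pct):
    """Classify disk usage against warning and panic limits (assumed: >= comparison)."""
    if used_pct >= panic_pct:
        return "red"
    if used_pct >= warn_pct:
        return "yellow"
    return "green"

used = 65  # usage reported for /wls_domains

print(disk_color(used, 40, 97))  # yellow: what the 40/97 setting should produce
print(disk_color(used, 90, 95))  # green: what the default limits produce
```

A green disk page is therefore consistent with the defaults being applied, which suggests the DISK line under HOST=applesauce is not being matched to this host.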
________________________________
From: Hubbard, Greg L [mailto:greg.hubbard (at) eds.com]
Sent: Friday, November 07, 2008 10:14 AM
To: hobbit (at) hswn.dk
Subject: RE: [hobbit] Need help determining why alerts didn't come
You can always look at the page behind the "info" button for
applesauce to see how the alert rules were interpreted. You can also
run an event configuration report.
Personally, I would not try to be too clever in any of the
Hobbit configuration files unless the documentation provides a specific
example of "cleverness." I would explicitly list what I want for each
host, and not assume that I can set up a hierarchy of parameters using
multiple definitions. Over the past year or so there have been a number
of posts from people who are misled by their own assumptions that
"Hobbit works this way because I want/need it to work this way."
GLH
________________________________
From: Bouchard, Brian [mailto:Brian-Bouchard (at) idexx.com]
Sent: Friday, November 07, 2008 8:52 AM
To: hobbit (at) hswn.dk
Subject: [hobbit] Need help determining why alerts didn't come
Hello Hobbit Gurus,
I am seeking help determining why we recently received
only some alerts that were configured on a given server.
In my hobbit-clients.cfg file I have multiple sections
of relevance:
#######################################################
# generic checks for all WebLogic Servers
#######################################################
HOST= applesauce,gravy,enchilada,chips
DISK * 95 97
PROC dsmcad 1 -1 yellow
FILE "%/wls_domains/.*/jrockit..*.dump" NOEXIST red
#######################################################
# specific checks for applesauce
#######################################################
HOST=applesauce
LOG /var/log/messages "%(?-i)SERIOUS_CRITICAL" COLOR=yellow
PROC "weblogic.Name=" 3 3 red TEXT=TOTAL_WEBLOGIC_PROCESSES
PROC "weblogic.Name=prod_alsb_01" 1 1 red TEXT=PROD_ALSB_01
PROC "weblogic.Name=prod_ccs_wli_01" 1 1 red TEXT=PROD_CCS_WLI_01
PROC "weblogic.Name=prod_ccs_aldsp_01" 1 1 red TEXT=PROD_CCS_ALDSP_01
So, a couple of questions:
1) Is it valid to have different alerts for the
same HOST in the hobbit-clients.cfg like this? It seemed to work in
some instances, but I should ask before moving forward...
2) Yesterday, I received the alerts with TEXT=
"TOTAL_WEBLOGIC_PROCESSES" and "PROD_ALSB_01". When I logged onto the
server, I found the filesystem this process was running on was 100%
used, which caused this process to die. I cleaned up a bunch of log
files and restarted the process, and all was good... BUT... why didn't
I receive the alert that the DISK was more than 97% full? I checked the
history for the disk usage, and it had been over 95% for at least 6
hours prior to the process going down. Also, the check for the
"jrockit" file did not kick off when that file was created (after the
filesystem was at 100%). I need to determine why we weren't warned on
the disk space issue before our production application came down.
3) One other thing I noticed was that the IP
address for this server was incorrect in the bb-hosts file. I assume
that's an issue, but I'm not sure why we got some expected alerts and
not others. Also, I updated this entry in the bb-hosts file to the
correct IP, and cycled the hobbit server, but I am still not receiving
the alert on the jrockit file, which is still out there.
Any help is appreciated. I'm relatively new to Hobbit,
so it's completely within the realm of possibility that I don't have any
of this set up correctly. Please feel free to correct me on anything
that looks out of whack.
- Brian
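One thing that can be ruled in or out quickly is the FILE pattern from the original message. The sketch below tests it against made-up dump file names; the paths are hypothetical, and it assumes that Python's re module behaves like the PCRE matching Hobbit uses for these simple constructs and that the pattern is searched rather than anchored. It checks only the expression itself, not how the client or server handles the rule.

```python
import re

# Pattern as written in the FILE rule, without the leading '%'.
PATTERN = r"/wls_domains/.*/jrockit..*.dump"
# Variant with the literal dots escaped, for comparison.
STRICT = r"/wls_domains/.*/jrockit\..*\.dump"

def matches(pattern, path):
    """Return True if the pattern is found anywhere in the path."""
    return re.search(pattern, path) is not None

# Hypothetical file names, for illustration only.
dump_file = "/wls_domains/prod_alsb/jrockit.12345.dump"
lookalike = "/wls_domains/prod_alsb/jrockit_12345_dump"

print(matches(PATTERN, dump_file))  # True
print(matches(PATTERN, lookalike))  # True: an unescaped '.' matches any character
print(matches(STRICT, lookalike))   # False
```

Under those assumptions the pattern as written is looser than it looks but would still match a typical dump file name, so the expression itself is unlikely to be why the FILE alert stayed quiet.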