
Re: [hobbit] Hobbit 4.0.4 released - Alert Script Issue



Hi Henrik,

The alert trace is still on today and, yes, yesterday the script was
executed correctly with a tiny test config. The original config still
gives me problems. I checked the hobbit-alerts.cfg file for control
characters (vi -> set list) and found nothing weird.
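
For anyone who wants to repeat the control-character check without vi,
here is a minimal sketch in Python. It is not part of Hobbit; the
function name find_control_chars is my own, and it only assumes that
the config is plain text in which tabs and newlines are the only
control characters that belong there.

```python
def find_control_chars(text):
    """Return (line_number, column, char_repr) for each unexpected control character."""
    hits = []
    # Split on "\n" only, so a stray "\r" (DOS line ending) stays in the
    # line and is reported instead of being silently consumed.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            # Tab is allowed; every other ASCII control character is flagged.
            if (ord(ch) < 32 and ch != "\t") or ord(ch) == 127:
                hits.append((line_no, col, repr(ch)))
    return hits
```

To use it, read the file with open(path, newline="") so that line
endings are not translated, and pass the contents to the function. An
empty list means nothing was found.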

Part of the hobbit-alerts.cfg

-some macros:

### Enabled now and then for testing purposes.
###$UNIXTEST=MAIL me (at) somedomain.nl DURATION>6m TIME=W:0800:1730
REPEAT=1d RECOVERED COLOR=yellow,red,purple

$UNIXDAG=MAIL somewhere (at) somedomain.nl DURATION>6m TIME=W:0800:1730
REPEAT=1d RECOVERED

$UNIXNACHT=MAIL somewhere (at) somedomain.nl TIME=*:0000:2359 DURATION>30m
REPEAT=1d SERVICE=!cpu,!msgs RECOVERED COLOR=!yellow

$UNIXSEMAFOON_BEHEER=SCRIPT /usr/local/bb/consigne.ksh 00765327285
FORMAT=SMS TIME=*:0000:2359 DURATION>30m REPEAT=60m
SERVICE=!cpu,!msgs,!smtp,!bbgen,!bbtest,!hobbitd COLOR=!yellow

-A host for which the $UNIXSEMAFOON_BEHEER alert is not triggered,
although the $UNIXDAG mail has been sent:

HOST=%(orwell)
        $UNIXDAG
        $UNIXTEST
        $UNIXNACHT
        $UNIXSEMAFOON_BEHEER

The host does send me an email when a threshold is exceeded (disk>95%),
and that can be seen in the trace (I only grepped the host-specific
entries):

00013241 2005-08-18 10:04:45 *** Match with 'HOST=%(orwell)' ***
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 191
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 193
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 194
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 196
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 203
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 209
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 216
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 223
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 229
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 236
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 242
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 254
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 261
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 268
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 275
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 282
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 287
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 294
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 300
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 304
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 311
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 322
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 332
00013241 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 340
00013241 2005-08-18 10:04:45 Failed 'HOST=%(orwell)' (hostname not in
include list)
00015024 2005-08-18 10:04:45 send_alert orwell:disk state Paging
00015024 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 184
00015024 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 190
00015024 2005-08-18 10:04:45 *** Match with 'HOST=%(orwell)' ***
00015024 2005-08-18 10:04:45 Matching host:service:page
'orwell:disk:DNO/SBEHEER' against rule line 191
00015024 2005-08-18 10:04:45 Mail alert with command 'mail -s "Hobbit
[25437] orwell:disk CRITICAL (RED)" central (at) somedomain.nl'

But the next (expected) step does not appear in the trace, and it does
not happen.
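
To make a trace like the one above easier to read, the entries can be
grouped per process ID. The sketch below is only an illustration in
Python. The entry layout (PID, date, time, message) is taken from the
excerpt itself; the names parse_trace and summarize are hypothetical,
and the re-joining of wrapped lines is there because the mail client
folded the entries.

```python
import re
from collections import defaultdict

# Assumed entry layout, taken from the excerpt: PID, date, time, message.
ENTRY = re.compile(r"^(\d+) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
RULE = re.compile(r"against rule line (\d+)")


def parse_trace(text):
    """Return [pid, timestamp, message] entries, re-joining wrapped lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = ENTRY.match(line)
        if m:
            entries.append([m.group(1), m.group(2), m.group(3)])
        elif entries:
            # A line without a PID prefix continues the previous entry.
            entries[-1][2] += " " + line
    return entries


def summarize(entries):
    """Per PID: the rule lines that were tested and the last message logged."""
    summary = defaultdict(lambda: {"rules": [], "last": None})
    for pid, _timestamp, message in entries:
        m = RULE.search(message)
        if m:
            summary[pid]["rules"].append(int(m.group(1)))
        summary[pid]["last"] = message
    return dict(summary)
```

The "last" message per PID shows where each process stopped, which is
the point where the expected next step is missing.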

All this could be just a configuration issue, so I restored another
tiny config and restarted Hobbit, and that worked fine. So no problems
with the mail or script etc  :-]

So, now I did the following:
-I restored the hobbit-alerts.cfg we must use.
-I uncommented my $UNIXTEST macro to prevent empty lines in the
HOST sections of hobbit-alerts.cfg, knowing that Hobbit can have
problems with 2 or more spaces (perhaps newlines too?)
-I moved the $UNIXTEST macro to the end of each HOST section, for the
times I comment out the previous line ;-)
-I restarted Hobbit.
-Now the first alert is sent as it should be, but the one alert that
should page after 30 minutes fails, and nothing about it shows up in
the logfile.
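
The spacing suspicion in the steps above can be checked mechanically.
This is a rough sketch in Python, not a statement about what Hobbit's
parser actually accepts: the function name find_spacing_issues and the
three things it flags (whitespace-only lines, trailing whitespace, runs
of two or more spaces between words) are my own assumptions about what
might be worth looking at.

```python
import re


def find_spacing_issues(text):
    """Return (line_number, description) for suspicious whitespace in a config."""
    issues = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")
        body = line.lstrip(" \t")  # leading indentation is not checked
        if body.startswith("#"):
            continue  # comments are ignored
        if line and not body:
            issues.append((line_no, "whitespace-only line"))
            continue
        if body != body.rstrip(" \t"):
            issues.append((line_no, "trailing whitespace"))
        if re.search(r"\S {2,}\S", body):
            issues.append((line_no, "two or more consecutive spaces"))
    return issues
```

A clean result does not prove that the config is fine; it only rules
out these three patterns.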

Regards,

Peter