[hobbit] hobbitd_alert crashes
Dominique Frise
Dominique.Frise at unil.ch
Fri Jun 2 08:19:10 CEST 2006
Dominique Frise wrote:
> Henrik Stoerner wrote:
>
>> On Fri, Jun 02, 2006 at 07:43:13AM +0200, Dominique Frise wrote:
>>
>>> (gdb) bt
>>> #0 0xff1a05c8 in _libc_kill () from /usr/lib/libc.so.1
>>> #1 0xff136d58 in abort () from /usr/lib/libc.so.1
>>> #2 0x0002134c in sigsegv_handler (signum=0) at sig.c:57
>>> #3 <signal handler called>
>>
>> Hrm, that isn't much to go on. Does it crash right away when you start
>> Hobbit, or only after some time has passed?
>>
> It crashed 3 times last night. Hobbit was last restarted yesterday at
> 05:10 PM
>
>> If it crashes right away, I'd like a copy of your bb-hosts,
>> hobbitserver.cfg and hobbit-alerts.cfg files. If it crashes
>> after some time, could you add the "--debug" option to
>> the hobbitd_alert command in hobbitlaunch.cfg, and then mail
>> me the ~hobbit/server/logs/page.log file after it has crashed?
>>
> Done.
> I'll mail you the log asap.
>
> Thank you.
>
> Dominique
> UNIL - University of Lausanne
>
Looking at the event log, I noticed that each of the 3 times hobbitd_alert
crashed, it was trying to send to an IGNORE recipient (not always the same one).
Here are our IGNORE rules, which follow the macro definitions at the top of
hobbit-alerts.cfg. Maybe there is something wrong with this configuration?
...
...
#---------------------------------------
# Hosts groups
#
$SAP_HOSTS=quartz,topaze,onyx,its,tulp,zircon
$ADMIN_HOSTS=bilbo,falco,furio
#------------------------------------------------------------------------------
# Rules to exclude alerting during a period of time
#------------------------------------------------------------------------------
HOST=* SERVICE=bckp TIME=*:2000:0700
IGNORE
HOST=uldns1,uldns2 SERVICE=ldap TIME=*:0500:0530
IGNORE
HOST=kawa,kawa2 SERVICE=http TIME=*:2210:2235
IGNORE
HOST=gaia SERVICE=http TIME=*:0012:0015
IGNORE
HOST=balrog,godzilla,smaug SERVICE=cpu TIME=*:2000:2359
IGNORE
HOST=acsls,balrog,godzilla,smaug SERVICE=memory TIME=*:0600:0800
IGNORE
HOST=unimedia,unimediad SERVICE=orcl,http TIME=*:0655:1115
IGNORE
HOST=virtuavd SERVICE=orcl TIME=*:0001:0400
IGNORE
HOST=tstvirtua SERVICE=orcl TIME=*:2159:0200
IGNORE
HOST=ged SERVICE=http TIME=*:0305:0315
IGNORE
HOST=$SAP_HOSTS SERVICE=conn,cpu,http,ftp TIME=*:1900:0700
IGNORE
HOST=$SAP_HOSTS SERVICE=orcl,procs TIME=*:1900:2359
IGNORE
HOST=$ADMIN_HOSTS SERVICE=http,sslcert TIME=*:0030:0630
IGNORE
HOST=$ADMIN_HOSTS SERVICE=conn,cpu,http,sslcert TIME=*:1800:0700
IGNORE
HOST=esope SERVICE=http,orcl TIME=*:0355:0600
IGNORE
HOST=pcsan SERVICE=msgs,svcs,procs TIME=*:1955:2300
IGNORE
HOST=iris SERVICE=hobbitd TIME=*:0310:0320
IGNORE
HOST=lanfeust,winup TIME=*:1945:2200
IGNORE
...
...
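
To make the rules above easier to review, here is a minimal Python sketch
that pulls the TIME value out of each rule line and reports windows whose
start is later than their end (for example *:2000:0700). The helper names
are hypothetical and this is not Hobbit code. Reading such a window as
wrapping past midnight is only an assumption for illustration; whether
hobbitd_alert handles these windows differently is not established in this
thread.

```python
import re

# Hypothetical helpers for reviewing the excerpt; not part of Hobbit.

def parse_time_spec(spec):
    """Split a 'days:HHMM:HHMM' value into (days, start, end)."""
    days, start, end = spec.split(":")
    return days, int(start), int(end)

def crosses_midnight(spec):
    """True when the start time is later than the end time."""
    _, start, end = parse_time_spec(spec)
    return start > end

def in_window(spec, hhmm):
    """Check whether a time of day (as an HHMM integer) falls in the window.

    Assumption: the days field is ignored, and a window whose start is
    later than its end is read as wrapping past midnight.
    """
    _, start, end = parse_time_spec(spec)
    if start <= end:
        return start <= hhmm <= end
    return hhmm >= start or hhmm <= end

def time_specs(lines):
    """Yield (rule line, TIME value) for every line carrying TIME=..."""
    for line in lines:
        match = re.search(r"TIME=(\S+)", line)
        if match:
            yield line, match.group(1)

# A few rule lines copied from the excerpt above.
rules = [
    "HOST=* SERVICE=bckp TIME=*:2000:0700",
    "HOST=uldns1,uldns2 SERVICE=ldap TIME=*:0500:0530",
    "HOST=tstvirtua SERVICE=orcl TIME=*:2159:0200",
    "HOST=$SAP_HOSTS SERVICE=orcl,procs TIME=*:1900:2359",
]

for line, spec in time_specs(rules):
    if crosses_midnight(spec):
        print("start later than end:", line)
```

In the excerpt, four rules have a start later than their end: the bckp rule
(2000:0700), the tstvirtua rule (2159:0200), the first $SAP_HOSTS rule
(1900:0700) and the second $ADMIN_HOSTS rule (1800:0700).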
Dominique
UNIL - University of Lausanne