
Re: [hobbit] hobbitd_alert crashes



Dominique Frise wrote:
Henrik Stoerner wrote:

On Fri, Jun 02, 2006 at 07:43:13AM +0200, Dominique Frise wrote:

(gdb) bt
#0  0xff1a05c8 in _libc_kill () from /usr/lib/libc.so.1
#1  0xff136d58 in abort () from /usr/lib/libc.so.1
#2  0x0002134c in sigsegv_handler (signum=0) at sig.c:57
#3  <signal handler called>



Hrm, that isn't much to go on. Does it crash right away when you start Hobbit, or only after some time has passed?


It crashed 3 times last night. Hobbit was last restarted yesterday at 05:10 PM.

If it crashes right away, I'd like a copy of your bb-hosts,
hobbitserver.cfg and hobbit-alerts.cfg files. If it crashes
after some time, could you add the "--debug" option to
the hobbitd_alert command in hobbitlaunch.cfg, and then mail
me the ~hobbit/server/logs/page.log file after it has crashed?

Done.
I'll mail you the log asap.

Thank you.

Dominique
UNIL - University of Lausanne



Looking at the event log, I noticed that each of the 3 times hobbitd_alert crashed, it was trying to send to an IGNORE recipient (not always the same one).
Here are our IGNORE rules, which follow the macro definitions at the top of hobbit-alerts.cfg. Maybe there is something wrong with this configuration?


...
...
#---------------------------------------
# Hosts groups
#
$SAP_HOSTS=quartz,topaze,onyx,its,tulp,zircon
$ADMIN_HOSTS=bilbo,falco,furio

#------------------------------------------------------------------------------
# Rules to exclude alerting during a period of time
#------------------------------------------------------------------------------

HOST=* SERVICE=bckp TIME=*:2000:0700
   IGNORE
HOST=uldns1,uldns2 SERVICE=ldap TIME=*:0500:0530
   IGNORE
HOST=kawa,kawa2 SERVICE=http TIME=*:2210:2235
   IGNORE
HOST=gaia SERVICE=http TIME=*:0012:0015
   IGNORE
HOST=balrog,godzilla,smaug SERVICE=cpu TIME=*:2000:2359
   IGNORE
HOST=acsls,balrog,godzilla,smaug SERVICE=memory TIME=*:0600:0800
   IGNORE
HOST=unimedia,unimediad SERVICE=orcl,http TIME=*:0655:1115
   IGNORE
HOST=virtuavd SERVICE=orcl TIME=*:0001:0400
   IGNORE
HOST=tstvirtua SERVICE=orcl TIME=*:2159:0200
   IGNORE
HOST=ged SERVICE=http TIME=*:0305:0315
   IGNORE
HOST=$SAP_HOSTS SERVICE=conn,cpu,http,ftp TIME=*:1900:0700
   IGNORE
HOST=$SAP_HOSTS SERVICE=orcl,procs TIME=*:1900:2359
   IGNORE
HOST=$ADMIN_HOSTS SERVICE=http,sslcert TIME=*:0030:0630
   IGNORE
HOST=$ADMIN_HOSTS SERVICE=conn,cpu,http,sslcert TIME=*:1800:0700
   IGNORE
HOST=esope SERVICE=http,orcl TIME=*:0355:0600
   IGNORE
HOST=pcsan SERVICE=msgs,svcs,procs TIME=*:1955:2300
   IGNORE
HOST=iris SERVICE=hobbitd TIME=*:0310:0320
   IGNORE
HOST=lanfeust,winup TIME=*:1945:2200
   IGNORE
...
...
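
To cross-check the event log against the rules above, a small script along these lines could list which IGNORE rules cover a given host, service and time of day. This is only a rough sketch and not how hobbitd_alert itself parses the file: the function names are made up for illustration, it handles only the single "*:HHMM:HHMM" form of TIME used above, and it ASSUMES that a window whose end is earlier than its start (such as 2000:0700) wraps past midnight. Whether Hobbit interprets such windows that way has not been confirmed here.

```python
def parse_ignore_rules(text):
    """Collect rules whose recipient line is IGNORE (simplified parser)."""
    macros, rules, pending = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line == "...":
            continue
        if line.startswith("$") and "=" in line:
            # Macro definition such as $SAP_HOSTS=quartz,topaze
            name, value = line.split("=", 1)
            macros[name] = value
        elif line == "IGNORE":
            if pending is not None:
                rules.append(pending)
            pending = None
        elif line.startswith(("HOST=", "SERVICE=", "TIME=")):
            fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
            hosts = fields.get("HOST", "*")
            hosts = macros.get(hosts, hosts)  # expand a whole-value macro
            # Simplification: one window per rule, day field not evaluated.
            _days, start, end = fields.get("TIME", "*:0000:2359").split(":")
            pending = {
                "hosts": hosts.split(","),
                "services": fields.get("SERVICE", "*").split(","),
                "start": int(start),
                "end": int(end),
                "text": line,
            }
        else:
            # Any other recipient line: the pending rule is not an IGNORE rule.
            pending = None
    return rules


def spans_midnight(rule):
    """True when the end time is earlier than the start time."""
    return rule["end"] < rule["start"]


def in_window(rule, hhmm):
    t = int(hhmm)
    if not spans_midnight(rule):
        return rule["start"] <= t <= rule["end"]
    # ASSUMPTION: the window wraps past midnight (not verified against Hobbit).
    return t >= rule["start"] or t <= rule["end"]


def matching_rules(rules, host, service, hhmm):
    """Return the IGNORE rules that cover host/service at time hhmm."""
    return [
        r for r in rules
        if ("*" in r["hosts"] or host in r["hosts"])
        and ("*" in r["services"] or service in r["services"])
        and in_window(r, hhmm)
    ]
```

Feeding it the contents of hobbit-alerts.cfg and then calling matching_rules() with the host, service and HHMM time taken from the event log entry just before each crash would show which rule was involved, and spans_midnight() shows whether that rule is one of those crossing midnight.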


Dominique
UNIL - University of Lausanne