[hobbit] hobbitlaunch segfault / timewarp happend again

Sebastian Auriol spa at syntec.co.uk
Thu Nov 13 20:30:12 CET 2008


Hi Henrik,

I suffered from the problem referred to at the start of this thread
(originally reported at http://www.hswn.dk/hobbiton/2008/01/msg00570.html),
except that it affected the server hobbitlaunch rather than the
hobbit-client hobbitlaunch. It happened when the UK changed from BST to GMT
on Oct 26 (core dump at 01:22). The hobbit server is running k9linux, which
receives NTP broadcasts; the clock went back by one hour, then hobbit
crashed and was not restarted by hobbitlaunch. Someone actually drove over
a hundred miles to reset the server on the Sunday morning (although it
could have been done remotely).

There was another core dump at the same time (01:22), this one from hobbitd:

# gdb /home/hobbit/server/bin/hobbitd /home/hobbit/server/core.17286
GNU gdb Red Hat Linux (6.3.0.0-1.96rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/tls/libthread_db.so.1".

Core was generated by `hobbitd --pidfile=/var/log/hobbit/hobbitd.pid
--restart=/usr/local/hobbit/serve'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libpcre.so.0...done.
Loaded symbols for /lib/libpcre.so.0
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libssl.so.4...done.
Loaded symbols for /lib/libssl.so.4
Reading symbols from /lib/libcrypto.so.4...done.
Loaded symbols for /lib/libcrypto.so.4
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /usr/lib/libgssapi_krb5.so.2...done.
Loaded symbols for /usr/lib/libgssapi_krb5.so.2
Reading symbols from /usr/lib/libkrb5.so.3...done.
Loaded symbols for /usr/lib/libkrb5.so.3
Reading symbols from /lib/libcom_err.so.2...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /usr/lib/libk5crypto.so.3...done.
Loaded symbols for /usr/lib/libk5crypto.so.3
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  errprintf (fmt=0x8064f8c "Time warp detected: Adjusting returned clock
by %d seconds\n") at errormsg.c:42
42              time_t now = getcurrenttime(NULL);
(gdb)
(gdb)
(gdb)
(gdb) bt
#0  errprintf (fmt=0x8064f8c "Time warp detected: Adjusting returned clock
by %d seconds\n") at errormsg.c:42
#1  0x0805ddf7 in getcurrenttime (retparm=0x0) at timefunc.c:73
#2  0x08058097 in errprintf (fmt=0x8064f8c "Time warp detected: Adjusting
returned clock by %d seconds\n") at errormsg.c:42
#3  0x0805ddf7 in getcurrenttime (retparm=0x0) at timefunc.c:73
#4  0x08058097 in errprintf (fmt=0x8064f8c "Time warp detected: Adjusting
returned clock by %d seconds\n") at errormsg.c:42
#5  0x0805ddf7 in getcurrenttime (retparm=0x0) at timefunc.c:73
#6  0x08058097 in errprintf (fmt=0x8064f8c "Time warp detected: Adjusting
returned clock by %d seconds\n") at errormsg.c:42
#7  0x0805ddf7 in getcurrenttime (retparm=0x0) at timefunc.c:73
#8  0x08058097 in errprintf (fmt=0x8064f8c "Time warp detected: Adjusting
returned clock by %d seconds\n") at errormsg.c:42
#9  0x0805ddf7 in getcurrenttime (retparm=0x0) at timefunc.c:73

Etc.  It continues repeating those two lines for a very long time in the
backtrace.

FYI, here is the backtrace for the hobbitlaunch core dump, which is similar
to the one in the referenced thread:

# gdb /home/hobbit/server/bin/hobbitlaunch /home/hobbit/server/core.12456
GNU gdb Red Hat Linux (6.3.0.0-1.96rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/tls/libthread_db.so.1".

Core was generated by `/home/hobbit/server/bin/hobbitlaunch
--config=/home/hobbit/server/etc/hobbitlau'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  errprintf (fmt=0x80558cc "Time warp detected: Adjusting returned clock
by %d seconds\n") at errormsg.c:42
42              time_t now = getcurrenttime(NULL);
(gdb)
(gdb)
(gdb) bt
#0  errprintf (fmt=0x80558cc "Time warp detected: Adjusting returned clock
by %d seconds\n") at errormsg.c:42
#1  0x0804e8d3 in getcurrenttime (retparm=0x0) at timefunc.c:73
#2  0x0804b833 in errprintf (fmt=0x80558cc "Time warp detected: Adjusting
returned clock by %d seconds\n") at errormsg.c:42
#3  0x0804e8d3 in getcurrenttime (retparm=0x0) at timefunc.c:73
#4  0x0804b833 in errprintf (fmt=0x80558cc "Time warp detected: Adjusting
returned clock by %d seconds\n") at errormsg.c:42
#5  0x0804e8d3 in getcurrenttime (retparm=0x0) at timefunc.c:73
#6  0x0804b833 in errprintf (fmt=0x80558cc "Time warp detected: Adjusting
returned clock by %d seconds\n") at errormsg.c:42
#7  0x0804e8d3 in getcurrenttime (retparm=0x0) at timefunc.c:73
#8  0x0804b833 in errprintf (fmt=0x80558cc "Time warp detected: Adjusting
returned clock by %d seconds\n") at errormsg.c:42
#9  0x0804e8d3 in getcurrenttime (retparm=0x0) at timefunc.c:73

Etc.  It continues repeating those two lines for a very long time in the
backtrace.
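
To make the two repeating frames easier to read, here is a minimal model of
the loop. It is a sketch under stated assumptions, not the Hobbit source:
model_getcurrenttime(), model_errprintf(), fake_clock and DEPTH_CAP are
hypothetical names, and the "log before updating state" ordering is assumed
from the comment in lib/timefunc.c quoted further down. The depth cap exists
only so that the model terminates instead of overflowing the stack.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical model, not Hobbit source: every name below is made up. */
#define DEPTH_CAP 50                /* stop the model before the stack overflows */

static time_t fake_clock;           /* stands in for time(NULL) */
static time_t lastresult;           /* last value handed back to a caller */
static time_t timewarp;             /* correction added to the raw clock */
static int depth, max_depth;        /* current and deepest nesting */
static int update_state_first;      /* 0 = log first, 1 = update state first */

static time_t model_getcurrenttime(void);

/* Like the errprintf() frames in the backtrace: asks for a timestamp. */
static void model_errprintf(const char *msg)
{
    (void)msg;
    (void)model_getcurrenttime();
}

static time_t model_getcurrenttime(void)
{
    time_t now = fake_clock + timewarp;

    if (++depth > max_depth) max_depth = depth;
    if (depth >= DEPTH_CAP) {        /* the real process would keep going and crash */
        depth--;
        return lastresult;
    }

    if (now < lastresult) {          /* clock went backwards */
        time_t adjust = lastresult - now;
        if (update_state_first) {
            timewarp += adjust;      /* nested call now sees a sane clock */
            now = lastresult;
            model_errprintf("Time warp detected");
        } else {
            model_errprintf("Time warp detected");  /* nested call sees the warp again */
            timewarp += adjust;
            now = lastresult;
        }
    }

    lastresult = now;
    depth--;
    return now;
}

/* Step the clock back two minutes and report how deep the calls nested. */
static int simulate_step_back(int fix_ordering)
{
    fake_clock = 1000; lastresult = 0; timewarp = 0;
    depth = 0; max_depth = 0;
    update_state_first = fix_ordering;

    model_getcurrenttime();          /* normal call, records lastresult */
    fake_clock -= 120;               /* e.g. ntpdate against a server 2 min behind */
    model_getcurrenttime();

    assert(depth == 0);              /* every call has unwound */
    return max_depth;
}
```

In this model simulate_step_back(0) nests until it hits DEPTH_CAP, which is
the pattern in the backtraces above, while simulate_step_back(1) nests only
two deep (the outer call plus the one timestamp lookup made while logging).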

Could you please check the patch that Darin Dugan made and sent to the list
on 10th April 2008 (http://www.hswn.dk/hobbiton/2008/04/msg00136.html),
and, after any changes (or fixes to your original patch), commit it to the
svn trunk and 4.2 branches and incorporate it into 4.2.1?  I would like to
have a Henrik-certified patch (TM) for this!  ;) Your original patch is at
http://www.hswn.dk/hobbiton/2008/01/msg00581.html for reference.  Also, my
core dumps are from a May 22nd snapshot of 4.3, so I suppose you never merged
your first patch; my diff suggests not.
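
As a rough illustration of the approach in Darin's patch (quoted below), the
sketch here takes its timestamp straight from time(), so logging never
re-enters getcurrenttime(). sketch_errprintf() and its buffer arguments are
hypothetical; the real errprintf() in errormsg.c has a different signature
and its own log destinations, and the actual patch should be taken from the
list archive rather than from this.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for errprintf(); not the code from the patch.
 * Formats "<timestamp> <message>" into out and returns snprintf()'s result. */
static int sketch_errprintf(char *out, size_t outsz, const char *fmt, ...)
{
    char stamp[32];
    char msg[512];
    time_t now;
    struct tm *tm;
    va_list args;

    assert(out != NULL && fmt != NULL);

    now = time(NULL);                /* raw system clock, no getcurrenttime() */
    tm = localtime(&now);
    if (tm == NULL ||
        strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S", tm) == 0) {
        strcpy(stamp, "unknown-time");   /* fall back rather than fail */
    }

    va_start(args, fmt);
    vsnprintf(msg, sizeof(msg), fmt, args);  /* truncates long messages */
    va_end(args);

    return snprintf(out, outsz, "%s %s", stamp, msg);
}
```

Because nothing in this path calls back into the time-warp check, a clock
step cannot make the logger recurse. The trade-off, as Darin notes, is that
log lines carry the system's actual time rather than the adjusted time.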

Many thanks,

SebA


> -----Original Message-----
> From: Alexander Keller [mailto:hobbit at alexkeller.de] 
> Sent: 13 April 2008 19:52
> To: Dugan, Darin D [EIT]
> Subject: Re: [hobbit] hobbitlaunch segfault / timewarp happend again
> 
> Hi,
> 
> looks great. I applied your patch on a test system. So far it works
> perfectly for me.
> 
> It would be great if Henrik could apply your patch.
> 
> Thanks!
>  Alexander
> 
> > I recently brought up a new client that has trouble keeping accurate
> > time...so I began encountering this time warp issue. As pointed out by
> > Henrik in January, it is definitely an infinite loop where errprintf()
> > calls getcurrenttime() to get its timestamp.
> 
> > The attached patch modifies the functions in errormsg.c to use time()
> > instead of getcurrenttime(). That avoids any recursion-infinite loop
> > problems, and logs or prints errors with the system's actual time
> > instead of a Hobbit-adjusted-for-sanity time. In the absence of accurate
> > time, I think it would be best to log in the system's time so that you
> > can correlate Hobbit logs with other system logs.
> 
> > Working for me so far, but use at your own risk. Comments?
> 
> > Cheers.
> > Darin Dugan
> > dddugan at iastate.edu
> 
> > -----Original Message-----
> > From: Alexander Keller [mailto:hobbit at alexkeller.de] 
> > Sent: Friday, March 21, 2008 10:22 AM
> > To: hobbit at hswn.dk
> > Subject: Re: [hobbit] hobbitlaunch segfault / timewarp happend again
> 
> > Hi,
> 
> > unfortunately nobody answered my posting, so I did a quick'n dirty
> > hack to prevent timewarp segfaults in hobbitlaunch.
> 
> > Just comment out the errprintf-statement in lib/timefunc.c:
> 
> >   if (timewarphappened) {
> >   /*
> >    * Tell the world about it.
> >    * Must do this AFTER changing timewarp and lastresult,
> >    * or we will start an endless loop triggering a stack
> >    * overflow because errprintf() calls getcurrenttime().
> >    */
> >            /*
> >            * **** prevent segfault: do not log time warp. ****
> >            * errprintf("Time warp detected: Adjusting returned clock by %d seconds\n", timewarp);
> >            */
> >    }
> 
> > This is not a real solution, but it works for me. Maybe there is
> > somebody out there who can fix this issue properly.
> 
> > Regards
> >  Alexander
> 
> 
> >> Hi Henrik,
> 
> >> in January I reported a segfault with hobbitlaunch/timefunc.c. You
> >> quickly provided a patch...
> 
> >> Now I'm having a new error - see core dump:
> 
> >> /opt/hobbit/client# gdb bin/hobbitlaunch core
> >> GNU gdb 6.4-debian
> >> Copyright 2005 Free Software Foundation, Inc.
> >> GDB is free software, covered by the GNU General Public License, and you are
> >> welcome to change it and/or distribute copies of it under certain conditions.
> >> Type "show copying" to see the conditions.
> >> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> >> This GDB was configured as "i486-linux-gnu"...Using host libthread_db
> >> library "/lib/tls/i686/cmov/libthread_db.so.1".
> 
> >> Core was generated by `./bin/hobbitlaunch --config=./etc/clientlaunch.cfg
> >> --log=./logs/clientlaunch.lo'.
> >> Program terminated with signal 11, Segmentation fault.
> 
> >> warning: Can't read pathname for load map: Input/output error.
> >> Reading symbols from /usr/lib/libz.so.1...done.
> >> Loaded symbols for /usr/lib/libz.so.1
> >> Reading symbols from /lib/tls/i686/cmov/libc.so.6...done.
> >> Loaded symbols for /lib/tls/i686/cmov/libc.so.6
> >> Reading symbols from /lib/ld-linux.so.2...done.
> >> Loaded symbols for /lib/ld-linux.so.2
> >> #0  errprintf (fmt=0x6b <Address 0x6b out of bounds>) at errormsg.c:42
> >> 42              time_t now = getcurrenttime(NULL);
> >> (gdb) bt
> >> #0  errprintf (fmt=0x6b <Address 0x6b out of bounds>) at errormsg.c:42
> >> #1  0x0804f125 in getcurrenttime (retparm=0x0) at timefunc.c:73
> >> #2  0x0804b9e0 in errprintf (fmt=0x6b <Address 0x6b out of bounds>) at errormsg.c:42
> >> #3  0x0804f125 in getcurrenttime (retparm=0x0) at timefunc.c:73
> >> #4  0x0804b9e0 in errprintf (fmt=0x6b <Address 0x6b out of bounds>) at errormsg.c:42
> >> #5  0x0804f125 in getcurrenttime (retparm=0x0) at timefunc.c:73
> >> #6  0x0804b9e0 in errprintf (fmt=0x6b <Address 0x6b out of bounds>) at errormsg.c:42
> >> #7  0x0804f125 in getcurrenttime (retparm=0x0) at timefunc.c:73
> >> #8  0x0804b9e0 in errprintf (fmt=0x6b <Address 0x6b out of bounds>) at errormsg.c:42
> >> #9  0x0804f125 in getcurrenttime (retparm=0x0) at timefunc.c:73
> >> #10 0x0804b9e0 in errprintf (fmt=0x6b <Address 0x6b out of bounds>) at errormsg.c:42
> >> [...]
> 
> >> I can reproduce the error with "ntpdate" using a misconfigured ntp
> >> server (2 min in the past):
> 
> >> 1. start hobbit client "runclient.sh start"
> >> 2. sync time with "ntpdate <misconfigured-time-server>"
> >> 3. get a core dump  
> 
> 
> >> Regards,
> >>  Alexander
> 
> 
> 



