Re: [hobbit] hobbitd_channel still crashing everyday
- To: hobbit (at) hswn.dk
- Subject: Re: [hobbit] hobbitd_channel still crashing everyday
- From: Henrik Stoerner <henrik (at) hswn.dk>
- Date: Thu, 25 Oct 2007 22:42:28 +0200
- References: <1d23acab0710231118u74cf98eer1df154dd303ff690 (at) mail.gmail.com> <20071023200234.GD16672 (at) hswn.dk> <BAY138-W11B4D91D708229B6FDDFF59F940 (at) phx.gbl> <961092e10710240715n33d04591geeb37fa388645090 (at) mail.gmail.com> <001d01c8164f$49af93d0$04011818 (at) rr.com> <20071025081623.GA26092 (at) hswn.dk> <001801c8171d$a0de1ae0$04011818 (at) rr.com>
- User-agent: Mutt/1.5.15+20070412 (2007-04-11)
On Thu, Oct 25, 2007 at 11:42:19AM -0400, Sean R. Clark wrote:
> Ahh you are correct, my binary + source did not match
>
> Here is the stack trace from the (correct) binary (it's still crashing)
> ---- called from signal handler with signal 11 (SIGSEGV) ------
> [9] main(argc = 4, argv = 0x8046b28), line 678 in "hobbitd_channel.c"

Thanks. The line number isn't quite right, but I think this patch should
fix it. However, the crash should only happen if the worker process
(hobbitd_alert, hobbitd_rrd, hobbitd_history) cannot keep up with the
flow of incoming messages, so there might be a different problem with
your setup that triggers it. That would also explain why you see it
regularly and others do not.

Anyway, let me know if this patch stops it from crashing.

Regards,
Henrik

--- hobbitd/hobbitd_channel.c 2007/09/11 21:20:54 1.60
+++ hobbitd/hobbitd_channel.c 2007/10/25 20:37:46
@@ -648,6 +648,8 @@
 		for (handle = rbtBegin(peers); (handle != rbtEnd(peers)); handle = rbtNext(peers, handle)) {
 			int canwrite = 1, hasfailed = 0;
 			hobbit_peer_t *pwalk;
+			time_t msgtimeout = now - MSGTIMEOUT;
+			int flushcount = 0;
 			pwalk = (hobbit_peer_t *) gettreeitem(peers, handle);
 			if (pwalk->msghead == NULL) continue; /* Ignore peers with nothing queued */
@@ -668,18 +670,14 @@
 			}
 			/* See if we have stale messages queued */
-			if ((pwalk->msghead->tstamp + MSGTIMEOUT) < now) {
-				/* Stale message at head of queue, flush all that are stale */
-				time_t msgtimeout = now - MSGTIMEOUT;
-				int count = 0;
-
-				while (pwalk->msghead->tstamp < msgtimeout) {
-					flushmessage(pwalk);
-					count++;
-				}
+			while (pwalk->msghead && (pwalk->msghead->tstamp < msgtimeout)) {
+				flushmessage(pwalk);
+				flushcount++;
+			}
+			if (flushcount) {
 				errprintf("Flushed %d stale messages for %s:%d\n",
-					count,
+					flushcount,
 					inet_ntoa(pwalk->peeraddr.sin_addr),
 					ntohs(pwalk->peeraddr.sin_port));
 			}
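
To show what the patch changes, here is a minimal standalone sketch of the flush loop. The old loop tested pwalk->msghead->tstamp on every pass without checking the pointer, so if every queued message was stale the queue presumably ran empty and the next test read through NULL. This is not the actual hobbitd_channel.c code: msg_t, peer_t, queuemessage(), flushstale() and the MSGTIMEOUT value of 30 seconds are assumptions made up for the illustration, and flushmessage() here is only a stand-in for the real function.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define MSGTIMEOUT 30   /* ASSUMED value in seconds; the real constant lives in hobbitd_channel.c */

/* Hypothetical, stripped-down stand-ins for the real message and peer structures */
typedef struct msg_t {
    time_t tstamp;          /* when the message was queued */
    struct msg_t *next;
} msg_t;

typedef struct peer_t {
    msg_t *msghead;         /* oldest queued message, NULL when the queue is empty */
    msg_t *msgtail;         /* newest queued message */
} peer_t;

/* Append a message with the given timestamp; returns 0 on success, -1 if malloc fails */
static int queuemessage(peer_t *p, time_t tstamp)
{
    msg_t *m = malloc(sizeof(*m));
    if (m == NULL) return -1;
    m->tstamp = tstamp;
    m->next = NULL;
    if (p->msgtail) p->msgtail->next = m; else p->msghead = m;
    p->msgtail = m;
    return 0;
}

/* Drop the message at the head of the queue (stand-in for the real flushmessage()) */
static void flushmessage(peer_t *p)
{
    msg_t *m = p->msghead;
    assert(m != NULL);      /* callers must not flush an empty queue */
    p->msghead = m->next;
    if (p->msghead == NULL) p->msgtail = NULL;
    free(m);
}

/* Flush every stale message and return how many were dropped.
 * The NULL test comes first, so the loop stops cleanly when the
 * last queued message has been flushed. */
static int flushstale(peer_t *p, time_t now)
{
    time_t msgtimeout = now - MSGTIMEOUT;
    int flushcount = 0;

    while (p->msghead && (p->msghead->tstamp < msgtimeout)) {
        flushmessage(p);
        flushcount++;
    }
    return flushcount;
}
```

With a queue holding only stale messages, flushstale() drains it and returns the count. Without the p->msghead test, the loop condition would dereference a NULL head after the last flush, which matches the SIGSEGV in the stack trace.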