[hobbit] msgcache is unstable
Rolf Masfelder
rolf.masfelder at nector.de
Tue Aug 22 18:10:08 CEST 2006
On Tuesday, 22 August 2006 16:21, Henrik Stoerner wrote:
> On Tue, Aug 22, 2006 at 04:03:20PM +0200, Rolf Masfelder wrote:
> > I'm running msgcache on a remote machine. For some hours it works
> > fine, but at some point it stops working.
> >
> > Here is a part of msgcache.log (msgcache is running with --debug)
> >
> >
> > 2006-08-22 15:12:13 New connection
> > 2006-08-22 15:13:36 New connection
> > 2006-08-22 15:13:37 -> oksender
> > 2006-08-22 15:13:37 <- oksender(1-a)
> > 2006-08-22 15:13:37 Got pullclient request: pullclient 1
> > 2006-08-22 15:13:43 New connection
> > 2006-08-22 15:13:53 New connection
> > 2006-08-22 15:16:13 New connection
> > 2006-08-22 15:18:44 New connection
Here are some lines from hobbitclient.log:
2006-08-22 00:56:18 Whoops ! bb failed to send message - timeout
2006-08-22 01:01:19 Whoops ! bb failed to send message - timeout
(here I restarted Hobbit on the client)
2006-08-22 15:13:58 Whoops ! bb failed to send message - timeout
2006-08-22 15:18:59 Whoops ! bb failed to send message - timeout
2006-08-22 15:23:59 Whoops ! bb failed to send message - timeout
2006-08-22 15:29:00 Whoops ! bb failed to send message - timeout
2006-08-22 15:34:00 Whoops ! bb failed to send message - timeout
2006-08-22 15:39:01 Whoops ! bb failed to send message - timeout
2006-08-22 15:44:02 Whoops ! bb failed to send message - timeout
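To check how regularly these timeouts recur, the gaps between the entries can be computed with a small script. This is only a sketch: the function name `timeout_gaps` is made up for illustration, and it assumes every line starts with a 19-character timestamp as in the excerpt above.

```python
from datetime import datetime

def timeout_gaps(lines):
    """Return the gaps in seconds between consecutive 'failed to send' entries."""
    stamps = []
    for line in lines:
        line = line.strip()
        if "bb failed to send message" not in line:
            continue
        # Assumption: the first 19 characters are the timestamp,
        # e.g. "2006-08-22 15:13:58".
        stamps.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]

sample = [
    "2006-08-22 15:13:58 Whoops ! bb failed to send message - timeout",
    "2006-08-22 15:18:59 Whoops ! bb failed to send message - timeout",
    "2006-08-22 15:23:59 Whoops ! bb failed to send message - timeout",
]
print(timeout_gaps(sample))
```

For the entries after the restart, the gaps come out at roughly five minutes each, so the client appears to fail on every reporting cycle rather than sporadically.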
>
> It would be interesting to see the Hobbit server's hobbitfetch.log
> file also around this time. I suspect there might be some "Timeout
> while talking to ..." messages there.
Here are the last lines of hobbitfetch.log:
2006-08-22 00:45:54 Timeout while talking to 212.227.90.152:1984 (req 7545): Aborting session
2006-08-22 01:01:44 Connection lost during read from 212.227.90.152:1984 (req 7552): Connection reset by peer
2006-08-22 15:12:46 Connection lost during read from 212.227.90.152:1984 (req 8642): Connection reset by peer
(212.227.90.152 is the client with msgcache)
2006-08-22 15:29:23 Timeout while talking to 212.227.90.152:1984 (req 8649): Aborting session
2006-08-22 15:44:38 Timeout while talking to 212.227.90.152:1984 (req 8655): Aborting session
2006-08-22 16:00:54 Timeout while talking to 212.227.90.152:1984 (req 8661): Aborting session
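The hobbitfetch.log entries above can be tallied per peer to see whether timeouts or lost connections dominate. The following is a rough sketch, not part of Hobbit: the regular expression is derived only from the two message forms shown in this excerpt, and the name `tally_failures` is hypothetical.

```python
import re
from collections import Counter

# Assumption: only these two message forms matter; other log lines are skipped.
PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<kind>Timeout while talking to|Connection lost during read from) "
    r"(?P<peer>\S+) \(req (?P<req>\d+)\)"
)

def tally_failures(lines):
    """Count failures per (peer, kind), where kind is 'timeout' or 'lost'."""
    counts = Counter()
    for line in lines:
        m = PATTERN.match(line.strip())
        if m is None:
            continue
        kind = "timeout" if m.group("kind").startswith("Timeout") else "lost"
        counts[(m.group("peer"), kind)] += 1
    return counts

sample = [
    "2006-08-22 15:12:46 Connection lost during read from 212.227.90.152:1984 (req 8642): Connection reset by peer",
    "2006-08-22 15:29:23 Timeout while talking to 212.227.90.152:1984 (req 8649): Aborting session",
    "2006-08-22 15:44:38 Timeout while talking to 212.227.90.152:1984 (req 8655): Aborting session",
]
for (peer, kind), n in sorted(tally_failures(sample).items()):
    print(peer, kind, n)
```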
I use a connection without a tunnel, so you can access the message cache as well :-<
>
> Also, try enabling debugging for hobbitfetch with
> "kill -USR2 <hobbitfetch PID>"
> and then force a listing of the currently active connections with
> "kill -USR1 <hobbitfetch PID>"
>
Looking for the PID, I found:
hobbit 27013 27011 0 Aug15 ? 00:00:02 /home/hobbit/server/bin/hobbitfetch --pidfile=/var/log/hobbit/hobbitfetch.pid
But in /var/log/hobbit there is no hobbitfetch.pid ???
OK ...
kill -USR2 27013
kill -USR1 27013
Here are the lines from hobbitfetch.log:
2006-08-22 00:45:54 Timeout while talking to 212.227.90.152:1984 (req 7545): Aborting session
2006-08-22 01:01:44 Connection lost during read from 212.227.90.152:1984 (req 7552): Connection reset by peer
2006-08-22 15:12:46 Connection lost during read from 212.227.90.152:1984 (req 8642): Connection reset by peer
2006-08-22 15:29:23 Timeout while talking to 212.227.90.152:1984 (req 8649): Aborting session
2006-08-22 15:44:38 Timeout while talking to 212.227.90.152:1984 (req 8655): Aborting session
2006-08-22 16:00:54 Timeout while talking to 212.227.90.152:1984 (req 8661): Aborting session
2006-08-22 18:00:49 Debug ON
2006-08-22 18:00:53 Queuing request 8810 to 212.227.90.152:1984 for p15191085: 'pullclient 1
log:/var/log/messages:10240
ignore MARK
'
2006-08-22 18:00:53 Sent 54 bytes to 212.227.90.152:1984 (req 8810)
2006-08-22 18:00:53 Done reading data from 212.227.90.152:1984 (req 8810)
2006-08-22 18:00:53 Doing cleanup
2006-08-22 18:00:53 Next poll of p15191085 in 45 seconds
2006-08-22 18:00:53 Request completed: req 8810, peer 212.227.90.152:1984, action was 2, type was 0
2006-08-22 18:01:38 Queuing request 8811 to 212.227.90.152:1984 for p15191085: 'pullclient 1
log:/var/log/messages:10240
ignore MARK
'
2006-08-22 18:01:38 Sent 54 bytes to 212.227.90.152:1984 (req 8811)
2006-08-22 18:01:38 Done reading data from 212.227.90.152:1984 (req 8811)
2006-08-22 18:01:38 Doing cleanup
2006-08-22 18:01:38 Next poll of p15191085 in 44 seconds (for client msg)
2006-08-22 18:01:38 Request completed: req 8811, peer 212.227.90.152:1984, action was 2, type was 0
>
> Finally, when msgcache is in this state, what happens if you run
I will have to wait until it happens again ...
> - from the Hobbit server - the command
>
> bb IP.OF.CLIENT.HOST "pullclient"
>
> It should dump the last status message to the screen.
>
>
> Regards,
> Henrik
>
>
Thanks in advance
--
Rolf Masfelder
Tel.: 06321 355 207
FAX: 06321 355 224
Mobil: 0160 80 64 181
world: 0700 NECTORGMbh
EMail: rolf.masfelder at nector.de
http://www.nector.de