Also, if possible, try to capture the offending disk report. Compare the good report with the bad one to see whether the reporting IP addresses differ. It is possible that two machines are reporting with the same hostname.
I've seen the 'Worker process died' message when I really screwed up something in the client coding. It likely means that something in the client message is out of place, which makes sense given the message you see about the disk report.
Rob Munsch wrote:
Henrik,
I haven't been able to pinpoint a specific message at the same time the hobbitd_client dies. What I am seeing are blocks of things like this:
2007-02-26 09:56:52 Worker process died with exit code 134, terminating
2007-02-26 10:16:54 Worker process died with exit code 134, terminating
2007-02-26 10:16:55 Worker process died with exit code 134, terminating
2007-02-26 10:26:56 Worker process died with exit code 134, terminating
2007-02-26 10:26:56 Worker process died with exit code 134, terminating
2007-02-26 12:17:07 Worker process died with exit code 134, terminating
2007-02-26 12:17:11 Worker process died with exit code 134, terminating
2007-02-26 12:42:10 Worker process died with exit code 134, terminating
2007-02-26 12:42:14 Worker process died with exit code 134, terminating
2007-02-26 13:02:13 Worker process died with exit code 134, terminating
2007-02-26 13:02:17 Worker process died with exit code 134, terminating
2007-02-26 13:07:13 Worker process died with exit code 134, terminating
2007-02-26 13:07:18 Worker process died with exit code 134, terminating
2007-02-26 13:17:19 Worker process died with exit code 134, terminating
2007-02-26 13:22:20 Worker process died with exit code 134, terminating
2007-02-26 13:22:20 Worker process died with exit code 134, terminating
2007-02-26 13:27:20 Worker process died with exit code 134, terminating
2007-02-26 13:27:20 Worker process died with exit code 134, terminating
2007-02-26 13:32:21 Worker process died with exit code 134, terminating
2007-02-26 13:42:22 Worker process died with exit code 134, terminating
2007-02-26 13:42:22 Worker process died with exit code 134, terminating
2007-02-26 13:52:24 Worker process died with exit code 134, terminating
2007-02-26 13:52:24 Worker process died with exit code 134, terminating
2007-02-26 14:07:26 Worker process died with exit code 134, terminating
2007-02-26 14:07:26 Worker process died with exit code 134, terminating
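For reference, the exit code in the log above can be decoded by hand. This is a minimal sketch, assuming the logged value follows the common shell convention of 128 plus the signal number (the thread does not confirm how Hobbit computes it). Under that assumption, 134 corresponds to signal 6, which would be consistent with the "signal 6, Aborted" core dump quoted further down. The function name decode_exit_code is made up for illustration.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Describe an exit code, ASSUMING the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the platform's name for this signal number.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with status {code}"


if __name__ == "__main__":
    print(decode_exit_code(134))
```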
I have it running in --debug mode as per your suggestion, and am getting a ton of output: I have a feeling it's a little more than I'm capable of sorting through well :(.
The only other oddity is it occasionally barfs on Disk tests. For no apparent reason I get
2007-02-26 09:31:49 Host grape (linux) sent incomprehensible disk report - missing columnheaders 'Capacity' and 'Mounted'
but by the next poll, it's figured it out again. I don't know if these are related, but it's all I've got right now.
I'll keep trying to correlate a specific message with the crash time and let you know what I find out.
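The "missing columnheaders" complaint can be checked by hand against a captured client report. This is a rough sketch, assuming the report uses [section] marker lines like the one attached to this thread; split_sections and df_header_ok are hypothetical helper names, not part of Hobbit, and the check only mirrors the wording of the log message rather than the server's actual parsing.

```python
def split_sections(report: str) -> dict:
    """Group report lines under the most recent [section] marker line."""
    sections = {}
    current = None
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            current = stripped[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def df_header_ok(report: str) -> bool:
    """True if the first non-blank [df] line names both expected columns."""
    lines = [l for l in split_sections(report).get("df", []) if l.strip()]
    if not lines:
        return False
    header = lines[0]
    return "Capacity" in header and "Mounted" in header
```

Run against a saved copy of a bad report, df_header_ok returns False when the [df] section starts with something other than the df header line, for example ps output.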
Rob Munsch wrote:
Rich Smrcina wrote:
Go back a level (cd ..) and try it again. It happens to me a lot! :)
Marvelously embarrassing. Thanks, proceeding with requested tests... sigh
Rob Munsch wrote:
Henrik Stoerner wrote:
On Thu, Feb 08, 2007 at 04:00:47PM -0500, Rob Munsch wrote:
I still have a constantly red-then-purple hobbitd_client on my hobbit server.
It's gotten to the point where I have a cron job dropping the test continuously. I would appreciate any insight as to why this started happening and what is causing it.
Core was generated by `hobbitd_client'.
Program terminated with signal 6, Aborted.
#0 0xffffe410 in __kernel_vsyscall ()
Unfortunately this doesn't give a clue about what actually happened, except that it jumped to some wild address and crashed.
Could you add this line to hobbitd/hobbitd_client.c
dbgprintf("Client report from host %s\n", (hostname ? hostname : "<unknown>"));
around line 1754, just after the
enum ostype_t os;
namelist_t *hinfo = NULL;
lines. Then run "make" to rebuild hobbitd_client, copy the
I tried doing this. The make bombed terribly; pages and pages of errors. It started like this:
-----
root (at) randomaccess ~/hobbit-4.2.0/hobbitd # make
cc -c -o hobbitd_client.o hobbitd_client.c
hobbitd_client.c:26:22: error: libbbgen.h: No such file or directory
In file included from hobbitd_client.c:28:
client_config.h:23: error: expected ')' before '*' token
client_config.h:27: error: expected ')' before '*' token
client_config.h:33: error: expected ')' before '*' token
client_config.h:38: error: expected ')' before '*' token
client_config.h:40: error: expected ')' before '*' token
client_config.h:43: error: expected ')' before '*' token
client_config.h:47: error: expected ')' before '*' token
client_config.h:51: error: expected ')' before '*' token
client_config.h:55: error: expected ')' before '*' token
hobbitd_client.c:46: error: 'COL_CLEAR' undeclared here (not in a function)
hobbitd_client.c:132: error: expected ')' before '*' token
hobbitd_client.c:165: error: expected declaration specifiers or '...' before 'namelist_t'
-----
I copied the line you gave me from this email, where specified, so I don't think it's that.
rob
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe (at) hswn.dk
client doisneau.linux
[date]
Wed Feb 28 15:02:02 EST 2007
[uname]
Linux doisneau.office.solutionsforprogress.com 2.6.11 #1 SMP Wed Nov 16 14:07:49 EST 2005 i686 GNU/Linux
[uptime]
15:02:02 up 296 days, 19:03, 0 users, load average: 0.00, 0.00, 0.00
[who]
[df]
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 2006 ? 00:00:24 init [2]
root 2 1 0 2006 ? 00:00:00 [migration/0]
root 3 1 0 2006 ? 00:00:00 [ksoftirqd/0]
root 4 1 0 2006 ? 00:02:36 [events/0]
root 5 1 0 2006 ? 00:00:00 [khelper]
root 10 1 0 2006 ? 00:00:00 [kthread]
root 18 10 0 2006 ? 00:00:00 [kacpid]
root 63 10 0 2006 ? 00:03:32 [kblockd/0]
root 128 10 0 2006 ? 02:04:41 [pdflush]
root 131 10 0 2006 ? 00:00:00 [aio/0]
root 130 1 0 2006 ? 00:00:01 [kswapd0]
root 132 1 0 2006 ? 00:00:00 [cifsoplockd]
root 717 1 0 2006 ? 00:00:00 [kseriod]
root 782 10 0 2006 ? 00:00:00 [ata/0]
root 799 10 0 2006 ? 00:27:33 [reiserfs/0]
root 1122 1 0 2006 ? 02:30:05 /sbin/syslogd
root 1125 1 0 2006 ? 00:00:00 /sbin/klogd
mail 1148 1 0 2006 ? 00:00:00 /usr/lib/exim/exim3 -bd -q30m
uucp 1223 1 0 2006 ? 00:00:00 /usr/sbin/faxq
uucp 1225 1 0 2006 ? 00:00:06 /usr/sbin/hfaxd -i 4559
root 1240 1 0 2006 ? 00:00:00 /usr/sbin/inetd
daemon 1350 1 0 2006 ? 00:00:00 /usr/sbin/atd
root 1353 1 0 2006 ? 00:00:03 /usr/sbin/cron
root 1362 1 0 2006 tty2 00:00:00 /sbin/getty 38400 tty2
root 1363 1 0 2006 tty3 00:00:00 /sbin/getty 38400 tty3
root 1364 1 0 2006 tty4 00:00:00 /sbin/getty 38400 tty4
root 1365 1 0 2006 tty5 00:00:00 /sbin/getty 38400 tty5
root 1366 1 0 2006 tty6 00:00:00 /sbin/getty 38400 tty6
uucp 1367 1 0 2006 ? 00:01:46 /usr/sbin/faxgetty ttyS0
root 5268 1 0 2006 ? 00:00:28 /home/hobbit/client/bin/hobbitlaunch --config=/home/hobbit/client/etc/clientlaunch.cfg --log=/home/hobbit/client/logs/clientlaunch.log --pidfile=/home/hobbit/client/logs/clientlaunch.pid
root 19815 1 0 2006 ? 00:00:00 /usr/local/sbin/osirisd -r /usr/local/osiris
osiris 19816 19815 0 2006 ? 00:00:02 /usr/local/sbin/osirisd -r /usr/local/osiris
root 31765 1 0 2006 ? 00:00:23 /usr/sbin/ntpd -p /var/run/ntpd.pid
postgres 27103 1 0 2006 ? 00:00:13 /usr/lib/postgresql/bin/postmaster -D /var/lib/postgres/data
postgres 27109 27103 0 2006 ? 00:07:27 postgres: stats buffer process
postgres 27110 27109 0 2006 ? 00:07:24 postgres: stats collector process
root 29971 1 0 2006 tty1 00:00:00 /sbin/getty 38400 tty1
root 12487 10 0 2006 ? 00:35:10 [pdflush]
root 14714 1 0 2006 ? 00:00:19 /usr/sbin/sshd
root 19790 1 0 Jan06 ? 00:15:22 /usr/sbin/slapd -h ldap://127.0.0.1:389/ ldaps:/// ldapi:///
root 20416 1 0 Feb26 ? 00:00:00 ntpd
root 15365 1353 0 06:25 ? 00:00:00 /USR/SBIN/CRON
root 15366 15365 0 06:25 ? 00:00:00 /bin/sh -c test -x /usr/sbin/anacron || run-parts --report /etc/cron.daily
root 15367 15366 0 06:25 ? 00:00:00 run-parts --report /etc/cron.daily
root 15698 15367 0 06:25 ? 00:00:00 [jabber-restart] <defunct>
mail 15804 15365 0 06:25 ? 00:00:00 /usr/sbin/sendmail -i -FCronDaemon -oem root
jabber 15964 1 0 06:40 ? 00:00:02 perl -w -x /usr/local/jabberd2/bin/jabberd
jabber 15966 1 0 06:40 ? 00:00:01 /usr/local/jabberd2/bin/mu-conference -c /etc/jabberd/muc-conf.xml
jabber 15967 15964 0 06:40 ? 00:00:09 /usr/local/jabberd2/bin/router -c /usr/local/jabberd2/etc/jabberd/router.xml
jabber 15968 15964 0 06:40 ? 00:00:00 /usr/local/jabberd2/bin/resolver -c /usr/local/jabberd2/etc/jabberd/resolver.xml
jabber 15969 15964 0 06:40 ? 00:00:15 /usr/local/jabberd2/bin/sm -c /usr/local/jabberd2/etc/jabberd/sm.xml
jabber 15970 15964 0 06:40 ? 00:00:00 /usr/local/jabberd2/bin/s2s -c /usr/local/jabberd2/etc/jabberd/s2s.xml
postgres 15971 27103 0 06:40 ? 00:00:26 postgres: jabber jabberd2 127.0.0.1 idle
jabber 15972 15964 0 06:40 ? 00:00:13 /usr/local/jabberd2/bin/c2s -c /usr/local/jabberd2/etc/jabberd/c2s.xml
root 21205 1 0 14:17 ? 00:00:00 /home/hobbit/client/bin/hobbitlaunch --config=/home/hobbit/client/etc/clientlaunch.cfg --log=/home/hobbit/client/logs/clientlaunch.log --pidfile=/home/hobbit/client/logs/clientlaunch.pid
root 21706 21205 0 15:02 ? 00:00:00 /bin/sh /home/hobbit/client/bin/hobbitclient.sh
root 21707 21706 0 15:02 ? 00:00:00 /bin/sh /home/hobbit/client/bin/hobbitclient-linux.sh
root 21716 21707 0 15:02 ? 00:00:00 ps -efw
root 21719 5268 0 15:02 ? 00:00:00 /bin/sh /home/hobbit/client/bin/hobbitclient.sh
root 21720 21719 0 15:02 ? 00:00:00 /bin/sh /home/hobbit/client/bin/hobbitclient-linux.sh
root 21725 21720 0 15:02 ? 00:00:00 df -Pl -x none -x tmpfs -x shmfs -x unknown
root 21726 21720 0 15:02 ? 00:00:00 [sed]
[top]
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/hda3 979928 74584 905344 8% /
/dev/hda1 64220 41272 22948 65% /boot
/dev/mapper/root_vg-usr 2097084 771940 1325144 37% /usr
/dev/mapper/root_vg-home 23067964 70744 22997220 1% /home
/dev/mapper/root_vg-var 2097084 1582168 514916 76% /var
[meminfo]
[free]
total used free shared buffers cached
Mem: 1033484 991744 41740 0 191108 688472
-/+ buffers/cache: 112164 921320
Swap: 2000084 0 2000084
[netstat]
Ip:
85417010 total packets received
0 forwarded
0 incoming packets discarded
85416799 incoming packets delivered
86471394 requests sent out
Icmp:
158140 ICMP messages received
511 input ICMP message failed.
ICMP input histogram:
destination unreachable: 4914
timeout in transit: 204
source quenches: 1
redirects: 1
echo requests: 153018
echo replies: 1
153092 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 74
echo replies: 153018
Tcp:
182959 active connections openings
199005 passive connection openings
1 failed connection attempts
66529 connection resets received
65 connections established
71891630 segments received
72925768 segments send out
73548 segments retransmited
3 bad segments received.
493641 resets sent
Udp:
13366871 packets received
73 packets to unknown port received.
0 packet receive errors
13392531 packets sent
TcpExt:
50 resets received for embryonic SYN_RECV sockets
1341 packets pruned from receive queue because of socket buffer overrun
2257 ICMP packets dropped because they were out-of-window
124455 TCP sockets finished time wait in fast timer
41 time wait sockets recycled by time stamp
14 packets rejects in established connections because of timestamp
3115808 delayed acks sent
4714 delayed acks further delayed because of locked socket
Quick ack mode was activated 10841 times
2189784 packets directly queued to recvmsg prequeue.
352892 of bytes directly received from backlog
365882019 of bytes directly received from prequeue
12389655 packet headers predicted
1740831 packets header predicted and directly queued to user
8273781 acknowledgments not containing data received
14982537 predicted acknowledgments
14 times recovered from packet loss due to fast retransmit
261 times recovered from packet loss due to SACK data
TCPDSACKUndo: 10
1123 congestion windows recovered after partial ack
190 TCP data loss events
36 timeouts after reno fast retransmit
779 timeouts after SACK recovery
146 timeouts in loss state
361 fast retransmits
6 forward retransmits
406 retransmits in slow start
22631 other TCP timeouts
TCPRenoRecoveryFail: 3
67 sack retransmits failed
42 times receiver scheduled too late for direct processing
30743 packets collapsed in receive queue due to low socket buffer
10796 DSACKs sent for old packets
1 DSACKs sent for out of order packets
1170 DSACKs received
77733 connections reset due to unexpected data
1063 connections reset due to early user close
2843 connections aborted due to timeout
[ps]
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 2006 ? 00:00:24 init [2]
root 2 1 0 2006 ? 00:00:00 [migration/0]
root 3 1 0 2006 ? 00:00:00 [ksoftirqd/0]
root 4 1 0 2006 ? 00:02:36 [events/0]
root 5 1 0 2006 ? 00:00:00 [khelper]
root 10 1 0 2006 ? 00:00:00 [kthread]
root 18 10 0 2006 ? 00:00:00 [kacpid]
root 63 10 0 2006 ? 00:03:32 [kblockd/0]
root 128 10 0 2006 ? 02:04:41 [pdflush]
root 131 10 0 2006 ? 00:00:00 [aio/0]
root 130 1 0 2006 ? 00:00:01 [kswapd0]
root 132 1 0 2006 ? 00:00:00 [cifsoplockd]
root 717 1 0 2006 ? 00:00:00 [kseriod]
root 782 10 0 2006 ? 00:00:00 [ata/0]
root 799 10 0 2006 ? 00:27:33 [reiserfs/0]
root 1122 1 0 2006 ? 02:30:05 /sbin/syslogd
root 1125 1 0 2006 ? 00:00:00 /sbin/klogd
mail 1148 1 0 2006 ? 00:00:00 /usr/lib/exim/exim3 -bd -q30m
uucp 1223 1 0 2006 ? 00:00:00 /usr/sbin/faxq
uucp 1225 1 0 2006 ? 00:00:06 /usr/sbin/hfaxd -i 4559
root 1240 1 0 2006 ? 00:00:00 /usr/sbin/inetd
daemon 1350 1 0 2006 ? 00:00:00 /usr/sbin/atd
root 1353 1 0 2006 ? 00:00:03 /usr/sbin/cron
root 1362 1 0 2006 tty2 00:00:00 /sbin/getty 38400 tty2
root 1363 1 0 2006 tty3 00:00:00 /sbin/getty 38400 tty3
root 1364 1 0 2006 tty4 00:00:00 /sbin/getty 38400 tty4
root 1365 1 0 2006 tty5 00:00:00 /sbin/getty 38400 tty5
root 1366 1 0 2006 tty6 00:00:00 /sbin/getty 38400 tty6
uucp 1367 1 0 2006 ? 00:01:46 /usr/sbin/faxgetty ttyS0
root 5268 1 0 2006 ? 00:00:28 /home/hobbit/client/bin/hobbitlaunch --config=/home/hobbit/client/etc/clientlaunch.cfg --log=/home/hobbit/client/logs/clientlaunch.log --pidfile=/home/hobbit/client/logs/clientlaunch.pid
root 19815 1 0 2006 ? 00:00:00 /usr/local/sbin/osirisd -r /usr/local/osiris
osiris 19816 19815 0 2006 ? 00:00:02 /usr/local/sbin/osirisd -r /usr/local/osiris
root 31765 1 0 2006 ? 00:00:23 /usr/sbin/ntpd -p /var/run/ntpd.pid
postgres 27103 1 0 2006 ? 00:00:13 /usr/lib/postgresql/bin/postmaster -D /var/lib/postgres/data
postgres 27109 27103 0 2006 ? 00:07:27 postgres: stats buffer process
postgres 27110 27109 0 2006 ? 00:07:24 postgres: stats collector process
root 29971 1 0 2006 tty1 00:00:00 /sbin/getty 38400 tty1
root 12487 10 0 2006 ? 00:35:10 [pdflush]
root 14714 1 0 2006 ? 00:00:19 /usr/sbin/sshd
root 19790 1 0 Jan06 ? 00:15:22 /usr/sbin/slapd -h ldap://127.0.0.1:389/ ldaps:/// ldapi:///
root 20416 1 0 Feb26 ? 00:00:00 ntpd
root 15365 1353 0 06:25 ? 00:00:00 /USR/SBIN/CRON
root 15366 15365 0 06:25 ? 00:00:00 /bin/sh -c test -x /usr/sbin/anacron || run-parts --report /etc/cron.daily
root 15367 15366 0 06:25 ? 00:00:00 run-parts --report /etc/cron.daily
root 15698 15367 0 06:25 ? 00:00:00 [jabber-restart] <defunct>
mail 15804 15365 0 06:25 ? 00:00:00 /usr/sbin/sendmail -i -FCronDaemon -oem root
jabber 15964 1 0 06:40 ? 00:00:02 perl -w -x /usr/local/jabberd2/bin/jabberd
jabber 15966 1 0 06:40 ? 00:00:01 /usr/local/jabberd2/bin/mu-conference -c /etc/jabberd/muc-conf.xml
jabber 15967 15964 0 06:40 ? 00:00:09 /usr/local/jabberd2/bin/router -c /usr/local/jabberd2/etc/jabberd/router.xml
jabber 15968 15964 0 06:40 ? 00:00:00 /usr/local/jabberd2/bin/resolver -c /usr/local/jabberd2/etc/jabberd/resolver.xml
jabber 15969 15964 0 06:40 ? 00:00:15 /usr/local/jabberd2/bin/sm -c /usr/local/jabberd2/etc/jabberd/sm.xml
jabber 15970 15964 0 06:40 ? 00:00:00 /usr/local/jabberd2/bin/s2s -c /usr/local/jabberd2/etc/jabberd/s2s.xml
postgres 15971 27103 0 06:40 ? 00:00:26 postgres: jabber jabberd2 127.0.0.1 idle
jabber 15972 15964 0 06:40 ? 00:00:13 /usr/local/jabberd2/bin/c2s -c /usr/local/jabberd2/etc/jabberd/c2s.xml
root 21205 1 0 14:17 ? 00:00:00 /home/hobbit/client/bin/hobbitlaunch --config=/home/hobbit/client/etc/clientlaunch.cfg --log=/home/hobbit/client/logs/clientlaunch.log --pidfile=/home/hobbit/client/logs/clientlaunch.pid
root 21706 21205 0 15:02 ? 00:00:00 /bin/sh /home/hobbit/client/bin/hobbitclient.sh
root 21707 21706 0 15:02 ? 00:00:00 /bin/sh /home/hobbit/client/bin/hobbitclient-linux.sh
root 21719 5268 0 15:02 ? 00:00:00 /bin/sh /home/hobbit/client/bin/hobbitclient.sh
root 21720 21719 0 15:02 ? 00:00:00 /bin/sh /home/hobbit/client/bin/hobbitclient-linux.sh
root 21729 21707 0 15:02 ? 00:00:00 sh -c vmstat 300 2 1>/home/hobbit/client/tmp/hobbit_vmstat.21707 2>&1; mv /home/hobbit/client/tmp/hobbit_vmstat.21707 /home/hobbit/client/tmp/hobbit_vmstat
root 21730 21707 0 15:02 ? 00:00:00 sleep 5
root 21731 21729 0 15:02 ? 00:00:00 vmstat 300 2
root 21734 21720 0 15:02 ? 00:00:00 ps -efw
[top]
[vmstat]
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
3 0 0 41636 191104 688216 0 0 0 0 1 2 1 1 98 0
2 0 0 38220 191108 688472 0 0 0 7 1007 0 0 0 99 0
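
One thing visible in the ps listing of the report above is that hobbitlaunch appears twice with identical arguments (PIDs 5268 and 21205). Whether that is related to the mixed-up sections is an assumption here and is not confirmed anywhere in the thread. A small sketch for spotting such duplicates in captured "ps -efw" output follows; duplicate_commands is a hypothetical helper, and it assumes the eight-column layout shown in the report.

```python
from collections import Counter


def duplicate_commands(ps_lines, needle):
    """Return {command line: count} for commands whose program name
    contains `needle` and that occur more than once in ps -ef output."""
    counts = Counter()
    for line in ps_lines:
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[0] == "UID":
            continue
        cmd = fields[7]
        if needle in cmd.split()[0]:
            counts[cmd] += 1
    return {cmd: n for cmd, n in counts.items() if n > 1}
```

Feeding it the lines of the [ps] section with needle "hobbitlaunch" would list any launcher command line that is running more than once.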