Henrik,
Here are 2 other extracts from crashes:
2009-01-20 16:36:06 hobbitd_rrd: Got message 517875
@@status#517875/sw01.courrierinternational|1232465766.838715|192.168.255.32||sw01.courrierinternational|if_load|1232467566|green||green|1225102669|0||0||0|0||network/switch-dedie
2009-01-20 16:36:06 startpos 162634, fillpos 166552, endpos -1
2009-01-20 16:36:06 Want msg 517876, startpos 162634, fillpos 166552, endpos -1, usedbytes=3918, bufleft=361831
2009-01-20 16:36:06 Want msg 517876, startpos 162634, fillpos 170333, endpos -1, usedbytes=7699, bufleft=358050
2009-01-20 16:36:06 hobbitd_rrd: Got message 517876
@@status#517876/sw01.ctoutvert|1232465766.838761|192.168.255.32||sw01.ctoutvert|memory|1234247285|blue||blue|1231828085|0||1234247285|Disabled by
2009-01-20 16:36:06 startpos 172884, fillpos 172884, endpos -1
2009-01-20 16:36:06 Peer at 0.0.0.0:0 failed: Broken pipe
2009-01-20 16:36:06 Peer not up, flushing message queue
2009-01-20 16:36:06 Opening file /data/hobbit/server/etc/hobbit-rrddefinitions.cfg
2009-01-20 16:36:06 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=528383
2009-01-20 16:36:06 hobbitd_rrd: Got message 517913
@@status#517913/sw01.excenteurofac|1232465766.929692|192.168.255.32||sw01.excenteurofac|if_err|1232467566|green||green|1231866461|0||0||0|0||network/switch-dedie
if_load and if_err are statuses from devmon, which I do not graph using ncv/extra-test.
memory is also generated by devmon, and is graphed by default in Xymon.
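My assumption is that hobbitd_rrd simply skips test names it has no handler or NCV definition for, so if_load and if_err should be harmless. A toy sketch of that kind of dispatch (illustrative only, not the real code; the handler list here is made up):

#include <stdio.h>
#include <string.h>

/* Toy sketch of the dispatch I assume happens inside hobbitd_rrd: the test
 * name from the @@status header is matched against the tests it knows how
 * to graph, and anything else (like devmon's if_load / if_err) is ignored.
 * The handler list below is made up for illustration. */
static const char *handled_tests[] = { "memory", "disk", "cpu", "mysql", NULL };

static int is_handled(const char *testname)
{
    for (int i = 0; handled_tests[i] != NULL; i++)
        if (strcmp(handled_tests[i], testname) == 0) return 1;
    return 0;
}

int main(void)
{
    const char *incoming[] = { "if_load", "if_err", "memory", "mysql" };
    for (int i = 0; i < 4; i++)
        printf("%-8s -> %s\n", incoming[i],
               is_handled(incoming[i]) ? "update RRD" : "ignore");
    return 0;
}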
2009-01-22 17:14:20 hobbitd_rrd: Got message 343666
@@status#343666/logicimmo-netapp2|1232640859.848737|127.0.0.1||logicimmo-netapp2|disk|2147483647|blue||blue|1232479545|0||-1|Disabled by
2009-01-22 17:14:20 startpos 417512, fillpos 419047, endpos -1
2009-01-22 17:14:20 Peer at 0.0.0.0:0 failed: Broken pipe
2009-01-22 17:14:20 Peer not up, flushing message queue
2009-01-22 17:14:20 Opening file /data/hobbit/server/etc/hobbit-rrddefinitions.cfg
2009-01-22 17:14:20 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=528383
2009-01-22 17:14:20 hobbitd_rrd: Got message 343677
@@status#343677/tif-netapp1|1232640860.884630|127.0.0.1||tif-netapp1|disk|1232644460|green||green|1230710616|0||0||0|0|stockage|unix/infrasys/stockage
2009-01-22 17:14:20 startpos 1335, fillpos 3954, endpos 2589
disk is generated by netapp.pl (from hobbit-client-perl).
-> I noticed that in my 3 extracts, the last log before the crash is a
disabled status. Could this be the problem?
(I've checked 2 other crashes, and there again the last log is a disabled
status.)
I checked those 3 disabled statuses: the hosts are up and running (so
normal statuses are still sent to hobbitd). We have disabled them for
migration purposes; the migration might happen in a few days, or weeks...
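To illustrate my suspicion - this is only a guess, sketched with a made-up field layout, not Xymon's actual parser: a disabled status seems to end right after the "Disabled by" text, so it carries fewer |-separated fields than a normal one, and a parser that assumes a fixed field count could read past the end:

#include <stdio.h>
#include <string.h>

/* Hypothetical sketch (not Xymon's real parser or field layout): a normal
 * status header carries more |-separated fields than a disabled one, which
 * ends after the "Disabled by" text. A parser that assumes a fixed field
 * count walks off the end of the short form. Note strtok() also collapses
 * empty fields ("||"), which is fine for this illustration. */
int main(void)
{
    char normal[]   = "@@status#1/host|123|1.2.3.4||host|mysql|456|green||green|789|0||0||0|0|grp|unix/mysql";
    char disabled[] = "@@status#2/host|123|1.2.3.4||host|cpu|456|blue||blue|789|0||999|Disabled by";
    char *msgs[] = { normal, disabled };

    for (int m = 0; m < 2; m++) {
        char *fields[32] = { NULL };
        int n = 0;
        for (char *p = strtok(msgs[m], "|"); p != NULL && n < 32; p = strtok(NULL, "|"))
            fields[n++] = p;
        printf("message %d: %d fields\n", m, n);

        /* Unsafe: reading a fixed index dereferences NULL on the short form:
         *   printf("%s\n", fields[14]);    <-- would crash on the disabled msg
         * Safe: check the count first. */
        if (n > 14)
            printf("last field = %s\n", fields[14]);
        else
            printf("short message (disabled?), skipped\n");
    }
    return 0;
}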
For your mysql question:
Yes, I do graph mysql using NCV:
NCV_mysql="Questions:DERIVE,Threadsconnected:GAUGE,*:NONE"
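As I understand the Name-Colon-Value idea (a rough sketch of my understanding, not the real module), it pulls "Name: value" lines out of the status body and squashes each name to alphanumerics, so "Threads connected" becomes "Threadsconnected":

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Rough sketch of the Name-Colon-Value idea (not the real hobbitd_rrd
 * module): scan the status body for "Name: value" lines, squash each name
 * to alphanumerics, and report the numeric value. The real module would
 * then look each name up in the NCV_mysql setting to pick DERIVE/GAUGE/NONE. */
int main(void)
{
    const char *status_body =
        "green Tue Jan 20 16:36:06 2009 mysql is ok\n"
        "Questions: 1234567\n"
        "Threads connected: 12\n"
        "Uptime: 864000\n";

    const char *p = status_body;
    int lineno = 0;
    while (*p != '\0') {
        char line[256];
        size_t full = strcspn(p, "\n");                /* length of this line */
        size_t len = full < sizeof(line) ? full : sizeof(line) - 1;
        memcpy(line, p, len);
        line[len] = '\0';
        p += full + (p[full] == '\n' ? 1 : 0);         /* advance past newline */

        if (++lineno == 1) continue;                   /* skip the summary line
                                                          (its timestamp has colons) */
        char *colon = strchr(line, ':');
        if (colon == NULL) continue;                   /* not a "Name: value" line */
        *colon = '\0';

        char name[64];
        int n = 0;
        for (char *q = line; *q != '\0' && n < 63; q++)
            if (isalnum((unsigned char)*q)) name[n++] = *q;
        name[n] = '\0';

        double value;
        if (sscanf(colon + 1, "%lf", &value) == 1)
            printf("%s = %.0f\n", name, value);        /* e.g. Threadsconnected = 12 */
    }
    return 0;
}

With the setting above, Questions would become a DERIVE, Threadsconnected a GAUGE, and anything else (like Uptime here) would fall through to *:NONE and be dropped.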
Olivier
On 22/01/2009 15:29, Henrik Størner wrote:
In <4974AE8B.80706 (at) gmail.com> Olivier Beau <obeau79 (at) gmail.com> writes:
It happened again today at 17:00:22.
Nothing new when doing a bt on the coredump.
An extract of rrd-status.log from 16:55 to 17:05 is available at
http://www.qalpit.com/~olivier/tmp/rrd-status.log.gz
OK, the interesting part is here when it crashes:
2009-01-19 17:00:22 hobbitd_rrd: Got message 181436
@@status#181436/cedratnet-bdd1|1232380822.602633|127.0.0.1||cedratnet-bdd1|mysql|1232398822|green||green|1231215890|0||0||1232380812|0|linuxmysql|unix/mysql
2009-01-19 17:00:22 startpos 342639, fillpos 378880, endpos 342991
2009-01-19 17:00:22 hobbitd_rrd: Got message 181437
@@status#181437/moniteur-ora2|1232380822.618847|10.12.0.67||moniteur-ora2|cpu|1255363113|blue||blue|1228751913|0||1255363113|Disabled by
2009-01-19 17:00:22 startpos 342995, fillpos 378880, endpos -1
2009-01-19 17:00:22 Peer at 0.0.0.0:0 failed: Broken pipe
2009-01-19 17:00:22 Peer not up, flushing message queue
2009-01-19 17:00:22 Opening file /data/hobbit/server/etc/hobbit-rrddefinitions.cfg
2009-01-19 17:00:22 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=528383
2009-01-19 17:00:22 hobbitd_rrd: Got message 181450
@@status#181450/nurun-etam-bdd1|1232380822.807004|127.0.0.1||nurun-etam-bdd1|mysql|1232398822|green||green|1231768476|0||0||1232380582|0|linuxmysql|unix/mysql
2009-01-19 17:00:22 startpos 17100, fillpos 19357, endpos 17846
2009-01-19 17:00:22 Opening file /data/hobbit/server/etc/bb-hosts
It appears to be a "mysql" status from either cedratnet-bdd1 or
nurun-etam-bdd1 that causes the crash (I cannot tell exactly, because
output buffering comes into play when there's a crash). It *could* also
be the cpu-report from moniteur-ora2, but I doubt that - the cpu-status
is tested a lot more than the mysql-status.
In fact, "mysql" isn't part of hobbitd_rrd by default. So is this
something you've added? Is it something that you generate graphs for?
Or is it just a status that hobbitd_rrd should ignore?
Regards,
Henrik