[hobbit] RHEL5 and status-board not available bug?
Flyzone Micky
flyzone at technologist.com
Mon Feb 16 12:35:51 CET 2009
On Thu, Feb 12, 2009 at 06:06:48PM +0000, Flyzone Micky wrote:
>"really low" as in ... how much ?
Output of iostat command:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.22    0.00    0.91    3.62    0.00   93.26
This is the iostat output for NFS:
Device:              rBlk_nor/s  wBlk_nor/s  rBlk_dir/s  wBlk_dir/s  rBlk_svr/s  wBlk_svr/s  rops/s  wops/s
vnetapp:/vol/hobbit     1631.11      373.97        0.00        0.00     1170.83      825.22  840.76  840.76
This last iostat output also includes rsync traffic, because I was
running an rsync to the local disk of the hobbit server at the time.
Unfortunately nfsstat doesn't sho
>of all the RRD files - takes about 8 minutes. No chance at all
>then of keeping up with 5-minute update cycles.
But in that case, wouldn't a warning like this appear (I don't get one)?
WARNING: Runtime 110 longer than BBSLEEP
>I really think you should try shutting off the hobbitd_rrd tasks,
>just to see what happens.
Maybe I left it out of my last post, but I have already done that, and
it didn't solve the problem.
>For hosts to go purple they have to go more than 30 minutes without
>an update - they don't go purple just because they miss a single
>update.
Right... but the problem doesn't always appear. I also remember an old
patch in the all-in-one about dirty data, but it was already applied.
>I suppose you have check the kernel logs ('dmesg' output) for
>anything odd ?
Done, along with all the other system and hobbit logs. There are no
messages that could help.
>I'm wondering if maybe you're running out of ports (there's only
>64K of them, only about half can be used by normal apps). How
>many ports do you have in TIME_WAIT state ?
I have ruled that out: there are 235-300 ports in TIME_WAIT at most, and
I also tried setting this kernel parameter (as is done for Oracle):
net.ipv4.ip_local_port_range = 1024 65000
but nothing changes with or without it.
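
For anyone who wants to repeat the TIME_WAIT count, here is a minimal sketch of how it could be done. It assumes Python 3 on Linux and IPv4 sockets only; the function names are made up for illustration and are not part of Hobbit. It relies on the "st" column of /proc/net/tcp, where the hex code 06 means TIME_WAIT.

```python
# Hex state code used for TIME_WAIT in the "st" column of /proc/net/tcp.
TIME_WAIT = "06"


def count_time_wait(proc_net_tcp_text):
    """Count TIME_WAIT sockets in text formatted like /proc/net/tcp."""
    count = 0
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        # Columns are: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == TIME_WAIT:
            count += 1
    return count


def count_time_wait_on_this_host(path="/proc/net/tcp"):
    """Read the live table (IPv4 only) and count TIME_WAIT sockets."""
    with open(path) as f:
        return count_time_wait(f.read())
```

Calling count_time_wait_on_this_host() every few seconds while the problem is happening should show whether the number gets anywhere near the size of the local port range.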
>Another thing is the size of the ARP cache, if your hosts are
>all on the same IP network or your router/firewall is doing
>proxy-arp.
The hosts are spread over about 4 different networks.
In any case, remember my test with just 20 clients.
>Is this server also running the network tests ?
> ...
> sysctl net.ipv4.tcp_tw_reuse=1
>which enables the kernel to re-use ports that are in a TIME_WAIT
Yes, but as before... the problem also appears with just 20 clients,
so I would rule out a problem related to the number of clients.
However, I also tried:
net.ipv4.tcp_fin_timeout = 30
instead of the default 120 seconds that RHEL5 leaves a port
in TIME_WAIT state.
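
To confirm which values are really active on the server, the settings mentioned in this thread can be read back from /proc/sys. This is only a sketch, assuming Python 3 on Linux; the helper names are hypothetical, and the simple dot-to-slash mapping is an assumption that holds for these three keys but not for keys containing an interface name with a dot in it.

```python
import os

# The sysctl keys discussed in this thread.
KEYS = (
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_tw_reuse",
    "net.ipv4.tcp_fin_timeout",
)


def sysctl_path(name, root="/proc/sys"):
    """Map a sysctl key such as net.ipv4.tcp_fin_timeout to its file path."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        with open(sysctl_path(name, root)) as f:
            return f.read().strip()
    except OSError:
        return None


def report(keys=KEYS):
    """Print each key with the value the running kernel is using."""
    for key in keys:
        print(key, "=", read_sysctl(key))
```

Running report() before and after a change shows whether the new value was picked up by the running kernel.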
>One (I) would expect the 64-bit systems to have a bit more "oomph"
>so they should be the ones that worked best.
Ahm... what is an oomph? :-S
>A datapoint here. I'm also running Hobbit on a 64-bit Linux
>platform, but it is using SPARC (Sun) hardware.
We are trying to shut down all our SPARC machines and move to Linux.. :)
>So you're saying that on a RHEL 5.3 64-bit Intel server, setting
>up Hobbit and feeding it with data from ~20 clients will make
>the system break?
Yes, that is the point: RHEL > 5.0 and 64-bit (AMD)...
I still need to try it on Fedora 10 64-bit.
>I think I would have heard about it before if this was a general
>problem.
Eh... I would also have liked to hear about it before :)))
However, after shutting down hobbit, the ipcs command still shows the
shared memory segments in use even though no hobbit process is active.
Maybe something hangs in hobbit?
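
A rough way to list the leftover segments without reading the ipcs output by eye is sketched below. It assumes Python 3 on Linux and that /proc/sysvipc/shm has a header line with the columns shmid, size and nattch; the function names are invented for this example. It only reports segments that no process is attached to; it cannot tell whether they were created by hobbit.

```python
def unattached_segments(shm_text):
    """Return (shmid, size_in_bytes) for segments with no attached process.

    Expects text formatted like /proc/sysvipc/shm, whose first line names
    the columns.
    """
    lines = shm_text.splitlines()
    if not lines:
        return []
    header = lines[0].split()
    i_id = header.index("shmid")
    i_size = header.index("size")
    i_nattch = header.index("nattch")
    needed = max(i_id, i_size, i_nattch)

    result = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= needed:
            continue  # skip blank or truncated lines
        if int(fields[i_nattch]) == 0:
            result.append((int(fields[i_id]), int(fields[i_size])))
    return result


def unattached_segments_on_this_host(path="/proc/sysvipc/shm"):
    """Read the live table and list segments nobody is attached to."""
    with open(path) as f:
        return unattached_segments(f.read())
```

If this list is non-empty right after hobbit is stopped, the segments were left behind rather than being held by a hung process.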
Have a nice day
P.S.: How can I reply from a normal email client without creating a
new thread on the ML?