[hobbit] RHEL5 and status-board not available bug?

Flyzone Micky flyzone at technologist.com
Mon Feb 16 12:35:51 CET 2009


On Thu, Feb 12, 2009 at 06:06:48PM +0000, Flyzone Micky wrote:
>"really low" as in ... how much ?

Output of iostat command:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.22    0.00    0.91    3.62    0.00   93.26

This is the output of iostat about nfs:
Device:              rBlk_nor/s   wBlk_nor/s   rBlk_dir/s  
vnetapp:/vol/hobbit     1631.11       373.97         0.00

wBlk_dir/s   rBlk_svr/s   wBlk_svr/s    rops/s    wops/s
      0.00      1170.83       825.22    840.76    840.76

This last iostat output also includes rsync traffic, because I was
running an rsync to the local disk of hobbit at the time.

Unfortunately, nfsstat doesn't sho

>of all the RRD files - takes about 8 minutes. No chance at all
>then of keeping up with 5-minute update cycles.

But in that case, wouldn't a warning like this appear (I don't get one)?
WARNING: Runtime 110 longer than BBSLEEP

>I really think you should try shutting off the hobbitd_rrd tasks,
>just to see what happens.

Maybe I left it out of my last post, but I have already done that,
and it didn't solve the problem.

>For hosts to go purple they have to go more than 30 minutes without
>an update - they don't go purple just because they miss a single
>update.

Right... but it doesn't always appear. I also remember an old patch
in the all-in-one about dirty data, but it was already applied.

>I suppose you have check the kernel logs ('dmesg' output) for
>anything odd ?

Done, along with all the other system and hobbit logs. There are no
messages that could help.

>I'm wondering if maybe you're running out of ports (there's only
>64K of them, only about half can be used by normal apps). How
>many ports do you have in TIME_WAIT state ? 

I have ruled that out: there are 235-300 ports in that state at most.
I also tried setting this kernel parameter (as is done for Oracle):
net.ipv4.ip_local_port_range = 1024 65000
but nothing changes with or without it.
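
For anyone who wants to repeat this check, here is a minimal sketch
that counts sockets per TCP state and the size of the local port
range. It assumes a Linux /proc filesystem and only looks at IPv4
sockets (/proc/net/tcp); the function names are made up for this
example.

```python
import os
from collections import Counter

# Hex state codes used in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

def count_tcp_states(proc_net_tcp_text):
    """Return a Counter mapping state name -> number of sockets."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts

def ephemeral_port_count(port_range_text):
    """Number of ports in a 'low high' range such as '1024 65000'."""
    low, high = (int(x) for x in port_range_text.split())
    return high - low + 1

if __name__ == "__main__":
    tcp_file = "/proc/net/tcp"
    range_file = "/proc/sys/net/ipv4/ip_local_port_range"
    if os.path.exists(tcp_file) and os.path.exists(range_file):
        with open(tcp_file) as f:
            states = count_tcp_states(f.read())
        with open(range_file) as f:
            available = ephemeral_port_count(f.read())
        print("TIME_WAIT sockets:", states["TIME_WAIT"])
        print("local ports available:", available)
```

With the range above there are 63977 local ports, so a few hundred
sockets in TIME_WAIT are far from exhausting them.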

>Another thing is the size of the ARP cache, if your hosts are
>all on the same IP network or your router/firewall is doing
>proxy-arp. 

There are about 4 different networks.
In any case, remember my test with just 20 clients.

>Is this server also running the network tests ?
> ...
>     sysctl net.ipv4.tcp_tw_reuse=1
>which enables the kernel to re-use ports that are in a TIME_WAIT

Yes, but as before... it also appears with just 20 clients,
so I would rule out a problem related to the number of clients.
However, I also tried:
net.ipv4.tcp_fin_timeout = 30
instead of the default of 120 seconds in RHEL5 for keeping a port
in TIME_WAIT state.
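
A setting in sysctl.conf only matters if the running kernel has
actually picked it up, so it is worth reading the live values back.
A small sketch, assuming the usual Linux mapping of sysctl names to
files under /proc/sys; the helper names are made up for this example.

```python
def sysctl_path(name, root="/proc/sys"):
    """Map a name like 'net.ipv4.tcp_fin_timeout' to its /proc/sys file."""
    return "/".join([root] + name.split("."))

def read_sysctl(name, root="/proc/sys"):
    """Return the live value as a stripped string, or None if unreadable."""
    try:
        with open(sysctl_path(name, root)) as f:
            return f.read().strip()
    except OSError:
        return None

if __name__ == "__main__":
    # The three parameters discussed in this thread.
    for name in ("net.ipv4.tcp_fin_timeout",
                 "net.ipv4.tcp_tw_reuse",
                 "net.ipv4.ip_local_port_range"):
        print(name, "=", read_sysctl(name))
```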

>One (I) would expect the 64-bit systems to have a bit more "oomph"
>so they should be the ones that worked best.

Ahm... what is an oomph? :-S

>A datapoint here. I'm also running Hobbit on a 64-bit Linux 
>platform, but it is using SPARC (Sun) hardware. 

We are trying to shut down all our SPARC machines and move to Linux... :)

>So you're saying that on a RHEL 5.3 64-bit Intel server, setting
>up Hobbit and feeding it with data from ~20 clients will make
>the system break?

Yes, that is the point: RHEL > 5.0 and 64-bit (AMD)...
I still need to try Fedora 10 64-bit.

>I think I would have heard about it before if this was a general
>problem.

Eh... I would also have liked to hear about it before :)))

However, after shutting down hobbit, the ipcs command still shows the
shared memory segments in use with no hobbit process active. Maybe
something in hobbit hangs?
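
To make the leftover segments easier to spot, here is a rough sketch
that lists shared memory segments with no process attached
(nattch = 0). It assumes the column layout printed by "ipcs -m" on
Linux (key, shmid, owner, perms, bytes, nattch, status) and that the
segments are owned by a user named "hobbit"; both are assumptions to
adjust for your system.

```python
import shutil
import subprocess

def unattached_segments(ipcs_text, owner=None):
    """Return rows from 'ipcs -m' output whose attach count is 0."""
    columns = None
    found = []
    for line in ipcs_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "key":            # header row names the columns
            columns = fields
            continue
        if columns is None or not fields[0].startswith("0x"):
            continue                      # banner lines, other text
        row = dict(zip(columns, fields))  # an empty status column is fine
        if int(row["nattch"]) != 0:
            continue
        if owner is not None and row["owner"] != owner:
            continue
        found.append(row)
    return found

if __name__ == "__main__":
    if shutil.which("ipcs"):
        out = subprocess.run(["ipcs", "-m"],
                             capture_output=True, text=True).stdout
        for seg in unattached_segments(out, owner="hobbit"):
            print(seg["shmid"], seg["bytes"], "bytes, no process attached")
```

A segment reported here after hobbit has been stopped is one that
nothing is attached to anymore; it can be removed by hand with
"ipcrm -m" followed by the shmid.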

Have a nice day

P.S.: how can I reply using a normal email client without creating a
new thread on the ML?
