performance help needed
shea_greg at emc.com
Mon Oct 26 20:55:15 CET 2009
Hi all,
First off, sorry for the long post; I'm trying to supply as much data as
possible for analysis.
I have a single Hobbit server monitoring approximately 3500 hosts, a mixture
of Windows and Unix, with some DB tests, some BEA tests and a few custom
tests. I have over 70000 RRD files, which seem to be causing Hobbit
performance problems, most specifically clock offset. I have a cron job that
restarts Hobbit every 30 minutes; otherwise the offset grows so large that it
eats all memory and the OOM killer starts. NTP is fine; the offset seems to
come from the time it takes Hobbit to process the client data. The OS
resides on RAID1 146GB 15K RPM SAS drives; the second drive, for the RRDs,
is a single 300GB 15K RPM SAS drive.
At the end is a graph showing the clock offset. What else can I try?
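As a rough back-of-the-envelope check, the average RRD update load implied by these numbers can be estimated. This sketch rests on an assumption not stated in the post: that each RRD file receives one update per 5-minute (300 s) cycle. The constant and function names are illustrative only.

```python
# Back-of-the-envelope RRD update rate for this Hobbit server.
# ASSUMPTION: every RRD file is updated once per 5-minute (300 s) cycle.

RRD_FILES = 70_000        # "over 70000 RRD files", so this is a lower bound
UPDATE_INTERVAL_S = 300   # assumed update interval in seconds


def rrd_updates_per_second(files: int, interval_s: float) -> float:
    """Average number of RRD updates per second, spread evenly over the cycle."""
    return files / interval_s


rate = rrd_updates_per_second(RRD_FILES, UPDATE_INTERVAL_S)
print(f"~{rate:.1f} RRD updates per second")  # ~233.3 RRD updates per second
```

Under that assumption the RRD disk has to absorb roughly 233 updates per second on average, which is in the same range as the roughly 311 writes/s that iostat reports for sdb.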
I moved the RRDs to a separate drive hoping this would help, but the writes
per second are still high. I've tried reducing read-ahead, mounting with
noatime,nodiratime and changing the I/O scheduler to deadline; nothing seems
to help. Here is sample output from iostat -xd 60 10:
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00    68.08     0.17    20.02     1.33   704.78     0.67   352.39    34.98     4.25   210.36     3.47     7.01
sda1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda2         0.00    68.08     0.17    20.02     1.33   704.78     0.67   352.39    34.98     4.25   210.36     3.47     7.01
sdb          0.00   674.60     1.53   311.04    12.27  7887.05     6.13  3943.52    25.27    24.50    78.38     1.91    59.70
sdb1         0.00   674.60     1.53   311.04    12.27  7887.05     6.13  3943.52    25.27    24.50    78.38     1.91    59.70
sdb2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-0         0.00     0.00     0.17    88.10     1.33   704.78     0.67   352.39     8.00    20.31   230.09     0.79     7.01
dm-1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
Drive sdb1 houses the RRD files.
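The iostat columns are related to each other, and checking those relations against the sdb row helps in reading it. This sketch assumes the classic sysstat definitions of that era: utilization is roughly request rate times svctm, and (Little's law) avgqu-sz is roughly request rate times await. The IostatRow class is a hypothetical helper that just holds numbers copied from the table.

```python
from dataclasses import dataclass


@dataclass
class IostatRow:
    """One device line from iostat -x (hypothetical helper, only the columns used here)."""
    device: str
    r_s: float        # r/s: reads completed per second
    w_s: float        # w/s: writes completed per second
    wkb_s: float      # wkB/s: kilobytes written per second
    avgqu_sz: float   # avgqu-sz: average number of requests queued or in service
    await_ms: float   # await: average total time per request (queue + service), ms
    svctm_ms: float   # svctm: average service time per request, ms

    def iops(self) -> float:
        return self.r_s + self.w_s

    def util_pct(self) -> float:
        # Utilization law: busy fraction = request rate x service time.
        return self.iops() * self.svctm_ms / 1000 * 100

    def queue_len(self) -> float:
        # Little's law: requests in the system = request rate x time in system.
        return self.iops() * self.await_ms / 1000

    def avg_write_kb(self) -> float:
        return self.wkb_s / self.w_s if self.w_s else 0.0


# Values copied from the sdb row of the table above.
sdb = IostatRow("sdb", r_s=1.53, w_s=311.04, wkb_s=3943.52,
                avgqu_sz=24.50, await_ms=78.38, svctm_ms=1.91)
print(f"%util    ~ {sdb.util_pct():.2f} (iostat reports 59.70)")
print(f"avgqu-sz ~ {sdb.queue_len():.2f} (iostat reports {sdb.avgqu_sz:.2f})")
print(f"average write ~ {sdb.avg_write_kb():.1f} KB")
```

On these numbers each request on sdb spends about 1.9 ms being serviced but about 78 ms in total, so most of the latency appears to be time spent waiting in a queue of roughly 24 small (about 12.7 KB) writes.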
Memory seems fine:
Memory      Used    Total  Percentage
Physical   7645M    7973M         95%
Actual     4688M    7973M         58%
Swap         64M    9983M          0%
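A quick arithmetic check of the memory table. The percentages match truncating integer division, and, assuming the "Actual" row excludes buffers and page cache, the gap between the Physical and Actual rows approximates the memory held by cache. Both points are inferences from the numbers, not documented Hobbit behaviour.

```python
def pct(used_mb: int, total_mb: int) -> int:
    # Truncating integer percentage; this reproduces the 95% / 58% / 0% shown
    # above (rounding would give 96% and 59%), so truncation is an inference.
    return used_mb * 100 // total_mb


TOTAL_MB = 7973
PHYSICAL_USED_MB, ACTUAL_USED_MB = 7645, 4688
SWAP_USED_MB, SWAP_TOTAL_MB = 64, 9983

print(pct(PHYSICAL_USED_MB, TOTAL_MB), pct(ACTUAL_USED_MB, TOTAL_MB),
      pct(SWAP_USED_MB, SWAP_TOTAL_MB))  # 95 58 0

# ASSUMPTION: "Actual" = physical used minus buffers and page cache,
# so the difference approximates memory held by cache rather than processes.
print(f"buffers/cache ~ {PHYSICAL_USED_MB - ACTUAL_USED_MB}M")  # buffers/cache ~ 2957M
```

If that assumption holds, processes use a bit under 60% of RAM and the rest is cache, which fits the observation that memory seems fine.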
[hobbit@hobbitmon rrd]$ uname -a
Linux hobbitmon 2.6.9-78.0.8.ELsmp #1 SMP Wed Nov 5 07:14:58 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
[hobbit@hobbitmon rrd]$ cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
Output from bbgen:
bbgen for Hobbit version 4.2.0
Statistics:
Hosts : 3506
Status messages : 41934
Purple messages : 0
Pages : 171
Output from bbtest:
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7a Feb 19 2003
LDAP library: OpenLDAP 20213
Statistics:
Hosts total : 3511
Hosts with no tests : 2390
Total test count : 1470
Status messages : 1596
Alert status msgs : 0
Transmissions : 18
DNS statistics:
# hostnames resolved : 358
# succesful : 339
# failed : 19
# calls to dnsresolve : 530
TCP test statistics:
# TCP tests total : 411
# HTTP tests : 161
# Simple TCP tests : 250
# Connection attempts : 411
# bytes written : 24722
# bytes read : 543706
TIME SPENT
Event                                   Starttime            Duration
bbtest-net startup                      1256584823.382254           -
Service definitions loaded              1256584823.383506    0.001252
Tests loaded                            1256584823.468743    0.085237
DNS lookups completed                   1256584828.565010    5.096267
Test engine setup completed             1256584828.572444    0.007434
TCP tests completed                     1256584839.000192   10.427748
PING test completed (1082 hosts)        1256584881.612835   42.612643
PING test results sent                  1256584890.617168    9.004333
Test result collection completed        1256584890.617453    0.000285
LDAP test engine setup completed        1256584890.617453    0.000000
LDAP tests executed                     1256584890.617454    0.000001
LDAP tests result collection completed  1256584890.617455    0.000001
NTP tests executed                      1256584894.477007    3.859552
RPC tests executed                      1256584894.988810    0.511803
Test results transmitted                1256584895.016358    0.027548
bbtest-net completed                    1256584895.018441    0.002083
TIME TOTAL                                                  71.636187
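The per-step durations in this table add up to the reported total, and sorting them shows where bbtest-net spends its time. A small sketch using only the numbers above:

```python
# Per-step durations (seconds) copied from the bbtest-net TIME SPENT table.
durations = {
    "Service definitions loaded": 0.001252,
    "Tests loaded": 0.085237,
    "DNS lookups completed": 5.096267,
    "Test engine setup completed": 0.007434,
    "TCP tests completed": 10.427748,
    "PING test completed (1082 hosts)": 42.612643,
    "PING test results sent": 9.004333,
    "Test result collection completed": 0.000285,
    "LDAP test engine setup completed": 0.000000,
    "LDAP tests executed": 0.000001,
    "LDAP tests result collection completed": 0.000001,
    "NTP tests executed": 3.859552,
    "RPC tests executed": 0.511803,
    "Test results transmitted": 0.027548,
    "bbtest-net completed": 0.002083,
}

total = sum(durations.values())
print(f"total: {total:.6f} s")  # total: 71.636187 s

# Print the three longest steps with their share of the total run time.
for step, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{step:<40} {secs:10.6f} s  ({secs / total:.0%})")
```

On these numbers the PING test of 1082 hosts accounts for about 59% of the 71.6 s run.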
Output from hobbitd:
Statistics for Hobbit daemon
Up since 26-Oct-2009 15:00:11 (0 days, 00:25:02)
Incoming messages : 398039
- status : 367373
- combo : 5193
- page : 183
- summary : 75
- data : 15310
- client : 9595
- notes : 0
- enable : 0
- disable : 0
- ack : 0
- config : 0
- query : 50
- hobbitdboard : 63
- hobbitdlog : 180
- drop : 0
- rename : 0
- dummy : 5
- ping : 0
- notify : 0
- schedule : 1
- download : 0
- Bogus/Timeouts : 11
Incoming messages/sec : 262 (average last 300 seconds)
status channel messages: 366410 (1 readers)
stachg channel messages: 34214 (1 readers)
page channel messages: 5600 (1 readers)
data channel messages: 15310 (1 readers)
notes channel messages: 0 (0 readers)
enadis channel messages: 0 (0 readers)
client channel messages: 9565 (1 readers)
clichg channel messages: 17 (1 readers)
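The hobbitd counters are consistent with each other: the per-type counts sum to the total number of incoming messages, and dividing by the uptime gives a rate close to the reported 262 messages/sec. A short sketch with the non-zero counters copied from the output above:

```python
# Non-zero per-type counters from the hobbitd statistics above.
counts = {
    "status": 367373, "combo": 5193, "page": 183, "summary": 75,
    "data": 15310, "client": 9595, "query": 50, "hobbitdboard": 63,
    "hobbitdlog": 180, "dummy": 5, "schedule": 1, "Bogus/Timeouts": 11,
}
TOTAL_INCOMING = 398039
UPTIME_S = 25 * 60 + 2  # "Up since ... (0 days, 00:25:02)"

# The per-type counters should account for every incoming message.
assert sum(counts.values()) == TOTAL_INCOMING

print(f"{TOTAL_INCOMING / UPTIME_S:.1f} messages/s since startup")  # 265.0 messages/s since startup
print(f"{counts['status'] / UPTIME_S:.1f} status messages/s")       # 244.6 status messages/s
```

Status messages make up about 92% of the incoming traffic.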
Attachment: image002.png (image/png, 29623 bytes)
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20091026/8c596144/attachment.png>
More information about the Xymon mailing list