[Xymon] XYMON Proxy Issue

Andy Smith abs at shadymint.com
Mon May 5 11:52:06 CEST 2014


Gautier Begin wrote:
> Andy,
> 
> I'm using Solaris 10.5 in a cluster zone configuration, for both the main
> server and the proxy server. I also have a small proxy under Linux Ubuntu.
> XYMON version: 4.3.12
> 
> Now, my proxy under Solaris is working fine with ~900 targets. Here are
> the steps I took:
> 
> *0- Use a tool to observe the behaviour of the network* on the system. I
> used netstat in the zone and lsof -i :1984 in the global zone (the physical
> node of the cluster).
> 
>  Here is my Perl script, to be run in the zone (it parses netstat output):
> 
> use strict;
> use warnings;
>
> my $total     = 0;     # connections on port 1984
> my $big_total = 0;     # all TCP connections
> my %Con_Status;        # per-state counts for port 1984
> my %Con_Status_Total;  # per-state counts for all TCP connections
>
> my @netstat = `netstat -naP tcp`;
> foreach my $ln (@netstat)
> {
>         chomp($ln);
>         my @elts = split(/ +/, $ln);
>         # Connection lines have more than six fields and end with a state name
>         if (($#elts > 5) && ($ln =~ /[0-9]+.*[A-Z]+/))
>         {
>                 $big_total++;
>                 $Con_Status_Total{$elts[-1]}++;
>         }
>         # Connections whose local or remote port is 1984
>         if ($ln =~ /\.1984 +/)
>         {
>                 $Con_Status{$elts[-1]}++;
>         }
> }
>
> print " State\t\tPort 1984\tTotal\n=======================================\n";
> foreach my $Conn_State (sort keys %Con_Status_Total)
> {
>         $Con_Status{$Conn_State} = 0 unless exists($Con_Status{$Conn_State});
>         my $col = (length($Conn_State) < 7) ? "\t\t" : "\t";
>         print " $Conn_State$col$Con_Status{$Conn_State}\t\t$Con_Status_Total{$Conn_State}\n";
>         $total += $Con_Status{$Conn_State};
> }
> print "=======================================\n TOTAL\t\t$total\t\t$big_total\n";
> 
> 
> 
> *1- Tune and configure how Solaris manages the network* using the ndd
> command:
> 
> ndd -set /dev/tcp tcp_time_wait_interval        2000
> ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
> ndd -set /dev/tcp tcp_ip_abort_interval         300000
> ndd -set /dev/tcp tcp_keepalive_interval        7200000
> ndd -set /dev/tcp tcp_rexmit_interval_max       4000
> ndd -set /dev/tcp tcp_rexmit_interval_min       3000
> ndd -set /dev/tcp tcp_rexmit_interval_initial   3000
> ndd -set /dev/tcp tcp_smallest_anon_port        1024
>
> ndd -set /dev/tcp tcp_conn_req_max_q            2048
> ndd -set /dev/tcp tcp_conn_req_max_q0           4096
> ndd -set /dev/tcp tcp_slow_start_initial        4
>
> ndd -set /dev/tcp tcp_xmit_hiwat                262144
> ndd -set /dev/tcp tcp_recv_hiwat                262144
> ndd -set /dev/tcp tcp_max_buf                   1048576
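The ndd settings above change system-wide defaults: tcp_xmit_hiwat and tcp_recv_hiwat set the default send and receive buffer sizes, and tcp_max_buf caps what an application may request. A program can also ask for buffer sizes on a single socket with setsockopt(). The C sketch below is illustrative only: set_socket_buffers is a hypothetical helper, not part of xymonproxy, and the 262144-byte request simply mirrors the values above. It reads the sizes back because the kernel may adjust the request (Linux, for example, doubles the value it stores).

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: request send/receive buffer sizes for one socket and
 * report what the kernel actually granted in *sndbuf and *rcvbuf.
 * Returns 0 on success, -1 on error (errno set by setsockopt/getsockopt). */
static int set_socket_buffers(int fd, int bytes, int *sndbuf, int *rcvbuf)
{
    socklen_t len;

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;

    /* Read back: the kernel may clamp or scale the requested size. */
    len = sizeof(*sndbuf);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, sndbuf, &len) < 0)
        return -1;
    len = sizeof(*rcvbuf);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcvbuf, &len) < 0)
        return -1;
    return 0;
}
```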
> 
> 
> 
> *2- Modify the program xymonproxy.c*
> 
> As I said previously, sockets are not handled well in this program
> (socket closure is not managed). Because I know very little about C
> programming, I just "patched" the program, but it remains a dirty solution.
> => so_linger, setsockopt part
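In the patch below, so_linger.l_onoff is set to 0. With SO_LINGER, a zero l_onoff disables lingering and l_linger is ignored, so that setsockopt() call leaves the socket at its default close() behaviour and the value 10 has no effect. The sketch below uses hypothetical helpers (set_linger, get_linger) to show the three cases. Enabling linger with a zero timeout typically makes close() abort the connection with a reset, which avoids TIME_WAIT on that side but discards any unsent data, so it is a trade-off rather than a fix.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: configure SO_LINGER on a socket.
 *   onoff == 0               -> lingering disabled, 'seconds' is ignored (default)
 *   onoff != 0, seconds > 0  -> on a blocking socket, close() may wait up to
 *                               'seconds' for unsent data to drain
 *   onoff != 0, seconds == 0 -> close() aborts the connection, unsent data is lost
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int set_linger(int fd, int onoff, int seconds)
{
    struct linger so_linger;

    so_linger.l_onoff = onoff;
    so_linger.l_linger = seconds;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));
}

/* Read the current SO_LINGER setting into *out. Returns 0 on success. */
static int get_linger(int fd, struct linger *out)
{
    socklen_t len = sizeof(*out);

    return getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len);
}
```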
> 
> I also modified line 973 and following, because the verbose logging (the
> "select failed" message) was slowing down the proxy. The best approach
> would be to solve the underlying issue, but I didn't.
> 
> # diff xymonproxy.c xymonproxy.c.ORIG
> 230d229
> <         struct linger so_linger;
> 715,717d713
> <                               so_linger.l_onoff = 0;
> <                               so_linger.l_linger = 10;
> <                               setsockopt(cwalk->ssocket, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));
>
> 977,981c973,976
> < /*            if (n < 0) {                                                    */
> < /*                    errprintf("select() %d/%d failed: %s\n", n, maxfd, strerror(errno));    */
> < /*            }                                                               */
> < /*            else if (n == 0) {                                              */
> <               if (n == 0) {
> ---
> >               if (n < 0) {
> >                       errprintf("select() failed: %s\n", strerror(errno));
> >               }
> >               else if (n == 0) {
> 1001c996
> <               else if ( n > 0 ) {
> ---
> >               else {
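The second part of the patch above silences the select() error branch rather than fixing its cause. An alternative, sketched here under stated assumptions, keeps the diagnostic but logs it only once per run of consecutive failures, treats EINTR (a call interrupted by a signal) as harmless, and gives up after a fixed number of hard errors. checked_select, MAX_SELECT_FAILURES and the result enum are hypothetical names; this is not the code in xymonproxy.c.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>

#define MAX_SELECT_FAILURES 5   /* hypothetical threshold */

/* Outcome of one pass through a select()-based event loop. */
enum select_result { SEL_READY, SEL_TIMEOUT, SEL_RETRY, SEL_ABORT };

/* Run select() once and classify the result. *failures counts consecutive
 * hard errors; only the first error in a run is logged, so a persistent
 * fault cannot flood the log and slow the loop down. */
static enum select_result checked_select(int maxfd, fd_set *rd, fd_set *wr,
                                         struct timeval *tmo, int *failures)
{
    int n = select(maxfd + 1, rd, wr, NULL, tmo);

    if (n > 0) {
        *failures = 0;
        return SEL_READY;
    }
    if (n == 0) {
        *failures = 0;
        return SEL_TIMEOUT;
    }
    if (errno == EINTR)         /* interrupted by a signal, not a real fault */
        return SEL_RETRY;

    if (++(*failures) == 1)
        fprintf(stderr, "select() failed (maxfd=%d): %s\n", maxfd, strerror(errno));
    return (*failures >= MAX_SELECT_FAILURES) ? SEL_ABORT : SEL_RETRY;
}
```

Resetting the counter on any successful pass means only an unbroken run of errors triggers the abort.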
> 
> 
> 
> *3- XYMON proxy conf*
> 
> Because of the large number of targets:
>
> In the proxy's xymonserver.cfg, I set MAXMSGSPERCOMBO="500".
>
> In the main server's xymonserver.cfg, I set:
> 
> MAXMSGSPERCOMBO="500"
> 
> MAXLINE="5242880"
> MAXMSG_CLIENT="5242880"
> MAXMSG_DATA="5242880"
> MAXMSG_STACHG="5242880"
> MAXMSG_STATUS="5242880"
> MAXMSG_NOTES="5242880"
> MAXMSG_PAGE="5242880"
> MAXMSG_ENADIS="5242880"
> MAXMSG_CLICHG="5242880"
> 
> 
> This part is not really tuned (the figures are probably too large), but it
> works.
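MAXMSGSPERCOMBO limits how many status messages are merged into one combo message, while the MAXMSG_* settings are size limits, so a batch can be closed by either one. The C sketch below only illustrates how a count limit and a size budget interact when batching; struct combo, combo_init and combo_add are hypothetical, the size budget is modeled in bytes for simplicity, and the framing ("combo" on the first line, messages separated by a blank line) is an assumption to check against the Xymon protocol documentation, not a description of xymonproxy's implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical combo batcher: appends messages until either the message
 * count (cf. MAXMSGSPERCOMBO) or the buffer size would be exceeded. */
struct combo {
    char  *buf;       /* caller-supplied storage */
    size_t cap;       /* size of buf in bytes, including the NUL (must be >= 1) */
    size_t len;       /* bytes used, excluding the NUL */
    int    count;     /* messages added so far */
    int    maxmsgs;   /* count limit */
};

static void combo_init(struct combo *c, char *buf, size_t cap, int maxmsgs)
{
    c->buf = buf;
    c->cap = cap;
    c->len = 0;
    c->count = 0;
    c->maxmsgs = maxmsgs;
    c->buf[0] = '\0';
}

/* Returns 1 if msg was added, 0 if the batch is full (send it, then start a
 * new one). Assumed framing: "combo\n" header, blank line between messages. */
static int combo_add(struct combo *c, const char *msg)
{
    const char *sep = (c->count == 0) ? "combo\n" : "\n\n";
    size_t seplen = strlen(sep);
    size_t msglen = strlen(msg);

    /* Refuse when either limit would be exceeded (+1 keeps room for the NUL). */
    if (c->count >= c->maxmsgs || c->len + seplen + msglen + 1 > c->cap)
        return 0;
    memcpy(c->buf + c->len, sep, seplen);
    memcpy(c->buf + c->len + seplen, msg, msglen);
    c->len += seplen + msglen;
    c->buf[c->len] = '\0';
    c->count++;
    return 1;
}
```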
> 
> 
> Regards,
> 
> Gautier BEGIN
> 
> System Tools Team Lead
> CACEIS and APERAM accounts
> CSC Computer Sciences Luxembourg S.A.
> 
> 
> 
> From:        Andy Smith <abs at shadymint.com>
> To:        xymon at xymon.com
> Date:        05/04/2014 02:50 PM
> Subject:        Re: [Xymon] XYMON Proxy Issue
> Sent by:        "Xymon" <xymon-bounces at xymon.com>
> ------------------------------------------------------------------------
> 
> 
> 
> Hi,
> 
> In February, Gautier reported this issue with xymonproxy on Solaris :-
> http://lists.xymon.com/pipermail/xymon/2014-February/039160.html
> 
> This week I got around to updating an installation of 4.2.3 on Solaris 9,
> and have encountered exactly the same issue as Gautier, but this time with
> the latest 4.3.17 code :-
> 
> 2014-05-04 13:05:36 xymonproxy version 4.3.17 starting
> 2014-05-04 13:20:41 Listening on 0.0.0.0:1984
> 2014-05-04 13:20:41 Sending to Xymon server(s) xx.xx.xx.xx:1984
> 2014-05-04 13:20:41 select() failed: Invalid argument
> 2014-05-04 13:20:41 select() failed: Invalid argument
> 2014-05-04 13:20:41 select() failed: Invalid argument
> 2014-05-04 13:20:41 select() failed: Invalid argument
> 2014-05-04 13:20:41 select() failed: Invalid argument
> 2014-05-04 13:20:41 select() failed: Invalid argument
> 2014-05-04 13:20:41 Too many select failures, aborting
> 2014-05-04 13:20:46 xymonproxy version 4.3.17 starting
> 
> I do not suffer from the connections in TIME_WAIT, only from the proxy
> constantly restarting every 15 minutes. Here is the truss output as it
> gasps when falling over :-
> 
> poll(0xFFBFF208, 1, 1000)                       = 0
> time()                                          = 1399206937
> poll(0xFFBFF208, 1, 1000)                       = 0
> time()                                          = 1399206938
> poll(0xFFBFF208, 1, 1000)                       = 0
> time()                                          = 1399206939
> poll(0xFFBFF208, 1, 1000)                       = 0
> time()                                          = 1399206940
> poll(0xFFBFF208, 1, 1000)                       = 0
> time()                                          = 1399206941
> poll(0xFFBFF208, 1, 1000)                       = 0
> time()                                          = 1399206942
> poll(0xFFBFF208, 1, 1000)                       = 1
> accept(3, 0x0003AC60, 0xFFBFF310, 1)            = 4
> fcntl(4, F_SETFL, 0x00000080)                   = 0
> time()                                          = 1399206942
> poll(0xFFBFF200, 2, 1000)                       = 1
> read(4, " s t a t u s + 4 5   c s".., 8185)     = 140
> time()                                          = 1399206942
> poll(0xFFBFF200, 2, 1000)                       = 1
> read(4, 0x00038CE2, 8045)                       = 0
> time()                                          = 1399206942
> shutdown(4, 2, 1)                               = 0
> close(4)                                        = 0
> poll(0xFFBFF208, 1, 1000)                       = 1
> accept(3, 0x0003ACD0, 0xFFBFF310, 1)            = 4
> fcntl(4, F_SETFL, 0x00000080)                   = 0
> time()                                          = 1399206942
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " s e l e c t ( )   f a i".., 34)      = 34
> time()                                          = 1399206942
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " s e l e c t ( )   f a i".., 34)      = 34
> time()                                          = 1399206942
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " s e l e c t ( )   f a i".., 34)      = 34
> time()                                          = 1399206942
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " s e l e c t ( )   f a i".., 34)      = 34
> time()                                          = 1399206942
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " s e l e c t ( )   f a i".., 34)      = 34
> time()                                          = 1399206942
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " s e l e c t ( )   f a i".., 34)      = 34
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " T o o   m a n y   s e l".., 35)      = 35
> _exit(1)
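In this trace the select() failures follow accept() with no poll() call in between (the select() calls appear in the trace as poll()), and truss shows no failing system call. That pattern is consistent with the C library rejecting the arguments before asking the kernel to wait. POSIX lists, among other causes of EINVAL, an invalid timeout interval and an nfds argument outside 0 to FD_SETSIZE; the thread does not establish which one applies here. As a hedged sketch, assuming the timeout is derived from a deadline, the hypothetical helpers remaining_timeout and nfds_ok below keep both arguments in range before each call.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>

/* Build a select() timeout for the time left until 'deadline'.
 * Both inputs are assumed normalized (tv_usec in [0, 999999]).
 * A negative remainder is clamped to zero (poll immediately), and the
 * result keeps tv_usec in [0, 999999], the range strict implementations accept. */
static struct timeval remaining_timeout(const struct timeval *now,
                                        const struct timeval *deadline)
{
    struct timeval t;

    t.tv_sec  = deadline->tv_sec  - now->tv_sec;
    t.tv_usec = deadline->tv_usec - now->tv_usec;
    if (t.tv_usec < 0) {            /* borrow one second */
        t.tv_sec  -= 1;
        t.tv_usec += 1000000;
    }
    if (t.tv_sec < 0) {             /* deadline already passed */
        t.tv_sec  = 0;
        t.tv_usec = 0;
    }
    return t;
}

/* select(maxfd + 1, ...) requires 0 <= maxfd + 1 <= FD_SETSIZE. */
static int nfds_ok(int maxfd)
{
    return maxfd >= -1 && maxfd < FD_SETSIZE;
}
```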
> 
> So, question to Gautier, are you using Solaris 9 and have you managed to 
> resolve this?
> 
> Another question to the rest of the list: this is actually the only
> proxy I have on Solaris, all the others are on Red Hat. Is anyone else
> using xymonproxy on Solaris, and if so, what version?  For the time
> being, I am running the old bbproxy until I get this fixed; the rest of
> 4.3.17 seems to be working OK.
> 
> Thanks for any feedback.
> -- 
> Andy

Gautier,

My issue is not a matter of performance or resources (I have only 3
servers in this DMZ), but thanks for the complete information.  It is
also a concern that this still happens with recent versions of Solaris:
I would be prepared to accept that Solaris 9 might behave incorrectly,
but I would have hoped that Solaris 10 had fixed this.

Maybe I will go back to the differences between the bbproxy code in
4.2.3 and the xymonproxy code in 4.3.17 for a clue as to what is going on.

-- 
Andy


