[Xymon] XYMON Proxy Issue

Andy Smith abs at shadymint.com
Sun May 11 22:03:11 CEST 2014


Andy Smith wrote:
> Hi,
> 
> In February, Gautier reported this issue with xymonproxy on Solaris :-
> 
> http://lists.xymon.com/pipermail/xymon/2014-February/039160.html
> 
> This week I came to update an installation of 4.2.3 on Solaris 9 
> and encountered exactly the same issue as Gautier, but this time on 
> the latest 4.3.17 code :-
> 
> 2014-05-04 13:05:36 xymonproxy version 4.3.17 starting
> 2014-05-04 13:20:41 Listening on 0.0.0.0:1984
> 2014-05-04 13:20:41 Sending to Xymon server(s) xx.xx.xx.xx:1984
> 2014-05-04 13:20:41 select() failed: Invalid argument
> 2014-05-04 13:20:41 select() failed: Invalid argument
> 2014-05-04 13:20:41 select() failed: Invalid argument
> 2014-05-04 13:20:41 select() failed: Invalid argument
> 2014-05-04 13:20:41 select() failed: Invalid argument
> 2014-05-04 13:20:41 select() failed: Invalid argument
> 2014-05-04 13:20:41 Too many select failures, aborting
> 2014-05-04 13:20:46 xymonproxy version 4.3.17 starting
> 
> I do not suffer the connections in TIME_WAIT, just the constant 
> restarting of the proxy every 15 minutes.  Here is the truss as it gasps 
> when falling over :-
> 
> poll(0xFFBFF208, 1, 1000)                       = 0
> time()                                          = 1399206937
> poll(0xFFBFF208, 1, 1000)                       = 0
> time()                                          = 1399206938
> poll(0xFFBFF208, 1, 1000)                       = 0
> time()                                          = 1399206939
> poll(0xFFBFF208, 1, 1000)                       = 0
> time()                                          = 1399206940
> poll(0xFFBFF208, 1, 1000)                       = 0
> time()                                          = 1399206941
> poll(0xFFBFF208, 1, 1000)                       = 0
> time()                                          = 1399206942
> poll(0xFFBFF208, 1, 1000)                       = 1
> accept(3, 0x0003AC60, 0xFFBFF310, 1)            = 4
> fcntl(4, F_SETFL, 0x00000080)                   = 0
> time()                                          = 1399206942
> poll(0xFFBFF200, 2, 1000)                       = 1
> read(4, " s t a t u s + 4 5   c s".., 8185)     = 140
> time()                                          = 1399206942
> poll(0xFFBFF200, 2, 1000)                       = 1
> read(4, 0x00038CE2, 8045)                       = 0
> time()                                          = 1399206942
> shutdown(4, 2, 1)                               = 0
> close(4)                                        = 0
> poll(0xFFBFF208, 1, 1000)                       = 1
> accept(3, 0x0003ACD0, 0xFFBFF310, 1)            = 4
> fcntl(4, F_SETFL, 0x00000080)                   = 0
> time()                                          = 1399206942
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " s e l e c t ( )   f a i".., 34)      = 34
> time()                                          = 1399206942
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " s e l e c t ( )   f a i".., 34)      = 34
> time()                                          = 1399206942
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " s e l e c t ( )   f a i".., 34)      = 34
> time()                                          = 1399206942
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " s e l e c t ( )   f a i".., 34)      = 34
> time()                                          = 1399206942
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " s e l e c t ( )   f a i".., 34)      = 34
> time()                                          = 1399206942
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " s e l e c t ( )   f a i".., 34)      = 34
> time()                                          = 1399206942
> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
> write(2, "  ", 1)                               = 1
> write(2, " T o o   m a n y   s e l".., 35)      = 35
> _exit(1)
> 
> So, question to Gautier, are you using Solaris 9 and have you managed to 
> resolve this?
> 
> Another question to the rest of the list: this is actually the only 
> proxy I have on Solaris, all the others are on Red Hat.  Is anyone else 
> using xymonproxy on Solaris and, if so, what version?  For the time 
> being, I am running the old bbproxy until I get this fixed; the rest of 
> 4.3.17 seems to be working OK.

Done a bit more digging around.  Firstly, if I regress to r#7368 
(4.3.13) then xymonproxy on Solaris is stable.  This just hides the 
problem of course and might be a factor in Gautier's performance issue.

If I modify the code for 4.3.17 to remove the exit after 5 select() 
failures and add in some further debugging, I can observe that on 
Solaris 9 at least :-

- every 900 seconds, select() fails
- select() continues to fail for 2 seconds, then succeeds, and the proxy 
continues as normal.
- during these 2 seconds, there are no further calls to poll(), but 
somewhere in the region of 50,000 calls to time().
- the values for the selecttmo structure and maxfd are reasonable, so 
the invalid argument must be one of the fdread or fdwrite structures.
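
The kind of extra debugging described above could look roughly like the sketch below. It is not taken from the xymonproxy source: check_select_args() is a hypothetical helper, and only the names maxfd, fdread and fdwrite come from this message. POSIX documents EINVAL from select() for an nfds outside 0..FD_SETSIZE or an invalid timeout, and EBADF for a bad descriptor in one of the sets, so the helper checks all three before the real call is made.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical diagnostic helper, NOT part of xymonproxy.
 * Logs anything about the select() arguments that looks wrong and
 * returns the number of problems found (0 = nothing suspicious). */
int check_select_args(int maxfd, fd_set *fdread, fd_set *fdwrite,
                      const struct timeval *tmo, FILE *log)
{
    int problems = 0;
    int fd;

    /* select() is called with nfds = maxfd + 1, which must not exceed FD_SETSIZE */
    if (maxfd < 0 || maxfd >= FD_SETSIZE) {
        fprintf(log, "maxfd %d outside 0..%d\n", maxfd, FD_SETSIZE - 1);
        problems++;
    }

    /* A negative field or tv_usec >= 1000000 is an invalid timeout */
    if (tmo != NULL &&
        (tmo->tv_sec < 0 || tmo->tv_usec < 0 || tmo->tv_usec >= 1000000)) {
        fprintf(log, "bad timeout: tv_sec=%ld tv_usec=%ld\n",
                (long)tmo->tv_sec, (long)tmo->tv_usec);
        problems++;
    }

    /* Every descriptor present in either set should still be open */
    for (fd = 0; fd <= maxfd && fd < FD_SETSIZE; fd++) {
        int inread  = (fdread  != NULL) && FD_ISSET(fd, fdread);
        int inwrite = (fdwrite != NULL) && FD_ISSET(fd, fdwrite);

        if (!inread && !inwrite) continue;
        if (fcntl(fd, F_GETFD) == -1 && errno == EBADF) {
            fprintf(log, "fd %d is in %s set but is not open\n",
                    fd, inread ? "read" : "write");
            problems++;
        }
    }

    return problems;
}
```

Calling check_select_args(maxfd, &fdread, &fdwrite, &selecttmo, stderr) immediately before each select() would show which argument, if any, looks wrong at the moment the failures start. The limits the Solaris 9 libc actually enforces may be stricter than the ones checked here, so a clean result does not rule the arguments out.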

Continuing to collect information but still not sure if I am looking at 
a Sol9 issue or if this affects later Solaris versions.
-- 
Andy


