[Xymon] XYMON Proxy Issue

Andy Smith abs at shadymint.com
Tue May 20 08:28:46 CEST 2014


Andy Smith wrote:
> Andy Smith wrote:
>> Hi,
>>
>> In February, Gautier reported this issue with xymonproxy on Solaris :-
>>
>> http://lists.xymon.com/pipermail/xymon/2014-February/039160.html
>>
>> This week I came to update an installation of 4.2.3 on Solaris 9 
>> and have encountered exactly the same issue as Gautier, but this time on 
>> the latest 4.3.17 code :-
>>
>> 2014-05-04 13:05:36 xymonproxy version 4.3.17 starting
>> 2014-05-04 13:20:41 Listening on 0.0.0.0:1984
>> 2014-05-04 13:20:41 Sending to Xymon server(s) xx.xx.xx.xx:1984
>> 2014-05-04 13:20:41 select() failed: Invalid argument
>> 2014-05-04 13:20:41 select() failed: Invalid argument
>> 2014-05-04 13:20:41 select() failed: Invalid argument
>> 2014-05-04 13:20:41 select() failed: Invalid argument
>> 2014-05-04 13:20:41 select() failed: Invalid argument
>> 2014-05-04 13:20:41 select() failed: Invalid argument
>> 2014-05-04 13:20:41 Too many select failures, aborting
>> 2014-05-04 13:20:46 xymonproxy version 4.3.17 starting
>>
>> I do not see connections piling up in TIME_WAIT, just the proxy 
>> constantly restarting every 15 minutes.  Here is the truss output as it 
>> falls over :-
>>
>> poll(0xFFBFF208, 1, 1000)                       = 0
>> time()                                          = 1399206937
>> poll(0xFFBFF208, 1, 1000)                       = 0
>> time()                                          = 1399206938
>> poll(0xFFBFF208, 1, 1000)                       = 0
>> time()                                          = 1399206939
>> poll(0xFFBFF208, 1, 1000)                       = 0
>> time()                                          = 1399206940
>> poll(0xFFBFF208, 1, 1000)                       = 0
>> time()                                          = 1399206941
>> poll(0xFFBFF208, 1, 1000)                       = 0
>> time()                                          = 1399206942
>> poll(0xFFBFF208, 1, 1000)                       = 1
>> accept(3, 0x0003AC60, 0xFFBFF310, 1)            = 4
>> fcntl(4, F_SETFL, 0x00000080)                   = 0
>> time()                                          = 1399206942
>> poll(0xFFBFF200, 2, 1000)                       = 1
>> read(4, " s t a t u s + 4 5   c s".., 8185)     = 140
>> time()                                          = 1399206942
>> poll(0xFFBFF200, 2, 1000)                       = 1
>> read(4, 0x00038CE2, 8045)                       = 0
>> time()                                          = 1399206942
>> shutdown(4, 2, 1)                               = 0
>> close(4)                                        = 0
>> poll(0xFFBFF208, 1, 1000)                       = 1
>> accept(3, 0x0003ACD0, 0xFFBFF310, 1)            = 4
>> fcntl(4, F_SETFL, 0x00000080)                   = 0
>> time()                                          = 1399206942
>> time()                                          = 1399206942
>> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
>> write(2, "  ", 1)                               = 1
>> write(2, " s e l e c t ( )   f a i".., 34)      = 34
>> time()                                          = 1399206942
>> time()                                          = 1399206942
>> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
>> write(2, "  ", 1)                               = 1
>> write(2, " s e l e c t ( )   f a i".., 34)      = 34
>> time()                                          = 1399206942
>> time()                                          = 1399206942
>> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
>> write(2, "  ", 1)                               = 1
>> write(2, " s e l e c t ( )   f a i".., 34)      = 34
>> time()                                          = 1399206942
>> time()                                          = 1399206942
>> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
>> write(2, "  ", 1)                               = 1
>> write(2, " s e l e c t ( )   f a i".., 34)      = 34
>> time()                                          = 1399206942
>> time()                                          = 1399206942
>> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
>> write(2, "  ", 1)                               = 1
>> write(2, " s e l e c t ( )   f a i".., 34)      = 34
>> time()                                          = 1399206942
>> time()                                          = 1399206942
>> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
>> write(2, "  ", 1)                               = 1
>> write(2, " s e l e c t ( )   f a i".., 34)      = 34
>> time()                                          = 1399206942
>> write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
>> write(2, "  ", 1)                               = 1
>> write(2, " T o o   m a n y   s e l".., 35)      = 35
>> _exit(1)
>>
>> So, question to Gautier, are you using Solaris 9 and have you managed 
>> to resolve this?
>>
>> Another question for the rest of the list: this is actually the only 
>> proxy I have on Solaris (all the others are on Red Hat). Is anyone else 
>> using xymonproxy on Solaris and, if so, what version?  For the time 
>> being, I am running the old bbproxy until I get this fixed; the rest 
>> of 4.3.17 seems to be working OK.
> 
> Done a bit more digging around.  Firstly, if I regress to r#7368 
> (4.3.13) then xymonproxy on Solaris is stable.  This just hides the 
> problem of course and might be a factor in Gautier's performance issue.
> 
> If I modify the code for 4.3.17 to remove the exit after 5 select() 
> failures and add in some further debugging, I can observe that on 
> Solaris 9 at least :-
> 
> - every 900 seconds, select() fails.
> - select() continues to fail for 2 seconds, then succeeds and the proxy 
> continues as normal.
> - during these 2 seconds there are no further calls to poll(), but 
> somewhere in the region of 50,000 calls to time().
> - the values of the selecttmo structure and maxfd are reasonable, so 
> the invalid argument must be in one of the fdread or fdwrite sets.
> 
> Continuing to collect information but still not sure if I am looking at 
> a Sol9 issue or if this affects later Solaris versions.
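For anyone wanting to repeat the kind of debugging described above, a minimal sketch of a select() wrapper that logs errno together with the arguments it was given (instead of aborting after a fixed number of failures) might look like this. It is hypothetical: only the names maxfd, fdread, fdwrite and selecttmo come from the discussion; the wrapper select_logged is illustrative and is not xymonproxy code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical debugging wrapper (not xymonproxy code): call select() and,
 * on failure, log errno plus the arguments so the offending one can be
 * identified. The caller decides whether to retry or give up.
 */
static int select_logged(int maxfd, fd_set *fdread, fd_set *fdwrite,
                         struct timeval *selecttmo)
{
    /* Some platforms update the timeout in place, so keep the input value. */
    struct timeval before = *selecttmo;
    int n = select(maxfd + 1, fdread, fdwrite, NULL, selecttmo);

    if (n == -1) {
        int saved = errno; /* fprintf() may clobber errno */

        fprintf(stderr,
                "select() failed: %s (nfds=%d, tv_sec=%ld, tv_usec=%ld)\n",
                strerror(saved), maxfd + 1,
                (long)before.tv_sec, (long)before.tv_usec);

        /* List the descriptors that were set, never indexing past FD_SETSIZE. */
        for (int fd = 0; fd <= maxfd && fd < FD_SETSIZE; fd++) {
            if (fdread && FD_ISSET(fd, fdread))
                fprintf(stderr, "  read fd %d\n", fd);
            if (fdwrite && FD_ISSET(fd, fdwrite))
                fprintf(stderr, "  write fd %d\n", fd);
        }
        errno = saved;
    }
    return n;
}
```

Because the timeout is copied before the call, the logged values are what was actually passed in, even on platforms where select() modifies the timeval.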

This issue affects Solaris 10 as well. The attached patch resolves all 
my xymonproxy stability problems on Solaris platforms. I believe the 
patch is relevant to other platforms too; it is just that select() on 
other platforms is more tolerant.

-- 
Andy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xymon-4.3.17-patch.tar.gz
Type: application/x-gzip
Size: 332 bytes
Desc: not available
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20140520/2ccf9cea/attachment.bin>

