[Xymon] ports/procs flapping

Japheth Cleaver cleaver at terabithia.org
Fri Nov 18 19:06:16 CET 2016


On 11/17/2016 1:36 PM, Torsten Richter wrote:
> Hi Charles,
>
> in the Xymon web interface, most tests have
> a link called "Client data" below the status.
> If you click on it you'll see a text output in your browser
> that starts with something like "[collector:]".
> Scroll down to the end of the file and see if the last entry
> is something about "[clock]" and "epoch", "local" and "UTC".
>
> If not then maybe the data sent from the client to the server
> got truncated and you might see a yellow alert for xymond.
> Or you may find something about it in your Xymon server logs.
>
> I had similar problems with some hosts where a lot of connections
> were in state "ESTABLISHED" or "TIME_WAIT" that filled up the
> data file and on server side some MAXMSG* parameter is set too
> low. You'll have to adjust that parameter and restart the Xymon server
> component.
>
> HTH
> Torsten
>
> On 17.11.2016 22:00, charles slater via Xymon wrote:

Data truncation could definitely be the cause here. An unfortunate 
artifact of the original TCP protocol is that a submission carries no 
end-of-message delimiter (there is one in the STDIN stream from 
xymond_channel to the workers, but by that point the original 
transmission has long since concluded), so we can't tell for sure 
whether we received the entire payload. This is fixed in the "V5" 
protocol (the former trunk) by adding a payload size prefix, but that 
isn't backwards compatible with anything not expecting it. It's also 
almost certainly fixed if you're using any compression, because size 
details are used as part of decompression validation.
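To make the framing point concrete, here is a small Python sketch contrasting a length-prefixed transmission with the original no-delimiter approach. The 4-byte big-endian header and the function names are assumptions for illustration only; this is not the actual V5 wire format.

```python
import struct

# Illustrative framing only: a 4-byte big-endian length header followed by the
# payload. This is an assumption for demonstration, NOT Xymon's V5 wire format.
HEADER = struct.Struct(">I")


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length so the receiver knows what to expect."""
    return HEADER.pack(len(payload)) + payload


def unframe(data: bytes) -> bytes:
    """Return the payload, raising ValueError if the transmission was cut short."""
    if len(data) < HEADER.size:
        raise ValueError("truncated: length header incomplete")
    (expected,) = HEADER.unpack_from(data)
    payload = data[HEADER.size:]
    if len(payload) < expected:
        raise ValueError(f"truncated: got {len(payload)} of {expected} bytes")
    return payload[:expected]


def unframe_legacy(data: bytes) -> bytes:
    """Original-protocol behaviour: whatever arrived before the socket closed
    is all the receiver has, so a partial payload looks exactly like a full one."""
    return data


if __name__ == "__main__":
    # Hypothetical client payload; only the trailing [clock] section matters here.
    msg = b"[ps]\n...\n[netstat]\n...\n[clock]\nepoch: 1479492376\n"
    wire = frame(msg)
    cut = wire[: len(wire) // 2]  # simulate the connection dropping halfway

    print(unframe(wire) == msg)  # True: complete message accepted
    try:
        unframe(cut)
    except ValueError as err:
        print(err)  # the size prefix exposes the truncation
    print(unframe_legacy(msg[: len(msg) // 2]))  # legacy path: fragment accepted as-is
```

The point of the design is that the receiver learns the expected size before reading the body, so a short read becomes an explicit error instead of a silently incomplete message.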

There are a number of TCP and connection-related kernel parameters that 
can be tweaked in sysctl.conf to help reduce TCP issues here (especially 
important if you're running a busy xymonnet server), but a genuinely 
overloaded router or a flaky cable (or VPN connection) will still bite you.

One workaround, if you can spare the CPU capacity and message overhead 
on your Xymon server, is to add --filter='\\[clock\\]' to anything 
listening on the client channel in tasks.cfg. For example:

     CMD xymond_channel --filter='\\[clock\\]' --channel=client xymond_client --uptime-status

This will reject any incoming client message that was truncated before 
the [clock] section at the very end, so it won't get processed into 
individual status messages built from incomplete data. Not the best 
solution, but it will at least prevent status flaps.
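Roughly what the filter buys you, modeled in Python. This assumes the filter behaves like a plain regular-expression search over the whole message; the function names are hypothetical, and xymond_channel's real implementation (in C) may differ in details.

```python
import re

# Hypothetical model of the check: forward a client message only if the pattern
# occurs somewhere in it. This assumes a plain regex search; it does not
# reproduce xymond_channel's actual filter code.
CLOCK_SECTION = re.compile(r"\[clock\]")


def passes_filter(message, pattern=CLOCK_SECTION):
    """True if the message contains the [clock] section header."""
    return pattern.search(message) is not None


def route(messages):
    """Split messages into (forwarded, dropped) lists using passes_filter."""
    forwarded, dropped = [], []
    for msg in messages:
        (forwarded if passes_filter(msg) else dropped).append(msg)
    return forwarded, dropped


if __name__ == "__main__":
    complete = "[ps]\n...\n[netstat]\n...\n[clock]\nepoch: 1479492376\n"
    truncated = complete.split("[clock]")[0]  # cut off before the final section
    forwarded, dropped = route([complete, truncated])
    print(len(forwarded), len(dropped))  # 1 1
```

Note that a message cut off inside the [clock] section itself (after the header but before the epoch/local/UTC lines) would still pass a check like this; it only guarantees the transmission got at least that far.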

And yes, check your xymond.log for any warnings about messages truncated 
due to size. It's always good to give yourself plenty of extra room in 
the client message size limit if you have servers reporting in that see 
bursts of network or process activity, where either the netstat or the 
ps output could end up thousands of lines longer than normal.
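If you want a rough idea of how much room a busy host needs, one approach is to save its "Client data" page to a file and see which sections dominate the size. Below is a hypothetical Python helper for that. The section-header format is an assumption based on how client data looks in the web view, and the limit you compare against should come from whichever MAXMSG* setting your server actually uses.

```python
import re

# Assumption: in the client data, each section starts with a line that is just
# "[name]" (e.g. "[ps]", "[netstat]", "[clock]"). Lines before the first header
# are not attributed to any section.
SECTION_HEADER = re.compile(r"^\[([^\]]+)\]$")


def section_sizes(message: str) -> dict:
    """Return {section name: bytes of body text}, largest first."""
    sizes = {}
    current = None
    for line in message.splitlines(keepends=True):
        header = SECTION_HEADER.match(line.rstrip("\r\n"))
        if header:
            current = header.group(1)
            sizes.setdefault(current, 0)
        elif current is not None:
            sizes[current] += len(line.encode("utf-8"))
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


def headroom(message: str, limit_bytes: int) -> int:
    """Bytes left under limit_bytes; a negative value means the message is too big."""
    return limit_bytes - len(message.encode("utf-8"))


if __name__ == "__main__":
    sample = "[ps]\n" + "proc line\n" * 3 + "[netstat]\n" + "conn line\n" * 5 + "[clock]\nepoch: 1\n"
    for name, size in section_sizes(sample).items():
        print(f"{name:10} {size:6} bytes")
    # 4096 is an arbitrary example limit, not a Xymon default.
    print("headroom:", headroom(sample, 4096))
```

Running this against a capture taken during a burst, rather than a quiet period, gives a more honest picture of the worst case.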


HTH,
-jc


