[Xymon] ports/procs flapping
Japheth Cleaver
cleaver at terabithia.org
Fri Nov 18 19:06:16 CET 2016
On 11/17/2016 1:36 PM, Torsten Richter wrote:
> Hi Charles,
>
> in the web interface of Xymon you have, for most tests,
> a link called "Client data" below the status.
> If you click on it you'll see a text output in your browser
> that starts with something like "[collector:]".
> Scroll down to the end of the file and see if the last entry
> is something about "[clock]" and "epoch", "local" and "UTC".
>
> If not then maybe the data sent from the client to the server
> got truncated and you might see a yellow alert for xymond.
> Or you may find something about it in your Xymon server logs.
>
> I had similar problems with some hosts where a lot of connections
> were in state "ESTABLISHED" or "TIME_WAIT" that filled up the
> data file and on server side some MAXMSG* parameter is set too
> low. You'll have to adjust that parameter and restart the Xymon server
> component.
>
> HTH
> Torsten
>
Data truncation could definitely be a cause here. An unfortunate
artifact of the original TCP protocol is that submissions carry no
end-of-message delimiter, so we can't tell for sure whether we received
the entire payload. (A delimiter is present in the STDIN stream from
xymond_channel to the workers, but by that point the original
transmission has long since concluded.) This is fixed in the "V5"
protocol (former trunk) by adding a payload size prefix, but that isn't
backwards compatible with anything not expecting it. It's also
(almost certainly) fixed if you're using any compression, because size
details are used as part of decompression validation.

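To illustrate why a size prefix helps, here is a minimal Python sketch of
length-prefixed framing. The 4-byte big-endian prefix and the frame() and
unframe() helpers are assumptions made up for this example; this is not
the actual V5 wire format, only the general idea of letting the receiver
notice a short read.

```python
import struct

PREFIX_LEN = 4  # hypothetical: 4-byte big-endian length header


def frame(payload: bytes) -> bytes:
    """Prepend the payload size so the receiver knows how much to expect."""
    return struct.pack("!I", len(payload)) + payload


def unframe(data: bytes):
    """Return the payload if all of it arrived, or None if it was cut short."""
    if len(data) < PREFIX_LEN:
        return None  # not even the size header arrived
    (size,) = struct.unpack("!I", data[:PREFIX_LEN])
    body = data[PREFIX_LEN:]
    if len(body) < size:
        return None  # fewer bytes than announced: truncated in transit
    return body[:size]


# Made-up sample message, loosely shaped like client data.
message = b"[collector:]\nclient data\n[clock]\nepoch: 1479492376\n"
wire = frame(message)

complete = unframe(wire)        # the whole message arrived
truncated = unframe(wire[:20])  # connection dropped after 20 bytes
```

Without the prefix, the receiver would have accepted the 20-byte fragment
as a (very short) valid message, which is the situation described above.
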
There are a number of TCP and connection-related kernel parameters that
can be tweaked in sysctl.conf to help reduce TCP issues here --
especially important if you're running a busy xymonnet server -- but a
genuinely overloaded router or flaky cable (or VPN connection) will
still bite you.

One workaround -- if you can spare the CPU capacity and message overhead
on your xymon server -- is to add --filter='\\[clock\\]' to anything
listening on the client channel in tasks.cfg. For example:

    CMD xymond_channel --filter='\\[clock\\]' --channel=client xymond_client --uptime-status

This will reject any incoming client message that got truncated before
the [clock] section at the very end, so individual status messages
won't be generated from incomplete data. Not the best solution, but it
will at least prevent status flaps.

And yes, check your xymond.log for any warnings about truncated messages
due to size. It's always good to give yourself a lot of extra room in
the client message if you have servers reporting in that see bursts of
network or process activity, where either netstat or ps output could
end up thousands of lines longer than normal.

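As a rough way to think about "extra room", here is a small Python sketch
that compares a typical client message size against a size limit. It
assumes the limit is expressed in KB, as I believe the MAXMSG_* settings
are; the burst factor of 3 and the sample numbers are arbitrary
assumptions for illustration, not recommendations.

```python
def headroom_report(message_size_bytes: int, limit_kb: int,
                    burst_factor: float = 3.0) -> dict:
    """Compare a normal message size, and a hypothetical burst, to a limit."""
    limit_bytes = limit_kb * 1024
    burst_size = int(message_size_bytes * burst_factor)
    return {
        "limit_bytes": limit_bytes,
        "burst_size_bytes": burst_size,
        "fits_now": message_size_bytes <= limit_bytes,
        "fits_during_burst": burst_size <= limit_bytes,
    }


# Hypothetical host: 200 KB of client data normally, against a 512 KB limit.
report = headroom_report(200 * 1024, 512)
```

In this made-up case the message fits on a normal day but not when netstat
or ps output triples, which is exactly when the truncation (and the
flapping) would show up.
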
HTH,
-jc