[Xymon] "Discarding timed-out partial msg" Error Messages

J.C. Cleaver cleaver at terabithia.org
Tue Dec 8 16:50:06 CET 2015


Hi Matt,

Sorry for the delay. Had some unexpected time away from the keyboard this
weekend.

Responses inline.


On Fri, December 4, 2015 3:49 am, Matt Vander Werf wrote:
> Hi J.C.,
>
> Thanks for the e-mail and advice!
>
> A couple of questions:
>
> What's the default --lqueue value that Xymon uses? (Is there a way to see
> what it's using?)
>
> What exactly is your definition of "tons of simultaneous connections"
> here?
> Can you give me a number or range that you think would warrant increasing
> the --lqueue value?


The default is 512, which is compiled in. It really shouldn't need to be
increased unless xymond is being bogged down by lots of *literally*
simultaneous waiting connections. It can be increased, but there's
probably some other problem behind it: slow connectivity, high CPU load,
or "backpressure" from too many channel workers leaving xymond itself
unable to keep up. Thinking back, I don't believe I had cause to increase
it until SN was regularly hitting the 2500 msgs/s range, and it was
lowered back down once other performance bottlenecks and some packet loss
were identified.
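
For anyone unfamiliar with the term: as I understand it, the listen queue
is the backlog passed to listen(2), i.e. how many connections the kernel
will hold for a daemon that hasn't accept()ed them yet. Below is a minimal
Python sketch of the idea. It is not xymond's actual code, and
read_somaxconn, effective_backlog and open_listener are made-up names for
illustration. One thing worth knowing on Linux is that the kernel silently
caps the requested backlog at net.core.somaxconn, so a larger value only
takes effect if that sysctl allows it.

```python
import socket

# Linux-only location of the kernel's cap on listen() backlogs.
SOMAXCONN_PATH = "/proc/sys/net/core/somaxconn"


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the kernel's backlog cap, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """Backlog the kernel will actually honour for a requested value."""
    if somaxconn is None:
        return requested
    return min(requested, somaxconn)


def open_listener(host="127.0.0.1", port=0, backlog=512):
    """Open a TCP listener whose pending-connection queue is `backlog` deep."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)  # connections beyond this may be refused or dropped
    return srv


if __name__ == "__main__":
    cap = read_somaxconn()
    print("kernel cap:", cap)
    print("requested 1024, effective:", effective_backlog(1024, cap))
```

If the effective value comes out lower than what you asked for, raising
the listen queue alone won't change anything until the sysctl is raised
too.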

Try running strace against xymond to see what it's doing. If it spends a
lot of time waiting on network reads, that might be a sign that increasing
lqueue could help. 768 or 1024 should be more than sufficient. Needing
more than that, other than during bursts, means there's some other backlog.
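
If it helps, here is a rough way to put numbers on the waiting. Assuming a
trace captured with something like
"strace -f -T -p <xymond pid> -o xymond.trace" (-T appends the time spent
in each syscall), the Python sketch below totals the time per syscall. The
function name and the regex are mine and only cover the common strace line
shapes, so treat it as a starting point rather than a finished tool.

```python
import re
import sys
from collections import defaultdict

# Matches completed syscall lines from `strace -T`, for example:
#   read(5, "status ..."..., 4096) = 21 <0.000045>
#   1234 <... read resumed> "abc", 4096) = 3 <0.250000>
#   [pid  1234] accept(3, ...) = 7 <1.250000>
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"                      # optional pid prefix
    r"(?:<\.\.\.\s+(?P<resumed>\w+)\s+resumed>|(?P<name>\w+)\()"
    r".*<(?P<secs>\d+\.\d+)>\s*$"                         # trailing -T duration
)


def summarize(lines):
    """Return {syscall: (call_count, total_seconds)} for the given lines."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # signals, exits, and "<unfinished ...>" halves
        name = m.group("name") or m.group("resumed")
        totals[name][0] += 1
        totals[name][1] += float(m.group("secs"))
    return {name: (count, secs) for name, (count, secs) in totals.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        stats = summarize(fh)
    # Largest total time first.
    for name, (count, secs) in sorted(stats.items(), key=lambda kv: -kv[1][1]):
        print(f"{name:12s} calls={count:7d} total={secs:10.3f}s")
```

If read or recv dominates the totals, that points at slow senders or the
network rather than at xymond being CPU-bound.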


>
> Could it be from clients/senders with longer than usual process listings?
> Or other clientlog statistics? (But still under the max client message
> value.)

It's possible, but unless you're bandwidth-restricted somewhere, senders
should generally still be able to complete within the default time frame.
If you *are* bandwidth-restricted, then that's definitely something to
consider, especially if the machines you're having problems with have a
lot of burst network activity. (Speaking of burst network activity, try
commenting out the 'netstat' output in the client if you don't have any
port checks against the host.)


>
> How would I be able to tell if there are long messages being sent in if
> the
> long messages are being discarded?

Yeah, this should probably be added in. Truncated messages have their
first line displayed, but it's not so much a 'discard' here as it is a
network timeout first and foremost.

An strace with -s 4096 (or some other high number) might be able to catch
the first bit of a read from the client, if you're lucky there...


HTH,
-jc



