[Xymon] localmode, got over-size message, truncating

Christoph Zechner zechner at vrvis.at
Fri Mar 11 06:19:04 CET 2022


On 10/03/2022 12:41, Jeremy Laidman wrote:
> Great work Christoph.
> 
> Sorry, it appears that I led you down the wrong path, asserting that it 
> was a server-only setting in xymond. It would appear to be a client-side 
> setting. This seems to be undocumented in the man page for xymonclient.cfg.

Please, no worries, you steered me in the right direction, and 
increasing the message sizes on the server was not a bad idea anyhow. :-)

But yes, this is undocumented unfortunately. I already filed a bug 
report with the Debian maintainers, let's see what comes of it.
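
For anyone who finds this in the archive: a minimal sketch of the arithmetic behind the fix. It assumes the MAXMSG_* values are given in kB, which matches the "max: 524288" in the log for a limit of 512. The helper names maxmsg_bytes and fits_limit are made up for illustration and are not part of Xymon.

```shell
# Setting from this thread, placed in /etc/xymon/xymonclient.cfg:
#   MAXMSG_CLIENT=1024

# Hypothetical helper: convert a MAXMSG_* value (assumed to be in kB) to bytes.
maxmsg_bytes() {
    echo $(( $1 * 1024 ))
}

# Hypothetical helper: succeed if a message of $1 bytes fits a limit of $2 kB.
fits_limit() {
    [ "$1" -le "$(maxmsg_bytes "$2")" ]
}

maxmsg_bytes 512     # the limit reported in the error message
maxmsg_bytes 1024    # the limit after setting MAXMSG_CLIENT=1024

# 528383 is the byte count from the log line; the full message may be larger.
if fits_limit 528383 1024; then
    echo "528383 bytes fits with MAXMSG_CLIENT=1024"
fi
```

The byte count in the log is only where truncation was reported, so the real client message may need more headroom than this check suggests.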

Christoph

> 
> J
> 
> On Thu, 10 Mar 2022 at 21:18, Christoph Zechner <zechner at vrvis.at> wrote:
> 
>     I solved it!
> 
>     I had to add and set "MAXMSG_CLIENT=1024" in
>     /etc/xymon/xymonclient.cfg,
>     restarted xymon-client and all the errors were gone.
> 
>     Thanks again for your help!
> 
>     Cheers
>     Christoph
> 
> 
>     On 09/03/2022 06:42, Christoph Zechner wrote:
>      > On 09/03/2022 00:04, Jeremy Laidman wrote:
>      >> On Tue, 8 Mar 2022 at 18:52, Christoph Zechner <zechner at vrvis.at> wrote:
>      >>
>      >>     It seems I celebrated prematurely, the errors are back in exactly
>      >>     the same way :-/
>      >>
>      >>     2022-03-08 08:47:19.321457 Got over-size message, truncating at 528383 bytes (max: 524288)
>      >>     2022-03-08 08:47:19.339786 Dropping (more) garbled data
>      >>
>      >>     I don't understand where this limit of 512 comes from, everything on
>      >>     the server checks out (2048 before, tried 4096 as well, no change).
>      >>
>      >>
>      >> I'm at a loss. If the xymond process is proven to have this set at
>      >> 2048, then I see no reason why it would give that error message with
>      >> that number.
>      >>
>      >> Unless it's referring to another message type and hence a different
>      >> maximum setting? Perhaps take a look at xymond's environment again,
>      >> but search for all MAXMSG_ variables. See which one is set to 512, and
>      >> that might be the culprit. The defaults for these max values are all
>      >> different, with only two of them defaulting to 512: MAXMSG_CLIENT,
>      >> MAXMSG_CLICHG (reference: lib/xymond_buffer.c). But it's possible one
>      >> of them has been set to 512.
>      >
>      > Thanks, I tried that, but unfortunately, this did not help, since all
>      > the values were set correctly, according to my config.
>      >
>      >>
>      >> The only other thing I can think of is that you have two copies of
>      >> xymond running, somehow with different values of MAXMSG_CLIENT. But I
>      >> can't think how this could come about. And you've already killed off
>      >> any rogue processes.
>      >
>      > Right, that's not it either. :-/
>      >
>      >>
>      >> Maybe run xymond in debug mode for one round of updates, until you get
>      >> the "Got over-size message" and review the debug logs. This might
>      >> provide enough additional detail to find out what's going on.
>      >>
>      >> Another approach to solve the problem (truncated client data message)
>      >> is to modify the client script (e.g. xymonclient-linux.sh) to truncate
>      >> the ps command output, so that the total message size is less, and
>      >> hopefully fits within the max message size. This will mean that PROC
>      >> checks might not work anymore (which is likely the case now). But the
>      >> current state is that monitoring of the sections that come after [ps]
>      >> is likely broken now. On Linux this is notably the [top] and [vmstat]
>      >> sections of the client data message, that are used for the "cpu"
>      >> status and several metrics for graphing. Maybe something like adding
>      >> "head -1000" will cut it down to a reasonable size:
>      >>
>      >> echo "[ps]"
>      >> ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd | head -1000
>      >
>      > That's actually a great idea and I modified the [ports] section, because
>      > I know this is the culprit (running a proxy there and all the active
>      > client connections were too much for xymon to handle).
>      >
>      > I'm not interested in client connections anyway, I just want to monitor
>      > my running programs and ports on that server, so I replaced the original
>      >
>      > netstat -antuW 2>/dev/null
>      > netstat -antuT 2>/dev/null
>      >
>      > with
>      >
>      > netstat -tulpenW 2>/dev/null
>      >
>      > (adding your "| head -1000" suggestion did not work, because it cut off
>      > the list before it could reach the IPv6 interfaces and thus the ports
>      > check was always red).
>      >
>      > Now xymon works again, although this is just a workaround, because the
>      > underlying problem of where exactly my messages got truncated is still
>      > to be found, but I can live with this solution.
>      >
>      > Anyway, I very much appreciate your time and efforts, thank you very much!
>      >
>      > Cheers
>      > Christoph
>      >
>      >>
>      >> Also, review the client data message before the [ps] section to see if
>      >> there's actually something else pushing it over the limit, and [ps]
>      >> just happens to be where the truncation happens.
>      >>
>      >> J
>      >>
>      > _______________________________________________
>      > Xymon mailing list
>      > Xymon at xymon.com
>      > http://lists.xymon.com/mailman/listinfo/xymon
> 
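
Jeremy's suggestion in the quoted thread, to search a running process's environment for all MAXMSG_ variables, could be sketched roughly as follows. This assumes a Linux host where /proc/<pid>/environ is readable (typically as root or the xymon user) and that pgrep is installed. The helper name list_maxmsg is made up for illustration, and proc_name can be pointed at a client-side process instead of xymond.

```shell
# Hypothetical helper: print the MAXMSG_* variables found in a
# NUL-separated environment dump such as /proc/<pid>/environ.
list_maxmsg() {
    tr '\0' '\n' < "$1" | grep '^MAXMSG_'
}

# Assumption: the process of interest is named exactly "xymond".
proc_name=xymond

# Walk every matching process, so that two copies with different
# settings would both show up.
for pid in $(pgrep -x "$proc_name"); do
    echo "$proc_name pid $pid:"
    list_maxmsg "/proc/$pid/environ" || echo "  (no MAXMSG_ variables set)"
done
```

A variable that is absent from the output is not set in that process's environment, so the built-in default would apply to it.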

