<div dir="ltr"><div dir="ltr">On Tue, 8 Mar 2022 at 18:52, Christoph Zechner <<a href="mailto:zechner@vrvis.at">zechner@vrvis.at</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">It seems I celebrated prematurely, the errors are back in exactly the <br>

same way :-/<br>

<br>

2022-03-08 08:47:19.321457 Got over-size message, truncating at 528383 <br>

bytes (max: 524288)<br>

2022-03-08 08:47:19.339786 Dropping (more) garbled data<br>

<br>

I don't understand where this limit of 512 comes from, everything on the <br>

server checks out (2048 before, tried 4096 as well, no change).<br></blockquote><div><br></div><div>I'm at a loss. If the running xymond process is proven to have this set at 2048, then I see no reason why it would report a maximum of 524288 bytes (512 x 1024) in that error message.</div><div><br></div><div>Unless it's referring to another message type and hence a different maximum setting? Perhaps take a look at xymond's environment again, but search for all MAXMSG_ variables. See which one is set to 512, as that might be the culprit. The defaults for these maximums are all different, and only two of them default to 512: MAXMSG_CLIENT and MAXMSG_CLICHG (reference: lib/xymond_buffer.c). But it's also possible that one of the others has been explicitly set to 512.</div><div><br></div><div>The only other thing I can think of is that you have two copies of xymond running, somehow with different values of MAXMSG_CLIENT. But I can't think how this could come about, and you've already killed off any rogue processes.</div><div><br></div><div>Maybe run xymond in debug mode for one round of updates, until you get the "Got over-size message" error, and then review the debug logs. This might provide enough additional detail to find out what's going on.</div><div><br></div><div>Another way to work around the problem (a truncated client data message) is to modify the client script (e.g. xymonclient-linux.sh) to truncate the ps command output, so that the total message is smaller and hopefully fits within the maximum message size. This will mean that PROC checks might not work anymore (which is likely the case already). But as things stand, monitoring of the sections that come after [ps] is likely broken. On Linux these are notably the [top] and [vmstat] sections of the client data message, which are used for the "cpu" status and several metrics for graphing. 
Maybe something like adding "head -1000" will cut it down to a reasonable size:</div><div><br></div><div><font face="monospace">echo "[ps]"<br>ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd | head -1000</font><br><br></div><div>Also, review the client data message before the [ps] section to see if there's actually something else pushing it over the limit, and [ps] just happens to be where the truncation happens.</div><div><br></div><div>J</div><div><br></div></div></div>
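<div>As a rough sketch of the two checks suggested above (listing every MAXMSG_ variable in a process environment, and seeing which section of a client data message takes up the space), something like the following might help. The function names list_maxmsg and section_sizes are made up for this example and are not part of Xymon. The first function assumes a NUL-separated environment file such as /proc/PID/environ on Linux. The second assumes you have a saved copy of the client data message in a file, with each section starting at a line like [ps].</div>

```shell
#!/bin/sh
# Hypothetical helpers for illustration; neither one ships with Xymon.

# Print every MAXMSG_* variable found in a NUL-separated environment
# file (for example /proc/PID/environ on Linux), sorted by name.
list_maxmsg() {
    tr '\000' '\n' < "$1" | grep '^MAXMSG_' | sort
}

# Print "<bytes> <section>" for each [section] in a saved client data
# message, in the order the sections appear. The header line itself is
# counted as part of its section; text before the first header is ignored.
section_sizes() {
    LC_ALL=C awk '
        /^\[[^]]+\]$/ { name = $0; if (!(name in bytes)) order[++n] = name }
        name != ""    { bytes[name] += length($0) + 1 }
        END { for (i = 1; i <= n; i++) printf "%d %s\n", bytes[order[i]], order[i] }
    ' "$1"
}
```

<div>On Linux, the first helper could be run once per xymond process, for example <font face="monospace">for pid in $(pgrep -x xymond); do list_maxmsg /proc/$pid/environ; done</font> (as root or as the user xymond runs as), which would also show whether two copies are running with different settings. Piping the second helper through <font face="monospace">sort -rn | head</font> puts the largest sections first.</div>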