[Xymon] localmode, got over-size message, truncating

Wed Mar 9 00:04:54 CET 2022

On Tue, 8 Mar 2022 at 18:52, Christoph Zechner <zechner at vrvis.at> wrote:

> It seems I celebrated prematurely, the errors are back in exactly the
> same way :-/
>
> 2022-03-08 08:47:19.321457 Got over-size message, truncating at 528383
> bytes (max: 524288)
> 2022-03-08 08:47:19.339786 Dropping (more) garbled data
>
> I don't understand where this limit of 512 comes from, everything on the
> server checks out (2048 before, tried 4096 as well, no change).
>

I'm at a loss. If the xymond process is proven to have this set at 2048,
then I see no reason why it would give that error message with that number.

Unless it's referring to another message type and hence a different maximum
setting? Perhaps take a look at xymond's environment again, but search for
all MAXMSG_ variables. See which one is set to 512, and that might be the
culprit. The defaults for these max values differ, and only two of them
default to 512: MAXMSG_CLIENT and MAXMSG_CLICHG (reference:
lib/xymond_buffer.c). But it's possible another one has been set to 512.

The only other thing I can think of is that you have two copies of xymond
running, somehow with different values of MAXMSG_CLIENT. But I can't think
how this could come about. And you've already killed off any rogue
processes.
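
A rough sketch of both checks (the MAXMSG_ search and the two-copies
theory), assuming Linux with /proc and pgrep available. The function names
are made up for illustration. The MAXMSG_ values are in kB, so the 524288
in the log corresponds to a setting of 512:

```shell
# Print every MAXMSG_* setting found in a NUL-separated environ file,
# with the value converted from kB to bytes for comparison with the log.
maxmsg_limits() {
    tr '\000' '\n' < "$1" |
        awk -F= '/^MAXMSG_/ { printf "%s=%s (%d bytes)\n", $1, $2, $2 * 1024 }'
}

# Report the limits of every running xymond, so that two instances
# with different settings stand out.
check_all_xymond() {
    for pid in $(pgrep -x xymond); do
        echo "PID $pid:"
        maxmsg_limits "/proc/$pid/environ"
    done
}
```

Run check_all_xymond as root or as the xymon user, otherwise the environ
files will not be readable. More than one "PID" line, or any variable
reported as 512, would be worth a closer look.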

Maybe run xymond in debug mode for one round of updates, until you get the
"Got over-size message" error, then review the debug logs. This might provide
enough additional detail to find out what's going on.
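
Once a debug log exists, pulling out the lines just before each truncation
saves some scrolling. A minimal sketch, assuming a grep that supports -B
(GNU and BSD grep do). oversize_context is a made-up helper name, and the
log path in the usage comment is a guess, so substitute wherever your
xymond writes its log:

```shell
# Show each "Got over-size message" line together with the lines that
# precede it. $1 is the log file, $2 the number of context lines (default 20).
oversize_context() {
    grep -n -B "${2:-20}" 'Got over-size message' "$1"
}

# Example (path is an assumption):
#   oversize_context /var/log/xymon/xymond.log 40
```

With -n, matching lines are printed as "NUMBER:text" and context lines as
"NUMBER-text", which makes the truncation events easy to spot.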

Another approach to the problem (truncated client data messages) is to
modify the client script (e.g. xymonclient-linux.sh) to truncate the ps
command output, so that the total message is smaller and hopefully fits
within the max message size. This will mean that PROC checks might not work
anymore (which is likely the case already). But in the current state,
monitoring of the sections that come after [ps] is likely broken. On
Linux these are notably the [top] and [vmstat] sections of the client data
message, which are used for the "cpu" status and several metrics for
graphing. Maybe something like adding "head -1000" will cut it down to a
reasonable size:

echo "[ps]"
ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd | head -1000

Also, review the client data message before the [ps] section to see if
there's actually something else pushing it over the limit, and [ps] just
happens to be where the truncation happens.
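
To see which sections dominate, something like the following could total
the bytes per [section] in a saved copy of the client message. This is only
a sketch: section_sizes is a made-up helper name, and the file path in the
usage comment is a placeholder for wherever you saved the message.

```shell
# Sum the bytes of each [section] in a client data message and list the
# sections largest first. Lines before the first [section] header are
# counted under "(header)".
section_sizes() {
    LC_ALL=C awk '
        /^\[[^]]+\]$/ { name = $0 }          # a line like [ps] starts a section
        {
            key = (name == "") ? "(header)" : name
            bytes[key] += length($0) + 1     # +1 for the newline
        }
        END { for (k in bytes) printf "%d %s\n", bytes[k], k }
    ' "$1" | sort -rn
}

# Example (placeholder path):
#   section_sizes /path/to/saved-client-message.txt | head
```

If [ps] is not near the top of that list, truncating ps alone is unlikely
to bring the message under the limit.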

J