[Xymon] localmode, got over-size message, truncating

Christoph Zechner zechner at vrvis.at
Fri Mar 11 06:25:14 CET 2022


On 10/03/2022 23:56, Jeremy Laidman wrote:
> Honestly, I can't work out how this happened. A review of the code - in 
> as much as I can understand it, not being a C programmer - shows that 
> there's only one place the MAXMSG_CLIENT parameter is used, and that's 
> in xymond. In particular, it's not used in the xymon client (which is 
> the only process that logs to xymonclient.log).

I also dug through the source code trying to find answers, and since 
I'm using local mode on my clients (thus utilising the xymond_client 
binary), I think it makes sense (more or less).

> 
> I can understand how it could have come about that xymond was loaded 
> using xymonclient.cfg for its environment, thus applying the smaller 
> size limit to incoming messages. But if this were the case, I can't work 
> out how you would have seen MAXMSG_CLIENT=2048 in the running xymond 
> process's environment.

The MAXMSG_CLIENT=2048 values I reported were always server-side (thanks 
to your env command line showing me the options currently in use). I 
never even saw that variable on my client, because it never got set. 
Only after I manually added it to xymonclient.cfg did it start working 
as expected.
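
For anyone finding this thread in the archive, here is a rough sketch of 
the two checks that helped me. It assumes a Linux host with /proc, and 
that the MAXMSG_* values are given in kB (which fits the "max: 524288" 
in the log for a value of 512). The function names are made up for 
illustration; they are not part of Xymon.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Xymon.

# Convert a MAXMSG_* value (assumed to be in kB) into the byte limit
# that shows up as "max:" in the over-size log line.
maxmsg_bytes() {
    echo $(( $1 * 1024 ))
}

# Print the MAXMSG_* variables from the environment a process was
# started with (Linux only, reads /proc/<pid>/environ).
maxmsg_env() {
    tr '\0' '\n' < "/proc/$1/environ" | grep '^MAXMSG_'
}

maxmsg_bytes 512     # 524288, the limit from my log
maxmsg_bytes 2048    # 2097152
```

Calling maxmsg_env with the PID of whichever process logs the over-size 
error shows what that process was actually started with. If it prints 
nothing, the variable is not set there and presumably the built-in 
default is used.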

I think it classifies as a bug, but xymon's localmode is somewhat 
undocumented (the binary for it is missing in the Debian package as 
well, for example...) and in my opinion this should be documented somewhere.
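
Regarding the netstat workaround quoted further down: a quick way to see 
which client section is blowing up the message is to count the bytes 
each command produces. This is only a sketch; section_bytes is a 
hypothetical helper name, and the commands to compare are whatever your 
client script runs.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print how many bytes it
# writes to stdout (stderr is discarded).
section_bytes() {
    "$@" 2>/dev/null | wc -c | tr -d ' '
}

section_bytes printf 'hello\n'    # prints 6
```

Comparing "section_bytes netstat -antuW" with "section_bytes netstat 
-tulpenW" on the affected host should show how much the second form 
saves.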

Christoph

> 
> So, I'm glad you worked out a solution. But I don't think we quite 
> understand the cause.
> 
> On Thu, 10 Mar 2022 at 22:41, Jeremy Laidman <jeremy at laidman.org> wrote:
> 
>     Great work Christoph.
> 
>     Sorry, it appears that I led you down the wrong path, asserting that
>     it was a server-only setting in xymond. It would appear to be a
>     client-side setting. This seems to be undocumented in the man page
>     for xymonclient.cfg.
> 
>     J
> 
>     On Thu, 10 Mar 2022 at 21:18, Christoph Zechner <zechner at vrvis.at> wrote:
> 
>         I solved it!
> 
>         I had to add and set "MAXMSG_CLIENT=1024" in
>         /etc/xymon/xymonclient.cfg,
>         restarted xymon-client and all the errors were gone.
> 
>         Thanks again for your help!
> 
>         Cheers
>         Christoph
> 
> 
>         On 09/03/2022 06:42, Christoph Zechner wrote:
>          > On 09/03/2022 00:04, Jeremy Laidman wrote:
>          >> On Tue, 8 Mar 2022 at 18:52, Christoph Zechner <zechner at vrvis.at> wrote:
>          >>
>          >>     It seems I celebrated prematurely, the errors are back
>         in exactly the
>          >>     same way :-/
>          >>
>          >>     2022-03-08 08:47:19.321457 Got over-size message, truncating at 528383 bytes (max: 524288)
>          >>     2022-03-08 08:47:19.339786 Dropping (more) garbled data
>          >>
>          >>     I don't understand where this limit of 512 comes from,
>          >>     everything on the server checks out (2048 before, tried
>          >>     4096 as well, no change).
>          >>
>          >>
>          >> I'm at a loss. If the xymond process is proven to have this
>         set at
>          >> 2048, then I see no reason why it would give that error
>         message with
>          >> that number.
>          >>
>          >> Unless it's referring to another message type and hence a
>         different
>          >> maximum setting? Perhaps take a look at xymond's environment
>         again,
>          >> but search for all MAXMSG_ variables. See which one is set
>         to 512, and
>          >> that might be the culprit. The defaults for these max values
>         are all
>          >> different, with only two of them defaulting to 512:
>         MAXMSG_CLIENT,
>          >> MAXMSG_CLICHG (reference: lib/xymond_buffer.c). But it's
>         possible one
>          >> of them has been set to 512.
>          >
>          > Thanks, I tried that, but unfortunately, this did not help,
>         since all
>          > the values were set correctly, according to my config.
>          >
>          >>
>          >> The only other thing I can think of is that you have two
>         copies of
>          >> xymond running, somehow with different values of
>         MAXMSG_CLIENT. But I
>          >> can't think how this could come about. And you've already
>         killed off
>          >> any rogue processes.
>          >
>          > Right, that's not it either. :-/
>          >
>          >>
>          >> Maybe run xymond in debug mode for one round of updates,
>         until you get
>          >> the "Got over-size message" and review the debug logs. This
>         might
>          >> provide enough additional detail to find out what's going on.
>          >>
>          >> Another approach to solve the problem (truncated client data
>         message)
>          >> is to modify the client script (eg xymonclient-linux.sh) to
>         truncate
>          >> the ps command output, so that the total message size is
>         less, and
>          >> hopefully fits within the max message size. This will mean
>         that PROC
>          >> checks might not work anymore (which is likely the case
>         now). But the
>          >> current state is that monitoring of the sections that come
>         after [ps]
>          >> are likely broken now. On Linux this is notably the [top]
>         and [vmstat]
>          >> sections of the client data message, that are used for the
>         "cpu"
>          >> status and several metrics for graphing. Maybe something
>         like adding
>          >> "head -1000" will cut it down to a reasonable size:
>          >>
>          >> echo "[ps]"
>          >> ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd | head -1000
>          >
>          > That's actually a great idea and I modified the [ports] section,
>          > because I know this is the culprit (running a proxy there, and all
>          > the active client connections were too much for xymon to handle).
>          >
>          > I'm not interested in client connections anyway, I just want
>         to monitor
>          > my running programs and ports on that server, so I replaced
>         the original
>          >
>          > netstat -antuW 2>/dev/null
>          > netstat -antuT 2>/dev/null
>          >
>          > with
>          >
>          > netstat -tulpenW 2>/dev/null
>          >
>          > (adding your "| head -1000" suggestion did not work, because it
>          > cut off the list before it could reach the IPv6 interfaces and
>          > thus the ports check was always red).
>          >
>          > Now xymon works again, although this is just a workaround,
>         because the
>          > underlying problem of where exactly my messages got
>         truncated, is still
>          > to be found, but I can live with this solution.
>          >
>          > Anyway, I very much appreciate your time and efforts, thank
>         you very much!
>          >
>          > Cheers
>          > Christoph
>          >
>          >>
>          >> Also, review the client data message before the [ps] section
>         to see if
>          >> there's actually something else pushing it over the limit,
>         and [ps]
>          >> just happens to be where the truncation happens.
>          >>
>          >> J
>          >>
> 

