[Xymon] Gaps in graphs

Jeremy Laidman jeremy at laidman.org
Tue Mar 9 23:53:43 CET 2021


On Tue, 9 Mar 2021 at 18:47, Carl Melgaard <Carl.Melgaard at stab.rm.dk> wrote:

> >You mentioned an "old setup". Can you describe what has changed from old
> to new setup? Have you upgraded hardware/OS/Xymon server/Xymon client(s)?
>
>
>
> I changed OS, CentOS 5.11 -> RH 7.9 and Xymon from 4.3.7 to 4.3.30, and
> changed from self-compiled to Terabithia packages. So quite a big jump.
>
> Yes, client and server both run on the same host, as they did on the old
> system. I want the Xymon server itself monitored.
>

Yep, that makes sense. My curiosity around this is the possibility that the
Xymon server is running the client scripts from its clientlaunch process,
and also a second copy of clientlaunch is running the same scripts - in
essence, a "server" instance of the client scripts, as well as a "client"
instance of the client scripts. If this is happening, you'll get two data
messages every 5 minutes instead of one.

Again, I don't think this would cause graph gaps, but it might be causing
some of your warning logs.

Interestingly, Terabithia packages for Xymon up to v4.3.18 included both
client and server components in the one "xymon" package, as well as in the
"xymon-client" package. You would only install "xymon" or "xymon-client"
but not both (or you might get duplicate clients running). However, from
v4.3.18, the client files in the xymon package were removed, requiring both
"xymon" and "xymon-client" to be installed on the Xymon server (if you
wanted the server to monitor itself). You appear to have both packages
installed on your Xymon server.

> I have 2 Xymon servers, 1 primary and 1 secondary. The primary distributes
> to the secondary. Only the secondary is updated as of yet.
>

It makes sense to have two for redundancy. Have you thought about
configuring both Xymon servers in each client? That way, if the primary
goes down, the secondary will still receive updates. (This has nothing to
do with diagnosing the gaps in your graphs, I'm just curious.)
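
For what it's worth, pointing a client at both servers looks roughly like
this. It's a sketch assuming the client reads its settings from
xymonclient.cfg; the two addresses are placeholders, and packaged clients
(the Terabithia RPMs included) may take the server list from a file under
/etc/sysconfig or /etc/default instead, so check where your package expects
it.

```shell
# Placeholder addresses; substitute your primary and secondary servers.
XYMSRV="0.0.0.0"                      # 0.0.0.0 tells the client to use XYMSERVERS
XYMSERVERS="192.0.2.10 192.0.2.11"    # space-separated list of Xymon servers
```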

> >You said that you noticed on the Xymon server itself. Has it only happened
> to graphs for the Xymon server? I'm wondering if you have the Xymon
> client AND the Xymon server both running on the same host?
>
>
>
> After I noticed it on the Xymon server itself, I went looking for gaps
> elsewhere, and I found some on other servers as well.
>

Right, so the gaps aren't likely to be caused by client and server running
together, if it's also happening for other servers not running the Xymon
server. But this might be the cause of your RRD warnings.

> Also in xymonclient.log I get these quite alot, dunno if it's related:
>
>
>
> mv: cannot stat '/dev/shm/logfetch.x.cfg.tmp': No such file or directory
>
> cat: /dev/shm/xymon_vmstat.x: No such file or directory
>
> cat: /dev/shm/xymon_vmstat.x: No such file or directory
>
>
Do you only see these on the Xymon server, or are these log messages also
showing on other Xymon clients? And if so, at what frequency?

>
>
> >Can you explain "quite alot"? Can you give an indication of how often
> these occur?
>
>
>
> 623 lines in the logfile yesterday.
>

That's roughly 2 every 5-minute interval. That's significant.
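
The arithmetic behind that estimate, as a small sketch. The function name is
just for illustration; it assumes the lines are spread evenly over a full
day of 5-minute client intervals.

```shell
# Average number of log lines per client interval.
# $1 = lines logged in one day, $2 = interval length in minutes
lines_per_interval() {
    awk -v lines="$1" -v mins="$2" \
        'BEGIN { printf "%.2f\n", lines / (24 * 60 / mins) }'
}

lines_per_interval 623 5    # 24*60/5 = 288 intervals per day; prints 2.16
```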

Your symptoms (xymonclient.log messages, RRD warnings) are consistent with
two different instances of the Xymon client script running at the same
time. When this happens, each instance tries to create and populate
xymon_vmstat.<servername> (from a vmstat command) and include its contents
in the client status message before removing the file. Usually the file
only exists for a brief moment. If two instances of the client are running,
it's unlikely that both would create the file and then try to use it at
exactly the same time. But if that did happen, one instance would likely
show the "No such file or directory" message, because the other instance had
already removed the file. A classic "race condition".

Similarly, the Xymon client script creates the
logfetch.<servername>.cfg.tmp file, then renames it to
logfetch.<servername>.cfg. If a second instance tries to rename the file
after the first instance has already done so, then you'll see the "No such
file or directory" message from mv.
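
Both failure modes can be replayed by hand. The sketch below follows my
description above rather than the actual client script: it runs the steps of
two imaginary instances (A and B) one after the other in a scratch
directory, and the file names and contents are placeholders.

```shell
# Sequential replay of the two interleavings, in a throwaway directory.
tmpdir=$(mktemp -d)

# Case 1: shared vmstat file. A and B both write it, A uses and removes it,
# then B tries to read it.
vm="$tmpdir/xymon_vmstat.example"
echo "vmstat data from A" > "$vm"
echo "vmstat data from B" > "$vm"
cat "$vm" > /dev/null && rm -f "$vm"      # instance A: use the file, remove it
vm_status=0
cat "$vm" || vm_status=$?                 # instance B: the file is already gone

# Case 2: shared logfetch temp file. The first rename succeeds; the second
# finds nothing left to rename.
cfg="$tmpdir/logfetch.example.cfg"
echo "placeholder" > "$cfg.tmp"
mv1_status=0; mv2_status=0
mv "$cfg.tmp" "$cfg" || mv1_status=$?     # instance A
mv "$cfg.tmp" "$cfg" || mv2_status=$?     # instance B

echo "B's cat exit: $vm_status, A's mv exit: $mv1_status, B's mv exit: $mv2_status"
rm -rf "$tmpdir"
```

B's cat and B's mv both fail and complain on stderr that the file does not
exist, which is the same pair of messages that turns up in xymonclient.log.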

Can you show me the output of the following commands? I've run them on
one of my Xymon servers (using Terabithia RPMs) to show what you might
expect:

$ pgrep -lf xymonlaunch
16602 /usr/lib/xymon/server/bin/xymonlaunch
    --config=/usr/lib/xymon/server/etc/tasks.cfg
    --env=/usr/lib/xymon/server/etc/xymonserver.cfg
    --log=/var/log/xymon/xymonlaunch.log
    --pidfile=/var/log/xymon/xymonlaunch.pid

$ pgrep -lf vmstat
8304 sh -c vmstat 300 2
    1>/usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252 2>&1; mv
    /usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252
    /usr/lib/xymon/client/tmp/xymon_vmstat.<servername>
8306 vmstat 300 2
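
If you'd rather have a single number to compare, something like the
following counts the bare "vmstat 300 2" processes. It's only a sketch: the
helper name is made up, and it assumes each client instance starts its own
"vmstat 300 2" as in the listing above, so a count above 1 would hint at a
second instance.

```shell
# Count processes whose command line is exactly the given string.
# Reads "PID ARGS..." lines (as printed by `ps -eo pid=,args=`) on stdin.
count_exact_cmd() {
    awk -v want="$1" '
        { $1 = ""; sub(/^ +/, ""); if ($0 == want) n++ }   # drop PID, compare the rest
        END { print n + 0 }'
}

ps -eo pid=,args= | count_exact_cmd "vmstat 300 2"
```

The "sh -c vmstat 300 2 ..." wrapper is deliberately not counted, because
its command line carries the redirection and mv after the vmstat arguments.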

Cheers
Jeremy