[Xymon] Is it possible to recreate rrd from hostdata?

Mon Jul 6 02:13:33 CEST 2020

On Fri, 12 Jun 2020 at 01:07, Hansen, Rene H <rhansen21 at dxc.com> wrote:

> Hi
>
>
>
> In one of my Xymon systems, the rrd status and rrd data processes were
> consuming 100% CPU and were crashing/restarting in a loop due to a
> misconfiguration.
>

Argh. I hate that.

> This resulted in a gap in my rrd files and missing data when running
> perfdata.cgi report.
>

My OCD-ish tendencies dislike gaps in the graphs. On the other hand, the
graphs tell a story, just not about the target host.

>  Is there an easy way to run rrd manually on hostdata to fill out the gaps
> in the rrd files?
>

Short answer: no.

Long answer: not an easy way, and not a reliable way.

The hostdata contains enough information to recreate only some RRD data
points, and only for some tests. Tests like CPU and DISK are
populated from data in the client data messages sent from the hosts, so
there's a possibility of some useful data for those (you can tell which
tests are created from client data messages because there's a "Client data
available" link at the bottom of the page).
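To illustrate what "populated from client data" means, here is a minimal Python sketch that splits a saved client data message into its bracketed sections and pulls the load averages out of the [uptime] section. It assumes the snapshot uses the "[sectionname]" layout of a client message and that [uptime] holds ordinary Linux/BSD uptime output. split_sections() and load_averages() are hypothetical helpers, not Xymon code, and Xymon's own parsers handle far more cases than this.

```python
import re
from typing import Dict, List, Optional, Tuple

# A section header line such as "[uptime]" or "[df]" (assumed layout).
SECTION_HEADER = re.compile(r"^\[([^\]]+)\]\s*$")
# Linux: "load average: 0.15, 0.10, 0.05"; BSD: "load averages: 1.23 1.45 1.50".
LOAD_AVG = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def split_sections(message: str) -> Dict[str, str]:
    """Split a client data message into {section name: body text}.

    Lines before the first [section] header (e.g. the "client ..." line)
    are ignored."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in message.splitlines():
        match = SECTION_HEADER.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body) for name, body in sections.items()}


def load_averages(sections: Dict[str, str]) -> Optional[Tuple[float, float, float]]:
    """Return the 1/5/15-minute load averages from [uptime], or None."""
    match = LOAD_AVG.search(sections.get("uptime", ""))
    if match is None:
        return None
    one, five, fifteen = (float(x) for x in match.groups())
    return (one, five, fifteen)
```

Even this toy example shows the problem: every section and every OS variant needs its own parsing rules.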

However, hostdata is only saved for a host when a test changes to an alert
state. (This is described in the man page for xymond_hostdata.)

So, unless you had a new fault every 5 minutes, the hostdata snapshot files
will be missing data.
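Before attempting a recovery, it may be worth measuring how patchy the snapshots actually are. This sketch lists the stretches with no snapshot longer than the 5-minute interval. It assumes (check the xymond_hostdata man page for your version) that snapshots are stored one directory per host, with each file named by its epoch timestamp. snapshot_times() and coverage_gaps() are hypothetical names.

```python
import os
from typing import Iterable, List, Tuple


def snapshot_times(host_dir: str) -> List[int]:
    """Assumed layout: one file per snapshot, named by its epoch timestamp."""
    return [int(name) for name in os.listdir(host_dir) if name.isdigit()]


def coverage_gaps(timestamps: Iterable[int], start: int, end: int,
                  step: int = 300) -> List[Tuple[int, int]]:
    """Return (gap_start, gap_end) spans inside [start, end] that are longer
    than `step` seconds and contain no snapshot."""
    points = sorted(t for t in timestamps if start <= t <= end)
    gaps = []
    previous = start
    for t in points + [end]:
        if t - previous > step:
            gaps.append((previous, t))
        previous = t
    return gaps
```

If most of the outage window comes back as one big gap, there is little point going further.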

Let's assume you can find enough hostdata entries to be useful. You'd then
need to get the data into the RRD file. This means either a) parsing the
data yourself (e.g. in a Python script) and writing it to the RRD file
using rrdtool, or b) sending the data to Xymon to parse. Problems with
these approaches:

a) using rrdtool: rrdtool update rejects data points that are not newer
than the most recent update, so you would need to do a dump/edit/restore
cycle, or perhaps truncate the data in the RRD file up to the point you
wanted to start, then add the parsed hostdata and then re-add the data
points that were truncated. Parsing wouldn't be trivial, because Xymon has
hundreds (probably thousands) of lines of code that know how to parse the
data in the right way, and you'd have to re-implement this code yourself.

b) using Xymon: you can send a client data message to xymond, and it will
parse it using the xymond_client worker module (you may be able to run
this directly), but there's no guarantee that Xymon won't discard
the data because it's too old. Even if it didn't, you'd still have to
work around the RRD restriction described in a) above, where data points
older than the most recent update are discarded.
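Whichever route you take, the points must reach the RRD file in strictly increasing time order. Here is a minimal sketch for the truncate/add/re-add variant of a). It merges the recovered hostdata points with the points you truncated, keeps the original value wherever both exist, and drops anything that rrdtool update would reject because it is not newer than the file's last update. The function names and the file name cpu.rrd are hypothetical. The "timestamp:value" argument form is the one rrdtool update accepts.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def merged_updates(existing: Iterable[Tuple[int, str]],
                   recovered: Iterable[Tuple[int, str]]) -> List[str]:
    """Merge two (epoch, value) series into ascending "timestamp:value"
    arguments. On a timestamp clash the existing (truncated) value wins."""
    merged: Dict[int, str] = dict(recovered)
    merged.update(dict(existing))  # real data takes precedence over recovered
    return [f"{ts}:{value}" for ts, value in sorted(merged.items())]


def update_command(rrd_file: str, args: List[str],
                   last_update: Optional[int] = None) -> List[str]:
    """Build an `rrdtool update` argv, dropping points that are not newer
    than last_update (rrdtool requires at least a one-second step)."""
    if last_update is not None:
        args = [a for a in args if int(a.split(":", 1)[0]) > last_update]
    return ["rrdtool", "update", rrd_file, *args]
```

The resulting list can be handed to subprocess.run(). With several data sources, the value string is simply "v1:v2:...", and the split on the first colon still extracts the timestamp correctly.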

I have two Xymon servers. It's rare that one goes down, but if it does, I
can dump/splice/restore and re-create the missing RRD data on the server
that went down using the RRD data on the server that didn't. DISCLAIMER:
Although I've used this procedure to remove a massive spike caused by a
processing artefact (the spike made the graph useless because of the
scale it forced), I've not done this to fill in gaps.
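Here is one way the splice step might look. It is an illustration, not necessarily the procedure described above. It takes the `rrdtool dump` XML from both servers and, wherever a row on the damaged server is entirely NaN, copies the row with the same timestamp from the same RRA on the healthy server. The result is then fed to `rrdtool restore`. It assumes both files were created with identical step, DS and RRA definitions, and that dump writes each row as a timestamp comment followed by <row>...</row> on a single line. splice_dump() is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple

# One data row of `rrdtool dump` output, e.g.
# <!-- 2020-06-12 01:05:00 CEST / 1591916700 --> <row><v>NaN</v></row>
ROW = re.compile(r"<!--.*?/\s*(\d+)\s*-->\s*<row>(.*)</row>")
VALUE = re.compile(r"<v>\s*([^<]*?)\s*</v>")


def _rows_by_rra(lines: List[str]) -> Dict[Tuple[int, int], str]:
    """Map (RRA index, row timestamp) to the full row line."""
    rows: Dict[Tuple[int, int], str] = {}
    rra = -1
    for line in lines:
        if "<rra>" in line:  # each <rra> opens the next archive
            rra += 1
        match = ROW.search(line)
        if match:
            rows[(rra, int(match.group(1)))] = line
    return rows


def splice_dump(damaged: List[str], good: List[str]) -> List[str]:
    """Return `damaged` with every all-NaN row replaced by the row with the
    same RRA index and timestamp from `good`, when `good` has one."""
    donor = _rows_by_rra(good)
    spliced = []
    rra = -1
    for line in damaged:
        if "<rra>" in line:
            rra += 1
        match = ROW.search(line)
        if match:
            values = VALUE.findall(match.group(2))
            key = (rra, int(match.group(1)))
            all_nan = bool(values) and all(v.lower() == "nan" for v in values)
            if all_nan and key in donor:
                line = donor[key]  # rows holding real data are never touched
        spliced.append(line)
    return spliced
```

Read each dump with open(...).read().splitlines(), write "\n".join(result) to a new XML file, and restore it to a new RRD file rather than over the original, so that you can compare the two before swapping them.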

Cheers
Jeremy