[Xymon] All Xymon rrd graphs suddenly haywire

Steve B rectifier at gmail.com
Tue Jul 7 14:13:18 CEST 2015


Hi all,

This weekend, something happened to all our graphs. Every host's graphs
are either corrupted or distorted, and the history is unusable. I have
checked the usual places for graph logging (rrd-data.log and
rrd-status.log) as well as other system log files, but I am stumped as to
where to start fixing this.  We are looking at restoring the rrds from a
previous snapshot, which may or may not work, but we would still like to
solve this mystery.

I have attached two screenshots, but I do not know if they are viewable on
the mailing list.  It is hard to explain without them, but essentially there
are huge numbers in our graphs, such as
3945789385793485793847593847593847593847593847593845793485739, and lots of
'?', and there is no usable history, just a straight line along the base
with one peak (or two) around the time this all happened (give or take a
day or two). If you try to zoom in, you get a screen that just says
'zoom source image' on a black background, but if you hover your mouse
over it you can find an area that is selectable, and this shows a
close-up of the zoom area.

rrdtool info example (for the same screenshot host test):

filename = "disk,C.rrd"
rrd_version = "0003"
step = 300
last_update = 1436270189
ds[pct].type = "GAUGE"
ds[pct].minimal_heartbeat = 600
ds[pct].min = 0.0000000000e+00
ds[pct].max = 1.0000000000e+02
ds[pct].last_ds = "89"
ds[pct].value = 7.9210000000e+03
ds[pct].unknown_sec = 0
ds[used].type = "GAUGE"
ds[used].minimal_heartbeat = 600
ds[used].min = 0.0000000000e+00
ds[used].max = NaN
ds[used].last_ds = "28436524"
ds[used].value = 2.5308506360e+09
ds[used].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 576
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[0].cdp_prep[1].value = NaN
rra[0].cdp_prep[1].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 576
rra[1].pdp_per_row = 6
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 4.4500000000e+02
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[1].cdp_prep[1].value = 1.4218146600e+08
rra[1].cdp_prep[1].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 576
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 2.0470000000e+03
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[2].cdp_prep[1].value = 6.5402986560e+08
rra[2].cdp_prep[1].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 576
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 1.2727000000e+04
rra[3].cdp_prep[0].unknown_datapoints = 0
rra[3].cdp_prep[1].value = 4.0657944878e+09
rra[3].cdp_prep[1].unknown_datapoints = 0
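
As a rough sanity check on the dump above, here is a minimal Python sketch
(not part of Xymon or rrdtool; parse_info and out_of_range are hypothetical
helper names) that parses `rrdtool info` text, flags any data source whose
last_ds falls outside its declared min/max, and works out how far into the
current step the last update landed. On this excerpt nothing is flagged, and
ds[pct].value (7921) equals 89 x 89, which seems consistent with rrdtool
accumulating the gauge reading over the 89 seconds elapsed in the step rather
than with a corrupt value.

```python
import re
from datetime import datetime, timezone

# Matches lines such as: ds[pct].max = 1.0000000000e+02
DS_LINE = re.compile(r'^ds\[(?P<name>[^\]]+)\]\.(?P<field>\w+)\s*=\s*(?P<value>.+)$')


def parse_info(text):
    """Return (top-level fields, per-data-source fields) from `rrdtool info` text."""
    top, sources = {}, {}
    for line in text.splitlines():
        line = line.strip()
        m = DS_LINE.match(line)
        if m:
            value = m['value'].strip().strip('"')
            sources.setdefault(m['name'], {})[m['field']] = value
        elif ' = ' in line and not line.startswith('rra['):
            key, value = line.split(' = ', 1)
            top[key] = value.strip().strip('"')
    return top, sources


def out_of_range(sources):
    """List data sources whose last_ds lies outside [min, max]."""
    bad = []
    for name, fields in sources.items():
        try:
            last = float(fields['last_ds'])
        except (KeyError, ValueError):
            continue  # missing, or "U" (unknown)
        lo = float(fields.get('min', 'nan'))
        hi = float(fields.get('max', 'nan'))
        # Comparisons against NaN are always False, so a NaN bound never flags.
        if last < lo or last > hi:
            bad.append(name)
    return bad


# Excerpt of the dump above (rra[...] lines omitted for brevity).
sample = """\
filename = "disk,C.rrd"
step = 300
last_update = 1436270189
ds[pct].min = 0.0000000000e+00
ds[pct].max = 1.0000000000e+02
ds[pct].last_ds = "89"
ds[pct].value = 7.9210000000e+03
ds[used].min = 0.0000000000e+00
ds[used].max = NaN
ds[used].last_ds = "28436524"
ds[used].value = 2.5308506360e+09
"""

top, sources = parse_info(sample)
elapsed = int(top['last_update']) % int(top['step'])
last_update_utc = datetime.fromtimestamp(int(top['last_update']), tz=timezone.utc)

print(out_of_range(sources))   # [] for this excerpt
print(elapsed)                 # 89 seconds into the current 300 s step
print(last_update_utc.isoformat())
```

To check a whole directory, the same functions could be fed the output of
`rrdtool info <file>` for each .rrd file; an empty result only says the most
recent sample is within bounds, not that the stored history is intact.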

This weekend we had a network intervention: we moved some network
connections in one of our two data centers, but there was no downtime
because we switched network connectivity to the other data room. Our Xymon
server runs on a virtual server (RHEL5), and the version we are using is
4.3.19.

All graphs were fine until this point.  Any ideas?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xymondisk.png
Type: image/png
Size: 51133 bytes
Desc: not available
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20150707/8f6218b2/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xymondisk2.png
Type: image/png
Size: 59686 bytes
Desc: not available
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20150707/8f6218b2/attachment-0001.png>

