<div dir="ltr"><div><div><div><div>Hi all,<br><br></div>This weekend something happened to all our graphs. Every host's graphs are either corrupted or distorted, and the history is unusable. I have checked all the usual places for graph logging (rrd-data.log, rrd-status.log and the other system log files), but I am stumped as to where to start fixing this. We are looking at restoring the RRD files from a previous snapshot, which may or may not work, but we would still like to solve this mystery.<br><br></div>I have attached two screenshots, but I do not know whether they are viewable on the mailing list. It is hard to explain without them, but essentially there are huge numbers in our graphs, such as 3945789385793485793847593847593847593847593847593845793485739, and lots of question marks ('?'). There is no usable history, just a flat line along the baseline with one or two peaks around the time this all happened (give or take a day or two). If you try to zoom in, you get a black screen that just says 'zoom source image'; however, if you hover the mouse over it, you can find a selectable area, which shows a close-up of the zoom area.<br><br>Example rrdtool info output (for the same test host shown in the screenshots):<br><br>filename = "disk,C.rrd"<br>rrd_version = "0003"<br>step = 300<br>last_update = 1436270189<br>ds[pct].type = "GAUGE"<br>ds[pct].minimal_heartbeat = 600<br>ds[pct].min = 0.0000000000e+00<br>ds[pct].max = 1.0000000000e+02<br>ds[pct].last_ds = "89"<br>ds[pct].value = 7.9210000000e+03<br>ds[pct].unknown_sec = 0<br>ds[used].type = "GAUGE"<br>ds[used].minimal_heartbeat = 600<br>ds[used].min = 0.0000000000e+00<br>ds[used].max = NaN<br>ds[used].last_ds = "28436524"<br>ds[used].value = 2.5308506360e+09<br>ds[used].unknown_sec = 0<br>rra[0].cf = "AVERAGE"<br>rra[0].rows = 576<br>rra[0].pdp_per_row = 1<br>rra[0].xff = 5.0000000000e-01<br>rra[0].cdp_prep[0].value = NaN<br>rra[0].cdp_prep[0].unknown_datapoints = 0<br>rra[0].cdp_prep[1].value = NaN<br>rra[0].cdp_prep[1].unknown_datapoints 
= 0<br>rra[1].cf = "AVERAGE"<br>rra[1].rows = 576<br>rra[1].pdp_per_row = 6<br>rra[1].xff = 5.0000000000e-01<br>rra[1].cdp_prep[0].value = 4.4500000000e+02<br>rra[1].cdp_prep[0].unknown_datapoints = 0<br>rra[1].cdp_prep[1].value = 1.4218146600e+08<br>rra[1].cdp_prep[1].unknown_datapoints = 0<br>rra[2].cf = "AVERAGE"<br>rra[2].rows = 576<br>rra[2].pdp_per_row = 24<br>rra[2].xff = 5.0000000000e-01<br>rra[2].cdp_prep[0].value = 2.0470000000e+03<br>rra[2].cdp_prep[0].unknown_datapoints = 0<br>rra[2].cdp_prep[1].value = 6.5402986560e+08<br>rra[2].cdp_prep[1].unknown_datapoints = 0<br>rra[3].cf = "AVERAGE"<br>rra[3].rows = 576<br>rra[3].pdp_per_row = 288<br>rra[3].xff = 5.0000000000e-01<br>rra[3].cdp_prep[0].value = 1.2727000000e+04<br>rra[3].cdp_prep[0].unknown_datapoints = 0<br>rra[3].cdp_prep[1].value = 4.0657944878e+09<br>rra[3].cdp_prep[1].unknown_datapoints = 0<br><br></div>This weekend we had a network intervention: we moved some network connections in one of our two data centers, but there was no downtime because we switched network connectivity over to the other data room. Our Xymon server runs on a RHEL5 virtual server, and we are using Xymon 4.3.19.<br><br></div>All graphs were fine until then. Any ideas?<br><div><div><br></div></div></div>
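One way to pin down exactly when the stored values went bad, independently of how the graphs render, might be to dump the raw rows with something like `rrdtool fetch "disk,C.rrd" AVERAGE --start end-30d` and scan them for values outside the limits that rrdtool info reports (pct is 0 to 100, used has a minimum of 0 and no maximum). The Python below is a minimal sketch under that assumption: it assumes fetch prints a header line of DS names, a blank line, then "timestamp: value value ..." rows. The function name find_suspicious_rows and the sample rows are hypothetical, not real output from the affected host.

```python
import math


def find_suspicious_rows(fetch_text, limits):
    """Return (timestamp, ds_name, value) tuples whose value is outside its limits.

    fetch_text: text printed by `rrdtool fetch` (assumed layout: a header line
        of DS names, a blank line, then "timestamp: value value ..." rows).
    limits: hypothetical mapping of DS name -> (min, max); max=None means
        "no maximum" (as with ds[used].max = NaN in `rrdtool info`).
    Unknown values ("nan"/"-nan") are skipped: they are gaps, not corruption.
    """
    lines = [line for line in fetch_text.splitlines() if line.strip()]
    if not lines:
        return []
    ds_names = lines[0].split()  # header: DS names in column order
    suspicious = []
    for row in lines[1:]:
        stamp, _, values = row.partition(":")
        for name, raw in zip(ds_names, values.split()):
            value = float(raw)  # Python's float() also accepts "nan" and "-nan"
            if math.isnan(value) or name not in limits:
                continue
            low, high = limits[name]
            if value < low or (high is not None and value > high):
                suspicious.append((int(stamp), name, value))
    return suspicious


if __name__ == "__main__":
    # Illustrative rows only, NOT real output from the affected host.
    sample = (
        "                 pct                used\n"
        "\n"
        "1436269800: 8.9000000000e+01 2.8436524000e+07\n"
        "1436270100: 3.9457893858e+60 nan\n"
    )
    # Limits from the rrdtool info dump: pct is 0..100, used is >= 0 with no max.
    limits = {"pct": (0.0, 100.0), "used": (0.0, None)}
    for stamp, name, value in find_suspicious_rows(sample, limits):
        print(stamp, name, value)
```

Running it against fetch output from before and after the weekend could show whether the huge numbers are actually stored in the RRD files or only appear when the graphs are drawn.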