[Xymon] Installing Xymon from terabithia; two weird issues

Japheth Cleaver cleaver at terabithia.org
Tue Mar 21 16:36:19 CET 2017


>     On Fri, Mar 17, 2017 at 8:56 AM, Peter Welter
>     <peter.welter at gmail.com> wrote:
>
>         Hi JC,
>
>         I'm still experiencing some difficulties with Xymon version
>         (4.3.27-1.el6.terabithia) software, that is being deployed
>         from http://terabithia.org/rpms/xymon/el6/i686/.
>
>         There are two different types of problems:
>
>         1) Has to do with the integration of Xymon/Devmon.
>
>            Although Devmon gets valid SNMP-data, for each poll, the
>         values in the if_load.Ethernet3_1.rrd-file (for example) are
>         showing gaps. The next value is so much larger than the rest,
>         so the total graph is going berserk because of the spikes that
>         are being shown.
>
>            ...[snip]
>                     <!-- 2017-03-15 15:10:00 CET / 1489587000 -->
>         <row><v>5.7197560484e+01</v><v>5.7540255376e+01</v></row>
>                     <!-- 2017-03-15 15:15:00 CET / 1489587300 -->
>         <row><v>5.8052253788e+01</v><v>5.7062462121e+01</v></row>
>                     <!-- 2017-03-15 15:20:00 CET / 1489587600 -->
>         <row><v>5.8039204545e+01</v><v>5.7738579545e+01</v></row>
>                     <!-- 2017-03-15 15:25:00 CET / 1489587900 -->
>         <row><v>5.8352395833e+01</v><v>5.7912187500e+01</v></row>
>                     <!-- 2017-03-15 15:30:00 CET / 1489588200 -->
>         <row><v>5.7961458333e+01</v><v>5.8807500000e+01</v></row>
>                     <!-- 2017-03-15 15:35:00 CET / 1489588500 -->
>         <row><v>5.7040675403e+01</v><v>5.7108262769e+01</v></row>
>                     <!-- 2017-03-15 15:40:00 CET / 1489588800 -->
>         <row><v>5.7984999119e+01</v><v>5.8214662436e+01</v></row>
>                     <!-- 2017-03-15 15:45:00 CET / 1489589100 -->
>         <row><v>1.6832224569e+16</v><v>1.6832224569e+16</v></row>
>                     <!-- 2017-03-15 15:50:00 CET / 1489589400 -->
>         <row><v>4.4656922344e+16</v><v>4.4656922343e+16</v></row>
>                     <!-- 2017-03-15 15:55:00 CET / 1489589700 -->
>         <row><v>5.7648150173e+01</v><v>5.7687031165e+01</v></row>
>                     <!-- 2017-03-15 16:00:00 CET / 1489590000 -->
>         <row><v>5.9068884188e+01</v><v>5.9453689406e+01</v></row>
>                     <!-- 2017-03-15 16:05:00 CET / 1489590300 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 16:10:00 CET / 1489590600 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 16:15:00 CET / 1489590900 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 16:20:00 CET / 1489591200 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 16:25:00 CET / 1489591500 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 16:30:00 CET / 1489591800 -->
>         <row><v>1.9398478192e+07</v><v>1.8707899982e+07</v></row>
>                     <!-- 2017-03-15 16:35:00 CET / 1489592100 -->
>         <row><v>5.6938284153e+01</v><v>5.6770437158e+01</v></row>
>                     <!-- 2017-03-15 16:40:00 CET / 1489592400 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 16:45:00 CET / 1489592700 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 16:50:00 CET / 1489593000 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 16:55:00 CET / 1489593300 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 17:00:00 CET / 1489593600 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 17:05:00 CET / 1489593900 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 17:10:00 CET / 1489594200 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 17:15:00 CET / 1489594500 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 17:20:00 CET / 1489594800 -->
>         <row><v>NaN</v><v>NaN</v></row>
>                     <!-- 2017-03-15 17:25:00 CET / 1489595100 -->
>         <row><v>3.5775056887e+07</v><v>3.4501518955e+07</v></row>
>                     <!-- 2017-03-15 17:30:00 CET / 1489595400 -->
>         <row><v>5.7219344262e+01</v><v>5.7417704918e+01</v></row>
>                     <!-- 2017-03-15 17:35:00 CET / 1489595700 -->
>         <row><v>5.7166338798e+01</v><v>5.9383825137e+01</v></row>
>                     <!-- 2017-03-15 17:40:00 CET / 1489596000 -->
>         <row><v>5.6769617486e+01</v><v>5.6981202186e+01</v></row>
>                     <!-- 2017-03-15 17:45:00 CET / 1489596300 -->
>         <row><v>5.7549617486e+01</v><v>5.7382732240e+01</v></row>
>             ...[snip]
>             This behaviour does NOT occur on my current Xymon server
>         (version 4.2.3) running on SLES11 SP4.
>
>             First I thought that this has to do with vmware, but that
>         is not the case. VM or bare metal; the behaviour is the same.
>
>             I made sure to see that even the devmon module is not
>         causing the problems. The same devmon software works fine on
>         SLES and RHEL. The snmpwalk-command does get valid SNMP-data,
>         when writing to a file. It just seems that Xymon does not
>         update the rrd-file correctly!?!?
>
>             Any suggestions how to proceed?
>


Assuming that the numeric values coming in are correct for those time 
periods, my first thought would be that there's something unusual going 
on with RRD caching. Are you seeing this issue with other trends graphs, 
whether for other tests on this host, other hosts using this test/data, 
or any other graphs at all?
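
One way to check other RRD files for the same pattern is to scan their 
dumps for gaps and spikes. This is only a rough sketch, not part of Xymon 
or Devmon: scan_dump and its factor threshold are names made up for 
illustration, and it assumes `rrdtool dump` output shaped like the 
excerpt above (a timestamp comment followed by a <row>), with any mail 
quote markers stripped first.

```python
import math
import re
import statistics

# A timestamp comment followed by its <row>, as in the dump excerpt.
ROW_RE = re.compile(
    r"<!--\s*.*?\s*/\s*(?P<epoch>\d+)\s*-->\s*<row>(?P<cells>.*?)</row>",
    re.DOTALL,
)
VALUE_RE = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def scan_dump(text, factor=1000.0):
    """Return (epoch, kind) pairs, where kind is "gap" or "spike".

    A row is a gap if any value is NaN, and a spike if any value exceeds
    factor times the median of all finite values (an arbitrary rule
    chosen for this sketch).
    """
    rows = []
    for match in ROW_RE.finditer(text):
        values = [float(v) for v in VALUE_RE.findall(match.group("cells"))]
        rows.append((int(match.group("epoch")), values))

    finite = [v for _, vals in rows for v in vals if math.isfinite(v)]
    baseline = statistics.median(finite) if finite else 0.0

    findings = []
    for epoch, vals in rows:
        if any(math.isnan(v) for v in vals):
            findings.append((epoch, "gap"))
        elif baseline > 0 and any(abs(v) > factor * baseline for v in vals):
            findings.append((epoch, "spike"))
    return findings
```

To use it, dump a file with `rrdtool dump some.rrd > some.xml`, read the 
result into a string, and pass it to scan_dump. Comparing the findings 
across several hosts and tests should show whether the problem is limited 
to this one data source.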

If it's unique to this graph, then that points to a problem with this 
specific data transmission. If not, there could be a larger issue with 
xymond_rrd (I/O performance, for example). I'd start by enabling debug 
output and examining the logs for when it receives data for this test. 
(I'm not sure whether this is being sent via 'data' or 'status' messages, 
but you'll want to make sure you're enabling debug for the right copy of 
xymond_rrd.)

If nothing turns up there, then you might try disabling the cache, which 
will force xymond_rrd to write things out as received (but will also 
increase I/O load a lot).

If neither of those fixes it, there could actually be an issue with the 
data coming in. At about that point I would set up a channel listener 
looking specifically for the host.svc messages related to this source so 
I could see the contents of each one as it comes in and look for any 
anomalies.
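
A channel listener along those lines could be sketched as below. This is 
only an illustration, with assumptions that should be checked against the 
xymond(8) and xymond_channel(8) man pages: that each message arrives on 
the worker's stdin as a header line starting with "@@", followed by the 
message text and a closing line containing only "@@", and that the 
hostname and test name are the fifth and sixth pipe-delimited header 
fields. filter_messages is a hypothetical helper, not part of Xymon.

```python
def _matches(header, host, test):
    # Assumed header layout:
    # @@channel#seq/host|timestamp|sender|origin|hostname|testname|...
    fields = header.split("|")
    return len(fields) > 5 and fields[4] == host and fields[5] == test


def filter_messages(lines, host, test):
    """Yield (header, body_lines) for messages matching host and test."""
    header, body = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "@@":
            # End-of-message marker: emit the message if it matches.
            if header is not None and _matches(header, host, test):
                yield header, body
            header, body = None, []
        elif line.startswith("@@"):
            # New header; any unterminated message is discarded.
            header, body = line, []
        elif header is not None:
            body.append(line)
```

A small wrapper that loops over filter_messages(sys.stdin, host, test) 
and prints each header and body could then be started as the worker 
program of xymond_channel (for example with --channel=status or 
--channel=data, depending on how Devmon sends the results), with the 
output captured to a file for inspection.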

HTH,
-jc