[Xymon] Bug in xymonping reporting wrong data when pinging multiple hosts

Michael Beatty Michael.Beatty at sherwin.com
Wed Jan 9 13:36:31 CET 2013


Installed fping as well, and it is working fine.  For whatever reason it
installed with rwxr-xr-x permissions, so I needed to chmod u+s it before
it would work.
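
For anyone hitting the same thing: fping opens a raw ICMP socket, so on
many Linux systems the binary has to be owned by root and setuid before
an unprivileged user such as xymon can run it. A minimal sketch of the
check, assuming a POSIX shell. The helper name has_setuid is made up for
illustration, and the fping path will differ from system to system:

```shell
#!/bin/sh
# has_setuid FILE: succeed if FILE exists and has the set-user-ID bit set.
# (Hypothetical helper name; "test -u" is the standard POSIX check.)
has_setuid() {
    [ -u "$1" ]
}

# Example use: locate fping on PATH and report whether it is setuid.
# The fix itself needs root:  chown root FILE && chmod u+s FILE
fping_path=$(command -v fping || true)
if [ -n "$fping_path" ]; then
    if has_setuid "$fping_path"; then
        echo "$fping_path: setuid bit is set"
    else
        echo "$fping_path: setuid bit is NOT set (as root: chmod u+s)"
    fi
fi
```

Running it before and after the chmod should show the message change.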

In some more testing, xymonping does correctly report hosts that are
down.  So it is still effective for alerting purposes, but from a
reporting standpoint, not so much.
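
One pattern in the numbers quoted below is worth spelling out: in the
multi-host runs, the suspicious values are whole multiples of
1000/max-pps milliseconds (1000 and 2000 ms at --max-pps=1, 200 and
400 ms at --max-pps=5, 40 and 80 ms at --max-pps=25). The 20 and 40 ms
values in the default runs would fit the same pattern at 50 packets per
second. That suggests the delay between sent packets is being counted as
part of the response time, but this is an inference from the output, not
something confirmed in the xymonping source. A rough sketch that flags
such values, assuming the "is alive (N ms)" line format shown in the
quoted output; the function name flag_quantized is hypothetical and not
part of Xymon:

```shell
#!/bin/sh
# flag_quantized MAX_PPS: read xymonping output on stdin and mark every
# response time that is a non-zero whole multiple of the send interval
# (1000 / MAX_PPS milliseconds). Hypothetical helper, not part of Xymon.
flag_quantized() {
    awk -v pps="$1" '
        / is alive \(/ {
            rtt = $(NF - 1)
            sub(/^\(/, "", rtt)           # strip the leading parenthesis
            step = 1000 / pps             # send interval in ms
            n = rtt / step                # how many intervals fit in rtt
            k = int(n + 0.5)              # nearest whole number of intervals
            # suspect when rtt is (nearly) an exact multiple of the interval
            if (n >= 1 && n - k < 0.001 && k - n < 0.001)
                print $0 "   <-- suspect: " k " x " step " ms"
            else
                print $0
            next
        }
        { print }                         # pass other lines through unchanged
    '
}

# Example with three lines taken from the --max-pps=5 run:
printf '%s\n' \
    'X.X.X.137 is alive (1500 ms)' \
    'X.X.X.201 is alive (400 ms)' \
    'X.X.X.22 is alive (0.06 ms)' | flag_quantized 5
```

A genuinely slow host can land on a multiple by coincidence, so a flagged
line is only a hint that the value deserves a second look.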

Michael Beatty

On 01/08/2013 06:57 PM, Jeremy Laidman wrote:
> Of course the solution is to use fping.  Henrik has previously stated 
> that fping is preferred over xymonping 
> <http://lists.xymon.com/archive/2012-January/033738.html>, and the 
> fping.sh script used when building will warn that "it is not yet fully 
> stable".
>
> I've just now installed fping (and configured Xymon to use it) and my 
> graphs are now showing much more reasonable values than before.
>
> Nevertheless, it's not obvious in any documentation that xymonping 
> will give bad data.  The caveats on its use suggest (to me) that it 
> can miss some replies when large numbers of hosts are probed, but in 
> practice it gives bad data even when the number of hosts is two.
>
> J
>
>
>
> On 9 January 2013 10:39, Jeremy Laidman <jlaidman at rebel-it.com.au> wrote:
>
>     Yup, I get this too, tested with v4.3.10 and v4.3.4.  It also
>     shows up when I ping the localhost address repeatedly:
>
>     sudo ./xymon-4.3.4/xymonnet/xymonping 127.0.0.1 127.0.0.1
>     127.0.0.1 127.0.0.1 127.0.0.1
>     127.0.0.1 is alive (20 ms)
>     127.0.0.1 is alive (0.02 ms)
>     127.0.0.1 is alive (24 ms)
>     127.0.0.1 is alive (0.02 ms)
>     127.0.0.1 is alive (0.02 ms)
>
>     The 20ms and 24ms entries are wrong, and they change as I adjust
>     the max-pps values, by a factor of 5.
>
>     None of my conn graphs seems to be completely flatlined, but I
>     have noticed that DNS test times are usually less than conn test
>     times, which is a bit odd, but might be unrelated.  Hmm, now that
>     I look at them, it seems all of my graphs but one are hovering
>     close to either 24ms or 48ms.  The host that is the exception,
>     with a conn graph that looks correct, happens to be the last entry
>     if I sort all host IP addresses.
>
>     J
>
>
>
>     On 9 January 2013 05:49, Michael Beatty
>     <Michael.Beatty at sherwin.com> wrote:
>
>         Using Xymon 4.3.7
>         OS Linux SuSE
>
>         I've been struggling to understand why certain hosts are
>         almost always reporting the exact same ping response time.
>         I've determined that xymonping isn't working correctly: it is
>         reporting incorrect data for half of the hosts tested.
>
>         I start by pinging 6 hosts, one at a time; everything is correct:
>         [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.22
>         X.X.X.22 is alive (0.06 ms)
>         [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.70
>         X.X.X.70 is alive (0.56 ms)
>         [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.138
>         X.X.X.138 is alive (826 ms)
>         [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.137
>         X.X.X.137 is alive (980 ms)
>         [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.201
>         X.X.X.201 is alive (0.75 ms)
>         [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.202
>         X.X.X.202 is alive (0.66 ms)
>
>         Then I put them in the same command, and the first, second,
>         and fifth values are wrong:
>         [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.70
>         X.X.X.22 X.X.X.138 X.X.X.137 X.X.X.201 X.X.X.202
>         X.X.X.70 is alive (40 ms)
>         X.X.X.22 is alive (20 ms)
>         X.X.X.138 is alive (1307 ms)
>         X.X.X.137 is alive (1738 ms)
>         X.X.X.201 is alive (20 ms)
>         X.X.X.202 is alive (0.64 ms)
>
>
>         Switch the order of the pings, and the first, second, and fifth
>         values are exactly the same as the first time, and still wrong:
>         [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.201
>         X.X.X.202 X.X.X.137 X.X.X.138 X.X.X.70 X.X.X.22
>         X.X.X.201 is alive (40 ms)
>         X.X.X.202 is alive (20 ms)
>         X.X.X.137 is alive (1598 ms)
>         X.X.X.138 is alive (2069 ms)
>         X.X.X.70 is alive (20 ms)
>         X.X.X.22 is alive (0.04 ms)
>         [xymon at mxbscs tmp]$
>
>         Switch the order again, and now the third, fourth, and fifth
>         values are wrong:
>         [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping
>         X.X.X.137 X.X.X.138 X.X.X.201 X.X.X.202 X.X.X.70 X.X.X.22
>         X.X.X.137 is alive (1537 ms)
>         X.X.X.138 is alive (2016 ms)
>         X.X.X.201 is alive (40 ms)
>         X.X.X.202 is alive (20 ms)
>         X.X.X.70 is alive (20 ms)
>         X.X.X.22 is alive (0.06 ms)
>
>
>         Another thing I have noticed is that by altering the max-pps
>         value, you get completely different results.
>         [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.137
>         X.X.X.138 X.X.X.201 X.X.X.202 X.X.X.70 X.X.X.22 --max-pps=1
>         X.X.X.137 is alive (2000 ms)
>         X.X.X.138 is alive (1000 ms)
>         X.X.X.201 is alive (2000 ms)
>         X.X.X.202 is alive (1000 ms)
>         X.X.X.70 is alive (1000 ms)
>         X.X.X.22 is alive (0.06 ms)
>
>         [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.137
>         X.X.X.138 X.X.X.201 X.X.X.202 X.X.X.70 X.X.X.22 --max-pps=5
>         X.X.X.137 is alive (1500 ms)
>         X.X.X.138 is alive (1479 ms)
>         X.X.X.201 is alive (400 ms)
>         X.X.X.202 is alive (200 ms)
>         X.X.X.70 is alive (200 ms)
>         X.X.X.22 is alive (0.06 ms)
>
>         [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.137
>         X.X.X.138 X.X.X.201 X.X.X.202 X.X.X.70 X.X.X.22 --max-pps=25
>         X.X.X.137 is alive (765 ms)
>         X.X.X.138 is alive (896 ms)
>         X.X.X.201 is alive (80 ms)
>         X.X.X.202 is alive (40 ms)
>         X.X.X.70 is alive (40 ms)
>         X.X.X.22 is alive (0.04 ms)
>
>
>         It doesn't appear to be a problem with my configuration. I
>         checked the www.xymon.com demo site, and the same issue seems
>         to be present there. The signature of the bad data is easy to
>         see in the graphs: good data produces a varied line, whereas
>         bad data produces a generally flat line.
>         These hosts look good:
>         http://www.xymon.com/xymon-cgi/svcstatus.sh?HOST=pto.linuxbog.dk&SERVICE=conn
>         http://www.xymon.com/xymon-cgi/svcstatus.sh?HOST=dali.hswn.dk&SERVICE=conn
>
>         These hosts look bad:
>         http://www.xymon.com/xymon-cgi/svcstatus.sh?HOST=blixen.hswn.dk&SERVICE=conn
>         http://www.xymon.com/xymon-cgi/svcstatus.sh?HOST=wifi.hswn.dk&SERVICE=conn
>
>
>
>         -- 
>         Michael Beatty
>
>
>
>
>

