[Xymon] New server causing issues with CONN test

Henrik Størner henrik at hswn.dk
Mon Aug 15 23:16:52 CEST 2011


On 15-08-2011 22:46, Poppy, Ben wrote:
> I'm having a pretty strange issue. We have our existing hobbit servers
> running on Fedora servers running hobbit 4.2.0. I'm working on
> installing brand new servers that will be running CentOS 6 64-bit and
> the latest version of xymon (4.3.3 before I saw 4.3.4 today).

[installs and starts 4.3 version]

> Within a few minutes, 4 servers turn to red alerts on CONN on the
> existing Fedora based Hobbit servers. They begin flapping on and off of
> red alert until I shutdown the new CentOS xymon server. Within a few
> minutes of the new server being shut down, the alerts go away for good.
>
> I have tried going to Centos 5 32-bit, 64-bit, even trying xymon 4.2.3,
> or all the way back to hobbit 4.2.0 all with the same result, and the
> exact same 4 servers each time.

As I understand it, you were running both versions simultaneously. Did 
those servers also go red on the new Xymon version, or only on the old 
one? If they were also red on the new server, did you try stopping 
network tests on the old server, and did that make a difference?

Which ping tool are you using - xymonping or fping?

I haven't heard of anything like this before, but I suspect it may be an 
issue with the way "ping" works. When routing traffic, most systems will 
pass ping-traffic with a low priority, so it is quite easy for 
ping-requests and -responses to be dropped. Since xymonping and fping 
pump out a lot of ping-traffic rather quickly, maybe the new server just 
happened to be more "lucky" with its data than the old one - perhaps due 
to the switch port it is on, or the speed of the network interface and 
so on.
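
One way to check this theory would be to ping the four affected hosts 
from each server and compare the loss figures. A minimal sketch, assuming 
fping is run with "-c <count>" and prints its usual per-host summary line 
(as far as I know it goes to stderr). The function name, hostnames and 
numbers are made up for illustration:

```python
import re

# Matches the per-host summary line fping prints when run with -c, e.g.
#   "web01 : xmt/rcv/%loss = 10/7/30%, min/avg/max = 0.41/0.52/0.80"
# (hostname and numbers here are invented for illustration).
SUMMARY = re.compile(r"^(\S+)\s*:\s*xmt/rcv/%loss = (\d+)/(\d+)/(\d+)%")


def parse_fping_summary(text):
    """Return {host: (sent, received, loss_percent)} from fping -c output."""
    result = {}
    for line in text.splitlines():
        m = SUMMARY.match(line.strip())
        if m:
            host, sent, rcvd, loss = m.groups()
            result[host] = (int(sent), int(rcvd), int(loss))
    return result
```

If the old server shows noticeably more loss to those four hosts while 
the new server is testing, that would point at dropped pings rather than 
at anything in Xymon itself.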

It might be worthwhile to make sure that the old and the new systems do 
not run the network tests at the same time - keep an eye (with "ps") on 
when the network test runs on the old system, and don't start Xymon on 
the new system until about 30 secs after the old system completes the 
network tests. (This assumes your network tests don't take more than a 
couple of minutes, so there is time for both systems to run their tests 
within the default 5 minute interval.)
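
The "watch with ps, then wait 30 secs" step could be scripted roughly 
like this. It is only a sketch, meant to run on the old server. It 
assumes the network tester shows up in "ps" as "bbtest-net" (Hobbit 4.2) 
or "xymonnet" (Xymon 4.3); check the names on your own install. The 
function names and the 2-second poll interval are my own choices:

```python
import os
import subprocess
import time

# Assumed process names: "bbtest-net" for Hobbit 4.2, "xymonnet" for Xymon 4.3.
NETTEST_NAMES = ("bbtest-net", "xymonnet")


def nettest_running(ps_output, names=NETTEST_NAMES):
    """True if any command line in the ps output starts with a tester binary."""
    for line in ps_output.splitlines():
        fields = line.split()
        # Compare only the program name, so wrapper scripts whose names
        # merely contain "xymonnet" are not counted.
        if fields and os.path.basename(fields[0]) in names:
            return True
    return False


def wait_for_nettest_cycle(poll=2.0, settle=30.0):
    """Block until a network test run starts and finishes, then wait `settle` secs."""

    def running():
        out = subprocess.run(
            ["ps", "-e", "-o", "args="], capture_output=True, text=True
        ).stdout
        return nettest_running(out)

    while not running():  # wait for the next run to start
        time.sleep(poll)
    while running():      # wait for that run to finish
        time.sleep(poll)
    time.sleep(settle)    # the ~30 secs of slack suggested above
```

Once wait_for_nettest_cycle() returns on the old server, start Xymon on 
the new one. Since both run on the same 5 minute interval, the offset 
should roughly hold after that, although it can drift if the test run 
times differ a lot.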


Regards,
Henrik


