[Xymon] Xymon disruption every night!
L-M-J
linuxmasterjedi at free.fr
Tue Feb 16 10:44:08 CET 2016
Hi,
I'm still running into trouble every night between ~0h30 and ~2h40 :-(
1) I checked the backup on my physical Xymon server: it starts around 9pm and runs for 4:45 min.
2) We cross-monitored the DNS server from another monitoring tool: no DNS outage detected.
3) I monitored the Xymon server's network link state with "mii-tool" every second: no trouble detected.
4) I pinged my Xymon servers from 2 different network locations all night long: no trouble detected.
5) There are no firewalls between my Xymon server and the monitored hosts.
6) Out of 500 hosts, only ~30 are in trouble every night, and mostly the same ones.
7) The hosts are a mix of VMs, physical servers and public internet websites.
Here is what I found in xymond.log today:
2016-02-16 02:02:57 Flapping detected for www.foo1.com:http - 5 changes in 1708 seconds
2016-02-16 02:02:57 Flapping detected for www.foo2.com:http - 5 changes in 1708 seconds
2016-02-16 02:02:57 Flapping detected for www.microsoft.com:http - 5 changes in 1708 seconds
2016-02-16 02:06:14 Flapping detected for server01:http - 5 changes in 1678 seconds
2016-02-16 02:06:14 Flapping detected for server02:http - 5 changes in 1678 seconds
2016-02-16 02:06:29 Flapping detected for server03:conn - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server04:ldap - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server06:ssh - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server05:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server07:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server08:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server09:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for foo.bar1.com:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for foo.bar2.com:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for foo.bar3.fr:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server10:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server11-t:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server12:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server13:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server14:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server15:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server16:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server17:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server18:http - 5 changes in 1745 seconds
2016-02-16 02:07:21 Flapping detected for server19:http - 5 changes in 1745 seconds
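To see how the flapping entries cluster in time and per test, the log lines above can be summarized with a small script. This is only a sketch: the regular expression is inferred from the lines quoted in this message, the function names are made up for illustration, and the sample is a three-line excerpt of the log above.

```python
import re
from collections import Counter

# Pattern inferred from the xymond.log lines quoted in this message;
# lines in any other format are skipped.
FLAP_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Flapping detected for (?P<host>[^:\s]+):(?P<test>\S+) - "
    r"(?P<changes>\d+) changes in (?P<secs>\d+) seconds"
)


def parse_flapping(lines):
    """Return (timestamp, host, test, seconds) for each flapping line."""
    events = []
    for line in lines:
        m = FLAP_RE.match(line.strip())
        if m:
            events.append((m["ts"], m["host"], m["test"], int(m["secs"])))
    return events


def summarize(events):
    """Count flapping events per log timestamp and per test name."""
    by_timestamp = Counter(ts for ts, _, _, _ in events)
    by_test = Counter(test for _, _, test, _ in events)
    return by_timestamp, by_test


# Excerpt of the log shown above.
SAMPLE = [
    "2016-02-16 02:02:57 Flapping detected for www.foo1.com:http - 5 changes in 1708 seconds",
    "2016-02-16 02:06:29 Flapping detected for server03:conn - 5 changes in 1745 seconds",
    "2016-02-16 02:07:21 Flapping detected for server04:ldap - 5 changes in 1745 seconds",
]

events = parse_flapping(SAMPLE)
by_timestamp, by_test = summarize(events)
for ts, count in sorted(by_timestamp.items()):
    print(ts, count)
print(dict(by_test))
```

Feeding the whole xymond.log through parse_flapping() over several nights would show whether the same hosts and tests come back at the same times.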
Here is part of the configuration, plus the errors displayed in the Xymon web interface:
hosts.cfg : 0.0.0.0 server03 # conn NAME:"server03" DESCR:"VM FOO BAR"
Error : conn NOT ok : DNS lookup failed / Unable to resolve hostname server03
System unreachable for 2 poll periods (86 seconds)
Everything looks as if DNS resolution failed.
hosts.cfg : 10.X.Y.188 server05 # conn tse NAME:"Server 05" DESCR:"My comment" http://server05/
Error : DNS error red http://server05/ - DNS error
- Why do I get a "DNS error" here? I set the IP for this host yesterday to work around the issue. The "conn" error has disappeared since yesterday evening, but the http error still remains.
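One way to catch a short resolution failure on the Xymon server itself is to log timestamped lookups for exactly the names that fail at night. The sketch below is an assumption-laden illustration, not part of Xymon: it uses the system resolver through Python's socket.getaddrinfo, which may not follow the same resolution path as Xymon's own network tests, and the function names and default interval are made up for the example.

```python
import socket
import time
from datetime import datetime


def probe(hostnames, resolver=socket.getaddrinfo):
    """Resolve each name once; return (timestamp, name, ok, detail, ms) tuples."""
    results = []
    for name in hostnames:
        start = time.monotonic()
        try:
            infos = resolver(name, None)
            # Each entry ends with a sockaddr tuple whose first item is the address.
            detail = sorted({info[4][0] for info in infos})
            ok = True
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            detail = str(exc)
            ok = False
        elapsed_ms = (time.monotonic() - start) * 1000
        stamp = datetime.now().isoformat(timespec="seconds")
        results.append((stamp, name, ok, detail, elapsed_ms))
    return results


def watch(hostnames, interval=10.0, rounds=1, resolver=socket.getaddrinfo):
    """Run probe() 'rounds' times, printing only the failed lookups."""
    for i in range(rounds):
        for stamp, name, ok, detail, elapsed_ms in probe(hostnames, resolver):
            if not ok:
                print(f"{stamp} FAIL {name} after {elapsed_ms:.0f} ms: {detail}")
        if i < rounds - 1:
            time.sleep(interval)
```

For example, watch(["server03", "server05"], interval=10, rounds=1000) left running overnight would print a line only when a lookup fails, which can then be compared with the times in xymond.log.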
On 29 January 2016 13:22:06 GMT+01:00, Becker Christian <christian.becker at rhein-zeitung.net> wrote:
>My intention was to figure out if the network connection of the Xymon
>server itself has a problem…
>For example, if your Xymon server is hardware, then it has a wired
>network interface that is connected to a network switch. That’s your
>link between the Xymon server and all of your other VMs and physical
>servers.
>From my side, if you only see problems on the Xymon server, I'd have a
>look at this particular switch port or the cable infrastructure to the
>Xymon server. Or could there be a firewall rule preventing the Xymon
>server from accessing the DNS server?
>
>By the way – do you have only one DNS server in /etc/resolv.conf? Did
>you check the logs on your DNS server? Can you run a continuous ping
>to the Xymon server to see if it loses any packets over 24 hours?
>
>Regards
>Christian
>
>
>Christian Becker
>IT-Services
>
>Christian.Becker at rhein-zeitung.net
>_________________________________
>Mittelrhein-Verlag GmbH
>August-Horch-Straße 28
>D-56070 Koblenz
>Verleger und Geschäftsführer: Walterpeter Twer
>Reg.-Gericht Koblenz HRB 121
>Finanzamt Koblenz Str.Nr. 22 65 10 285 2
>www.rhein-zeitung.de
>
>From: Xymon [mailto:xymon-bounces at xymon.com] On behalf of L-M-J
>Sent: Friday, 29 January 2016 13:07
>To: Xymon at xymon.com
>Subject: Re: [Xymon] Xymon disruption every night!
>
>The problems appear on VMs, physical servers, and LAN and DMZ
>equipment. I don't see a link between those devices :-(
>
>On 29 January 2016 09:23:14 GMT+01:00, Becker Christian
><christian.becker at rhein-zeitung.net> wrote:
>Hi L-M-J,
>
>
>can you rule out that this behavior comes from a network device
>such as a switch or the default gateway?
>
>
>Regards
>Christian
>
>
>From: Xymon [mailto:xymon-bounces at xymon.com] On behalf of L-M-J
>Sent: Friday, 29 January 2016 08:57
>To: Xymon at xymon.com
>Subject: [Xymon] Xymon disruption every night!
>
>
>Hi,
>
>I've been running Xymon for 6 years (4.3.17 atm) on Debian 7.8
>(3.2.0-4-amd64).
>For about a month now, every night between 0h30 and 2h am (+/- 30 min),
>around 30 hosts become unreachable:
>
>Fri Jan 29 01:16:38 2016 conn NOT ok : DNS lookup failed
>Unable to resolve hostname foo.bar.local
>System unreachable for 3 poll periods (170 seconds)
>green 0.0.0.0 is alive (0.02 ms) [<- 127.0.0.1]
>
>
>- I have around 500 monitored hosts, and it looks like the same hosts
>are lost every single night.
>- Those monitored hosts are not necessarily on the same network, nor
>running the same OS.
>- We cross-monitored the same hosts, and the other monitoring tool
>did not report any DNS outage.
>- I ran a DNS lookup every second on the Hobbit server for several days
>and it never reported a DNS outage.
>- I don't have any crontab installed on the server that could disturb
>Xymon.
>- Nothing strange in the Xymon logs or the server logs, no memory
>leak or CPU overload.
>- The rest of the day, the Xymon server behaves normally.
>- What did I change on the server a month ago? Nothing that I know of:
>no system upgrade or anything similar.
>- I had dnsmasq acting as a cache; I disabled it: same issue.
>- /etc/resolv.conf is quite light: search bar.local, next line:
>nameserver IP.OF.OUR.DNS.SERVER1, just like on our other servers.
>
>The issue could be anywhere: inside or outside the server, in Xymon or
>not... I have to confess that I'm running out of ideas for tracking it
>down. If anyone here has some leads, I will be thankful!
>
>Have a nice day!
>
>--
>Sent from my Android device with K-9 Mail. Please excuse my brevity.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.