[Xymon] EXT :Re: Xymon flapping: network slowness reality or delusion?
    Mills, David (IS) 
    David.Mills at ngc.com
       
    Fri Mar 15 23:17:36 CET 2013
    
    
  
-----Original Message-----
From: cleaver at terabithia.org [mailto:cleaver at terabithia.org] 
Sent: Friday, March 15, 2013 1:31 PM
To: Mills, David (IS)
Cc: xymon at xymon.com
Subject: EXT :Re: [Xymon] Xymon flapping: network slowness reality or delusion?
> Hi, All ...
>
> The other day, our Xymon (4.3.3) started sending out notifications due 
> to flapping on various hosts, various network-based tests which lasted 
> for a rather sharply-defined period. It caused a fair bit of angst and 
> I was on the hot-seat to prove Xymon was functioning properly.
>
> Here are some of the summary facts:
>
<snip>
Assuming you're saving status results in history (the default), can you look at the status messages from the down periods? Were they DNS timeouts or actual test timeouts? I'd start with the ping checks, since those are pretty cut-and-dried...
- Has anything like this occurred before?
- Even if no threshold was crossed on the Xymon server itself, take a look at the 'trends' page for the polling host for that period and see whether anything unusual happened around the same time.
HTH,
-jc
==
Thanks! After poking around in the xymonnet history dumps, I found some very interesting things that I don't know what to make of:
- For the 20 worst run times in a 24-hour period, the three network test categories with significantly elevated times were "TCP tests completed", "DNS tests executed" and "NTP tests executed".
- Oddly, after graphing the respective times for these categories in a spreadsheet, it became obvious that the DNS and TCP tests were roughly inversions of each other: when one was super-high, the other would go low. 
- Even weirder, the PING tests were ... NORMAL!! While the rest of the Xymon network tests were jumping off a cliff, good old 'ping' was chugging along (mostly) without mishap. This last datum seems to blow a hole in the theory that this is truly a network problem (vs. a Xymon server/host problem).
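For anyone who wants to check the "inversion" by number rather than by eye, here is a minimal sketch of the idea, not anything Xymon provides. It computes the Pearson correlation between two timing series; a value near -1 means one goes up when the other goes down. The names dns_secs and tcp_secs and all of the numbers are made up for illustration, so substitute the per-run times copied out of your own history dumps.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-run times in seconds. These are NOT the real figures
# from this thread; they only mimic the pattern described above.
dns_secs = [2.0, 30.0, 3.0, 28.0, 2.5, 31.0]
tcp_secs = [29.0, 4.0, 27.0, 5.0, 30.0, 3.5]

r = pearson(dns_secs, tcp_secs)
print("DNS vs TCP correlation: %.3f" % r)
```

A strongly negative result would back up the spreadsheet graph. Running the same function on the ping times against either series should give something much closer to zero if ping really was unaffected.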
Any other thoughts?
david
    
    