[Xymon] xymonnet timeouts?

Japheth Cleaver cleaver at terabithia.org
Wed Feb 15 17:35:51 CET 2017

On 2/15/2017 7:50 AM, Richard Hamilton wrote:
> I noticed I was getting these when a host (marked dialup) was down; 
> turns out it's because there was an RPC test, and rpcinfo has no 
> option to choose a reasonable timeout; trying to run it against a host 
> that's down or unreachable takes nearly ten minutes to time out!
> What I don't understand, is why, given the conn test was enabled and 
> not green or yellow, it was trying to do other network tests on that host.
> Here's the host line:
> dialup CLIENT:lapple-sierra.pri 
> noflap=location ssh ntp rpc=mountd,nlockmgr,nfs,rpcbind,rquotad,status 
> NOCOLUMNS:files multihomed NOPROPPURPLE:+location 
> NOPROPYELLOW:+cpu,+location
> (location is a client extension script, not relevant to the problem 
> at hand)

Interestingly, this appears to be intentional: a failed conn test on a 
dialup host is not treated as "down" internally (clear means N/A rather 
than down), so the host's other tests aren't bypassed later in the cycle 
when we get to running rpcinfo.
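
To make that concrete, here is a simplified model of the behaviour 
described above. It is not xymonnet's actual source (xymonnet is written 
in C), and the function names are made up for illustration; it only 
assumes that a failed ping reports clear for a dialup host and red 
otherwise, and that only red counts as "down".

```python
def conn_color(ping_ok: bool, dialup: bool) -> str:
    """Hypothetical: colour reported for the conn test."""
    if ping_ok:
        return "green"
    # A dialup host that doesn't answer is "clear" (N/A), not "red".
    return "clear" if dialup else "red"


def skip_service_tests(color: str) -> bool:
    """Hypothetical: only a red conn status marks the host as down."""
    return color == "red"


# Unreachable ordinary host: the remaining tests are bypassed.
print(skip_service_tests(conn_color(ping_ok=False, dialup=False)))
# Unreachable dialup host: the remaining tests, rpcinfo included, still run.
print(skip_service_tests(conn_color(ping_ok=False, dialup=True)))
```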

I'm not entirely certain of the history here. This smells like it should 
be a bug for precisely the reason you're seeing: mass timeouts from 
testing things that are down. OTOH, there may be cases where hosts are 
intermittently unpingable and yet people expect other testing to 
continue. 'dialup' is less commonly used nowadays, which may be why 
this isn't hit more often.

There's logic in xymonnet that allows for internal flagging of something 
as actually up or down for purposes of testing (to handle things like 
badconn); this should probably become a configurable option in the future.
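
Since rpcinfo itself has no timeout option, one general way to bound a 
command like it is to run it under an external deadline. The sketch below 
is only an illustration of that idea, not something xymonnet does or is 
known to support; run_with_deadline is a made-up helper name and the 
15-second value in the usage comment is arbitrary.

```python
import subprocess


def run_with_deadline(argv, seconds):
    """Run argv and return (exit_code, stdout).

    Returns (None, "") if the command is still running after `seconds`.
    """
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this exception.
        return None, ""
    return done.returncode, done.stdout


# Example use (hostname taken from the host line quoted above):
#   code, out = run_with_deadline(["rpcinfo", "-p", "lapple-sierra.pri"], 15)
#   if code is None:
#       print("rpcinfo gave up after 15 seconds")
```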
