<div dir="ltr">Ok, I have a really disgusting workaround for xymonnet timeouts on rpcinfo.  Set RPCINFO in xymonserver.cfg to point to a wrapper like the following, which will kill off the rpcinfo process after 9 seconds, if it hasn't already finished.  This seems to give the expected result whether the host being tested is up or down, without causing xymonnet timeout errors.  It does seem necessary to have the return code be 0 or 1, and not let it default to the return code of "wait" (which could be e.g. 143 if the process was killed, and would look like a different kind of error to xymonnet).<div>======== cut here ========</div><div><div>#! /bin/sh</div><div>/usr/bin/rpcinfo ${1+"${@}"} &<br></div><div>pid="${!}"</div><div>(sleep 9; kill -0 "${pid}" && kill "${pid}") 2>/dev/null &</div><div>wait "${pid}"</div><div>if [ $? -eq 0 ]</div><div>then</div><div>        exit 0</div><div>else</div><div>        exit 1</div><div>fi</div></div><div>======== cut here ========<br></div><div>For a "dialup" host (actually a VM that wasn't running), the result was reasonable: clear, and output of</div><div><table align="CENTER" border="0" summary="Detail Status" style="color:rgb(216,216,191);font-family:-webkit-standard"><tbody><tr><td align="LEFT"><h3><font color="#000000">Sat Feb 18 15:04:09 2017 rpc ok, Service unavailable</font></h3><pre><font color="#000000">Dialup host or service

Could not connect to the portmapper service

Command: /export/home/xymon/server/bin/rpcinfo -p 192.168.0.56 2>&1

/export/home/xymon/server/bin/rpcinfo[6]: wait: 13244: Terminated

</font>

</pre></td></tr></tbody></table><br></div><div>Still, as I said, this is rather disgusting, and I'd hope that skipping external tests like rpcinfo or ntp when the conn test has failed would be an option in the future.</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 15, 2017 at 9:50 PM, Richard Hamilton <span dir="ltr"><<a href="mailto:rlhamil2@gmail.com" target="_blank">rlhamil2@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Here, "dialup" isn't literal; they're VMs under type II (hosted) hypervisors - VirtualBox or Parallels, in this case.  Since the hosts don't have gigantic amounts of RAM, the VMs are only brought up when needed (testing, development, updates, or nostalgia for some other OS); but when up, they should be healthy, with all their usual services running.<div><br></div><div>Another dialup is my laptop, which is usually where I am, not necessarily back home with the xymon server. :-)  Since it has no built-in cellular and I don't have an always-on portable cellular hotspot (although the phone can do that duty occasionally in the absence of a proper one), there's no way for it to be connected all the time, either.</div><div><br></div><div>Likewise, some non-infrastructure devices are dialup, because they're not on all the time - like a WiFi picture frame, various iDevices, or a game console.  
If the printer didn't have energy saver mode, it would be a dialup too, because it wouldn't be left on all the time.</div><div><br></div><div>Literal dialup with a modem may be rare enough nowadays, but there are plenty of modern intermittently connected cases for which the functionality is still useful, IMO.</div><div><br></div><div>One way or another, a way to make network tests contingent on basic connectivity, even when basic connectivity is optional (dialup), would IMO help a lot, especially for external tests. Of those, rpc is the worst; the ntp timeout is very quick by comparison. RPC libraries also come in different enough flavors that rolling a portable version of rpcinfo with a timeout option seems a bit tedious (I've looked at e.g. Solaris and Mac code for rpcinfo, and they're very different internally; the Mac's seems derived from a really old BSD flavor, more or less).</div><div><br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 15, 2017 at 11:35 AM, Japheth Cleaver <span dir="ltr"><<a href="mailto:cleaver@terabithia.org" target="_blank">cleaver@terabithia.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000"><div><div class="m_7121225949629733408h5">

    <div class="m_7121225949629733408m_-7988653466373246467moz-cite-prefix">On 2/15/2017 7:50 AM, Richard Hamilton

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">I noticed I was getting these when a host (marked

        dialup) was down; turns out it's because there was an RPC test,

        and rpcinfo has no option to choose a reasonable timeout; trying

        to run it against a host that's down or unreachable takes nearly

        ten minutes to time out!

        <div><br>

        </div>

        <div>What I don't understand, is why, given the conn test was

          enabled and not green or yellow, it was trying to do other

          network tests on that host.</div>

        <div><br>

        </div>

        <div>Here's the host line:</div>

        <div>192.168.0.56<span class="m_7121225949629733408m_-7988653466373246467gmail-Apple-tab-span" style="white-space:pre-wrap">    </span>lapple-sierra<span class="m_7121225949629733408m_-7988653466373246467gmail-Apple-tab-span" style="white-space:pre-wrap">         </span>#

          dialup CLIENT:lapple-sierra.pri noflap=location ssh ntp

          rpc=mountd,nlockmgr,nfs,rpcbind,rquotad,status NOCOLUMNS:files

          multihomed NOPROPPURPLE:+location NOPROPYELLOW:+cpu,+location<br>

        </div>

        <div><br>

        </div>

        <div>(location is a client extension script, not relevant to

          the problem at hand)</div>

      </div>

      <br>


    </blockquote>

    <br></div></div>

    Interestingly, this appears to be intentional -- dialup tests are

    not considered "down" internally (clear is more an N/A than a down

    state), so they aren't bypassed later in the cycle when we get to

    running rpcinfo.<br>

    <br>

    I'm not entirely certain on the history here. This smells like it

    should be a bug for precisely the reason you're seeing: mass

    timeouts testing against things that are down. OTOH, there may be

    cases where things are intermittently unpingable and yet people are

    expecting other testing to continue on. 'dialup' is used a bit less

    nowadays, which may be why this is less frequently hit.<br>

    <br>

    There's logic in xymonnet that allows for internal flagging of

    something as actually up or down for purposes of testing (to handle

    things like badconn); this should probably become an option for

    control in the future.<br>

    <br>

    Regards,<br>

    -jc<br>

  </div>

</blockquote></div><br></div>

</div></div></blockquote></div><br></div>
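<div><br></div>
<div>A possibly tidier variant of the wrapper at the top of this message, as a minimal sketch only. It assumes a POSIX sh, and run_limited is a hypothetical helper name, not part of Xymon or rpcinfo. Like the original, it kills the command after a time limit and collapses the result to 0 or 1; in addition, it stops the watchdog once the command has finished and keeps the watchdog's own output away from the caller.</div>
<pre>
```shell
#!/bin/sh
# Sketch: run a command with a time limit and report only 0 or 1.
# run_limited is a hypothetical helper name used for illustration.
# Usage: run_limited SECONDS command [args...]
run_limited() {
    limit="$1"
    shift

    # Start the real command in the background and remember its PID.
    "$@" &
    pid=$!

    # Watchdog: after $limit seconds, kill the command if it is still there.
    # Its output goes to /dev/null so it never holds the caller's pipe open.
    ( sleep "$limit"; kill "$pid" ) >/dev/null 2>&1 &
    watchdog=$!

    # Collapse the status: 0 on success, 1 on any failure or kill
    # (a killed process would otherwise give e.g. 143 from wait).
    if wait "$pid" 2>/dev/null
    then
        rc=0
    else
        rc=1
    fi

    # The command is done, so stop the watchdog before it can fire later.
    kill "$watchdog" 2>/dev/null
    wait "$watchdog" 2>/dev/null

    return "$rc"
}

# Small demo with harmless commands.
run_limited 1 sleep 3
echo "slow command: $?"    # prints "slow command: 1"
run_limited 5 true
echo "fast command: $?"    # prints "fast command: 0"
```
</pre>
<div>To use this as the RPCINFO wrapper, the demo lines at the bottom would be replaced by run_limited 9 /usr/bin/rpcinfo ${1+"$@"} followed by exit $?, with the 9-second limit and the rpcinfo path taken from the wrapper above. Whether xymonnet behaves any differently with this variant is untested.</div>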