[Xymon] Xymon "port" check intermittent failures for ssh TCP port 22 state=LISTEN

Jeremy Laidman jlaidman at rebel-it.com.au
Fri Jul 7 08:51:50 CEST 2017


Not much chance, really. Truncation was my first guess at the cause too. The
[ports] section appears complete (it doesn't have its own size limit, as far
as I know), the [clock] section is present at the end, and the UTC: datestamp
line is present as the last line. So none of the artefacts I would expect to
see from truncation are there.

Also, the client messages are less than 300kB, whereas the default limit is
512kB and I've bumped that up to 2MB.
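
For anyone wanting to repeat those checks on a saved client message, here is
a minimal sketch of the same reasoning. The function name, the report keys
and the sample message are hypothetical (not part of Xymon), and it assumes
section headers appear as "[name]" on a line of their own.

```python
import re

# Assumption: each section of a client message starts with "[name]" on its own line.
SECTION_RE = re.compile(r"^\[([^\]\n]+)\][ \t]*$", re.MULTILINE)


def truncation_report(message: str, limit_kb: int = 512) -> dict:
    """Collect the signs used above to rule out truncation of a client message."""
    sections = SECTION_RE.findall(message)
    nonblank = [line for line in message.splitlines() if line.strip()]
    size = len(message.encode("utf-8"))
    return {
        "size_bytes": size,
        # Message is smaller than the configured size limit (given in kB).
        "under_limit": size < limit_kb * 1024,
        # The [ports] section is present at all.
        "has_ports": "ports" in sections,
        # [clock] is the final section, as expected in a complete message.
        "clock_is_last": bool(sections) and sections[-1] == "clock",
        # The UTC: datestamp is the very last non-blank line.
        "ends_with_utc": bool(nonblank) and nonblank[-1].startswith("UTC:"),
    }


if __name__ == "__main__":
    # Hypothetical, heavily shortened client message for illustration only.
    sample = (
        "[ports]\n"
        "tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN\n"
        "[clock]\n"
        "epoch: 1499410310\n"
        "UTC: 2017-07-07 06:51:50\n"
    )
    print(truncation_report(sample, limit_kb=2048))
```

If every flag comes back true for a message captured during one of the
events, truncation is an unlikely explanation for that event.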


On 7 July 2017 at 15:05, Ryan Novosielski <novosirj at rutgers.edu> wrote:

> Any chance this is truncation happening? That test can have a lot of
> output.
>
> --
> ____
> || \\UTGERS,       |---------------------------*O*---------------------------
> ||_// the State     |         Ryan Novosielski - novosirj at rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\    of NJ     | Office of Advanced Research Computing - MSB C630, Newark
>     `'
>
> On Jul 7, 2017, at 00:47, Jeremy Laidman <jlaidman at rebel-it.com.au> wrote:
>
> Hi
>
> I'm getting what appear to be false-positives for the port test that is
> monitoring the LISTEN socket for port 22, as opened by the sshd daemon. A
> few times a month, Xymon will show that the server is not listening on port
> 22, and 5 minutes later, the listening port is back again. The sshd process
> has never crashed or been reconfigured (eg with SIGHUP), and no other
> listening ports are showing the same behaviour.  The client messages for
> the server during these events are complete and uncorrupted.
>
> The simplest fix is to use delayred to suppress alerts for 5 minutes.
> However, I would like to work out what's causing this behaviour. I don't
> believe this is a problem with Xymon at all; rather, the netstat output in
> the client message is exactly what the OS provided to the Xymon client. My
> guess is that it's due to the way sshd works - perhaps it periodically
> rebinds to the socket - but nothing in the sshd logs seems to correlate
> with these events. If anyone can suggest what might be causing this, or how
> to investigate further, I'd be grateful.
>
> This problem happens for about a quarter of the servers in a pool, and no
> others. All servers are identical in OS, software and general
> configuration, but the servers affected by this tend to be the ones taking
> the most traffic and under the most load (although there's plenty of spare
> CPU cycles even on the most heavily-used server). I have two Xymon servers,
> each monitoring independently of the other, and this problem is reported by
> both Xymon servers, although at completely different dates and times.
>
> Cheers
> Jeremy