[Xymon] xymond not accepting connections
Japheth Cleaver
cleaver at terabithia.org
Fri Nov 18 18:53:12 CET 2016
This is (probably) a sign that you have stale SysV IPC semaphores
left over from a previous crash.
The fix is to stop all xymon/hobbit processes, and then remove the
hobbit-owned IPC objects manually. On Linux, you'd run ipcs -a to find any
shared memory segments (and also message queues and semaphore arrays)
owned by the xymon/hobbit user and use ipcrm to remove them.
ipcs output on a running system will look something like this:
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x010045d6 0          xymon      600        262144     4
0x020045d6 32769      xymon      600        262144     2
0x030045d6 65538      xymon      600        262144     2
0x040045d6 98307      xymon      600        262144     3
0x050045d6 131076     xymon      600        262144     2
0x060045d6 163845     xymon      600        32768      2
0x070045d6 196614     xymon      600        26214400   3
0x080045d6 229383     xymon      600        26214400   2
0x090045d6 262152     xymon      600        131072     1
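As a rough sketch of the Linux cleanup, something like the following turns
ipcs -a output into the matching ipcrm commands. The function name
ipcrm_commands is made up for illustration, and it assumes the Linux
column order shown above (key, id, owner) plus section banners that
contain "Shared Memory", "Semaphore" or "Message". It only prints the
commands, so you can review them before running anything:

```shell
#!/bin/sh
# Sketch only: print (do not run) an ipcrm command for every IPC object
# owned by the given user. Assumes Linux-style `ipcs -a` output on stdin.
# ipcrm_commands is a hypothetical helper name, not part of Xymon.
ipcrm_commands() {
    # $1 = owner to match, e.g. xymon or hobbit
    awk -v owner="$1" '
        # Section banners decide which ipcrm flag applies to the rows below.
        /Shared Memory/ { flag = "-m"; next }
        /Semaphore/     { flag = "-s"; next }
        /Message/       { flag = "-q"; next }
        # Data rows: field 2 is the numeric id, field 3 is the owner.
        flag != "" && $3 == owner && $2 ~ /^[0-9]+$/ { print "ipcrm", flag, $2 }
    '
}

# Usage, after all xymon/hobbit processes are stopped:
#   ipcs -a | ipcrm_commands xymon
# and, once the list looks right:
#   ipcs -a | ipcrm_commands xymon | sh
```

This will not work as-is on Solaris, where ipcs prints its columns in a
different layout.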
I'm not sure if the commands are the same on Solaris (don't have a Sun
box handy at the moment), but once those are gone things should start
back up.
This is reduced to an error in 4.x instead of an abort, but the behavior
is still undefined since it's easy for xymond to get into a deadlock
with pre-existing semaphores set, while we wait for a message to be
picked up by a process that may not exist.
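Since the log message also asks you to check for hanging xymond_channel
processes, it may be worth confirming nothing is left running before you
clear the IPC objects. A minimal sketch, assuming input in the form that
ps -u hobbit -o pid,args prints (PID first, then the command line); the
function name leftover_xymon_procs is made up for illustration:

```shell
#!/bin/sh
# Sketch only: read "PID COMMAND ARGS..." lines on stdin and print the ones
# whose command name is one of the Xymon server daemons.
# leftover_xymon_procs is a hypothetical helper name, not part of Xymon.
leftover_xymon_procs() {
    awk '
        {
            # Strip any leading directory from the command name.
            n = split($2, parts, "/")
            name = parts[n]
            if (name ~ /^(xymond|xymond_channel|xymonlaunch)$/) print $1, name
        }
    '
}

# Usage: empty output means none of these daemons are still running.
#   ps -u hobbit -o pid,args | leftover_xymon_procs
```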
HTH,
-jc
On 11/18/2016 8:53 AM, Mills,David (HHSC Contractor) wrote:
>
> OK… This is new: more details from xymonlaunch.log file:
>
> …
>
> 2016-11-18 07:13:34 Loading saved state
> 2016-11-18 07:13:59 Setting up network listener on 0.0.0.0:1984
> 2016-11-18 07:13:59 Setting up signal handlers
> 2016-11-18 07:13:59 Setting up xymond channels
> 2016-11-18 07:13:59 FATAL: xymond sees clientcount 2, should be 0
> Check for hanging xymond_channel processes or stale semaphores
> 2016-11-18 07:13:59 Cannot setup data channel
> 2016-11-18 07:13:59 Task xymond terminated, status 1
> 2016-11-18 07:13:59 Task xymongen terminated by signal 15
> 2016-11-18 07:13:59 Task xymonnet terminated by signal 15
> 2016-11-18 07:13:59 Loading hostnames
> 2016-11-18 07:14:39 xgetenv: Cannot find value for variable HOME
> 2016-11-18 07:17:41 xgetenv: Cannot find value for variable HOME
> 2016-11-18 07:20:40 xgetenv: Cannot find value for variable HOME
> 2016-11-18 07:23:30 Task xymonnetagain terminated, status 208
>
> …
>
> The above lines pretty much cycle endlessly.
>
> From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of Mills,David (HHSC Contractor)
> Sent: Thursday, November 17, 2016 5:17 PM
> To: 'xymon at xymon.com'
> Subject: [Xymon] xymond not accepting connections
>
> Hi, all!
>
> We have a rather murky situation. A colleague accidentally removed the
> entire Xymon (4.3.3 / Solaris) server home directory recently. It was
> restored from backups, but since then that server has not been fully
> functional. (I don’t know whether our symptoms are related to the
> home dir “zap” or something else.)
>
> We periodically run the ghostlist.cgi app from cron, and now these
> instances sometimes don’t exit. When I run truss on them, I see they
> are almost continuously calling brk(), allocating anonymous memory for
> that instance’s heap. It’s gotten so bad that this server’s resources
> were completely depleted, and we have had to turn off the cron jobs.
>
> The xymond daemon is no longer accepting connections, despite the fact
> that this server has been stable for years.
>
> The system was rebooted last night and seemed to be
> functioning throughout the night, but it stopped updating around 7:30 AM.
>
> Confirmed xymond is no longer accepting connections via:
>
> 17:03:49 pwsu020:/var/log/xymon> telnet 127.0.0.1 1984
> Trying 127.0.0.1...
> telnet: Unable to connect to remote host: Connection refused
>
> 17:02:53 pwsu020:/var/log/xymon> ps -u hobbit -f
> UID PID PPID C STIME TTY TIME CMD
> hobbit 4132 4131 1 17:02:36 ? 0:26 xymond --pidfile=/var/log/xymon/xymond.pid --restart=/export/xymon/server/tmp/x
> hobbit 4288 4279 0 17:03:01 ? 0:01 /usr/bin/perl -w /usr/local/devmon/devmon
> hobbit 12895 12867 0 10:54:01 pts/5 0:00 -bash
> hobbit 4278 1908 0 17:03:00 ? 0:00 sh -c /usr/local/devmon/bin/restart.devmon> /dev/null 2>&1
> hobbit 12491 5466 0 03:10:04 ? 0:00 /usr/local/apache2/bin/httpd -k start
> hobbit 12490 5466 0 03:10:04 ? 0:00 /usr/local/apache2/bin/httpd -k start
> hobbit 12487 5466 0 03:10:04 ? 0:00 /usr/local/apache2/bin/httpd -k start
> hobbit 15958 5466 0 11:20:04 ? 0:00 /usr/local/apache2/bin/httpd -k start
> hobbit 15612 5466 0 11:17:01 ? 0:00 /usr/local/apache2/bin/httpd -k start
> hobbit 17158 5466 0 03:45:03 ? 0:00 /usr/local/apache2/bin/httpd -k start
> hobbit 12488 5466 0 03:10:04 ? 0:00 /usr/local/apache2/bin/httpd -k start
> hobbit 12489 5466 0 03:10:04 ? 0:00 /usr/local/apache2/bin/httpd -k start
> hobbit 4290 4289 0 17:03:01 ? 0:01 /usr/bin/perl -w /usr/local/devmon/devmon
> hobbit 4257 4143 0 17:02:46 ? 0:00 /home/hobbit/xymon/client/bin/xymon 0.0.0.0 @
> hobbit 15776 5466 0 11:18:51 ? 0:00 /usr/local/apache2/bin/httpd -k start
> hobbit 4148 4141 1 17:02:41 ? 0:21 /export/xymon/server/bin/xymonnet --ping --checkresponse --timeout=10 --dns-tim
> hobbit 4279 4278 0 17:03:01 ? 0:00 /bin/ksh /usr/local/devmon/bin/restart.devmon
> hobbit 4135 4131 0 17:02:41 ? 0:00 xymond_channel --channel=client --log=/var/log/xymon/clientdata.log xymond_clie
> hobbit 4140 4131 1 17:02:41 ? 0:21 xymonnet --report --ping --checkresponse --timeout=10 --dns-timeout=2 --dnslog=
> hobbit 4141 4131 0 17:02:41 ? 0:00 /bin/sh /export/xymon/server/ext/xymonnet-again.sh
> hobbit 4137 4131 0 17:02:41 ? 0:00 xymond_channel --channel=data --log=/var/log/xymon/rrd-data.log xymond_rrd --rr
> hobbit 4144 4131 0 17:02:41 ? 0:00 xymond_channel --channel=data --log=/var/log/xymon/data.log xymond_filestore --
> hobbit 4131 1 0 17:02:36 ? 0:00 /export/xymon-4.3.3/server/bin/xymonlaunch --config=/export/xymon-4.3.3/server/
> hobbit 4136 4131 0 17:02:41 ? 0:00 xymond_channel --channel=status --log=/var/log/xymon/rrd-status.log xymond_rrd
> hobbit 4133 4131 0 17:02:41 ? 0:00 xymond_channel --channel=stachg --log=/var/log/xymon/history.log xymond_history
> hobbit 12912 12885 0 10:54:09 pts/7 0:00 -bash
> hobbit 4143 4131 0 17:02:41 ? 0:00 /bin/sh /export/xymon-4.3.3/client/bin/xymonclient.sh
> hobbit 4289 4288 0 17:03:01 ? 0:00 /usr/bin/perl -w /usr/local/devmon/devmon
> hobbit 4139 4131 1 17:02:41 ? 0:21 xymongen --recentgifs --subpagecolumns=2 --ignorecolumns=files --tooltips=never
> hobbit 4134 4131 0 17:02:41 ? 0:00 xymond_channel --channel=page --log=/var/log/xymon/alert.log xymond_alert --che
> hobbit 4138 4131 0 17:02:41 ? 0:00 xymond_channel --channel=clichg --log=/var/log/xymon/hostdata.log xymond_hostda
>
> The only other clue I’ve been able to find is this note in the
> xymonlaunch.log file:
>
> 15:29:24 pwsu020:/var/log/xymon> tail -50f xymonlaunch.log
> ...
> 2016-11-17 13:54:36 xymonlaunch starting
> 2016-11-17 13:54:36 Loading tasklist configuration from /export/xymon-4.3.3/server/etc/tasks.cfg
> 2016-11-17 13:54:36 Loading hostnames
> 2016-11-17 13:54:41 xgetenv: Cannot find value for variable HOME
> 2016-11-17 13:57:44 xgetenv: Cannot find value for variable HOME
> 2016-11-17 14:00:46 xgetenv: Cannot find value for variable HOME
>
> Yet, when I tried this, as well as grep’ing through
> xymonlaunch “truss” output for HOME, I see valid home directory values:
>
> 13:58:35 pwsu020:~> echo 'echo HOME=$HOME XYMSRV=$XYMSRV XYMSERVERS=$XYMSERVERS XYMONDPORT=$XYMONDPORT' | /home/hobbit/xymon/client/bin/xymoncmd
> 2016-11-17 13:58:38 Using default environment file /export/xymon-4.3.3/client/etc/xymonclient.cfg
> HOME=/home/hobbit XYMSRV=0.0.0.0 XYMSERVERS=10.235.57.11 10.235.157.56 XYMONDPORT=1984
>
> Help!
>
> david
>
> ~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~
>
> David Mills
> Systems Administrator
> */Northrop Grumman/*
> (512) 595-1238 (mobile)
>