[Xymon] xymongen hanging

David Logan David.Logan at nt.gov.au
Mon Oct 17 06:57:25 CEST 2022


Hi Folks,

Just wondering if anybody has any experience with xymongen hanging. A large number of xymongen processes were kicked off sometime over the weekend. Unfortunately they are owned by apache and have a PPID of 1, so I can't tell how they were started. I'm presuming xymoncmd is involved, but I can't see anything in the crontab for xymon or in tasks.cfg that would kick off the snapshot and reporting processes.
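
Since the parent has already exited, one place that might still show how a process was launched is its initial environment in /proc/<pid>/environ. The sketch below assumes a Linux /proc filesystem and assumes the processes were started from a web request (a guess based only on the apache ownership), in which case standard CGI variables such as SCRIPT_NAME or REMOTE_ADDR may be present. The function names are made up for illustration, and reading another user's environ needs root.

```python
def parse_environ(blob: bytes) -> dict:
    """Split a /proc/<pid>/environ blob (NUL-separated KEY=VALUE) into a dict."""
    env = {}
    for entry in blob.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


# Standard CGI variable names; whether any are set here is an assumption.
CGI_KEYS = ("SCRIPT_NAME", "REQUEST_URI", "QUERY_STRING",
            "REMOTE_ADDR", "HTTP_USER_AGENT")


def cgi_hints(pid: int) -> dict:
    """Return whichever CGI-style variables the process was started with."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        env = parse_environ(fh.read())
    return {key: env[key] for key in CGI_KEYS if key in env}
```

Calling cgi_hints(14749) as root would return an empty dict if the process was not started with any of those variables, which would itself argue against the web-request theory.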

These then sit for a very long time (> 24hrs) while trying to read a data file from a specific server.

apache   14749     1 44 Oct16 ?        10:28:39 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/14748-1665896723
apache   14867     1 43 Oct16 ?        10:26:32 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/14866-1665896747
apache   15107     1 43 Oct16 ?        10:26:05 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15106-1665896768
apache   15118     1 43 Oct16 ?        10:25:58 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15117-1665896774
apache   15125     1 43 Oct16 ?        10:25:12 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15124-1665896783
apache   15238     1 43 Oct16 ?        10:23:26 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15237-1665896797
apache   15269     1 43 Oct16 ?        10:25:31 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15268-1665896804
apache   15349     1 43 Oct16 ?        10:22:20 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15348-1665896807
apache   15382     1 43 Oct16 ?        10:23:40 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15381-1665896828
apache   15398     1 43 Oct16 ?        10:25:13 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15397-1665896834
apache   15400     1 43 Oct16 ?        10:22:59 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15399-1665896837
apache   15757     1 43 Oct16 ?        10:24:48 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15756-1665896864
apache   15842     1 43 Oct16 ?        10:22:32 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15841-1665896873
apache   15964     1 43 Oct16 ?        10:24:21 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15963-1665896897
apache   15996     1 43 Oct16 ?        10:22:25 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15995-1665896912
apache   16133     1 43 Oct16 ?        10:22:07 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16132-1665896933
apache   16149     1 43 Oct16 ?        10:23:37 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16148-1665896954
apache   16215     1 43 Oct16 ?        10:23:45 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16214-1665896972
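
The working directory names in the listing look like <pid>-<epoch seconds>, with the PID one less than the xymongen PID. That reading is an inference from the output, not something I have confirmed in the Xymon source. If it holds, the names give the start time of each run. A small sketch to decode them (parse_workdir is a made-up helper name):

```python
from datetime import datetime, timezone


def parse_workdir(path: str):
    """Split a '<pid>-<epoch>' directory name into (pid, UTC start time)."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    pid_text, epoch_text = name.split("-", 1)
    started = datetime.fromtimestamp(int(epoch_text), tz=timezone.utc)
    return int(pid_text), started


first = parse_workdir("/xymon/server/server/www/snap/14748-1665896723")
last = parse_workdir("/xymon/server/server/www/rep/16214-1665896972")

print(first[1].isoformat())                  # 2022-10-16T05:05:23+00:00
print((last[1] - first[1]).total_seconds())  # 249.0
```

On that reading, all of the processes listed were started within about four minutes of each other on 16 October, so it looks like a single burst of requests rather than something scheduled over the whole weekend.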

An strace of the first PID follows (they all look the same); note the repeated reads on file descriptor 3.

[root@dcslmonitor 15238]# strace -f -p 14749
Process 14749 attached
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
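
For a regular file, read() returning 0 means end-of-file, and it returns straight away instead of blocking. So the trace looks less like a process waiting on the file and more like a loop that keeps reading after EOF, which would also fit the roughly 43% CPU shown in the process list. Whether xymongen actually has such a loop is a guess on my part. The sketch below only demonstrates the read-at-EOF behaviour, using a temporary file and a made-up function name:

```python
import os
import tempfile


def reads_at_eof(path: str, attempts: int = 3, bufsize: int = 4096) -> list:
    """Read a file to the end, then return the sizes of further reads."""
    fd = os.open(path, os.O_RDONLY)
    try:
        while os.read(fd, bufsize):
            pass  # consume everything up to EOF
        # Every read past EOF returns an empty result immediately.
        return [len(os.read(fd, bufsize)) for _ in range(attempts)]
    finally:
        os.close(fd)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 524)  # same size as the file in the lsof output
print(reads_at_eof(tmp.name))  # [0, 0, 0]
os.unlink(tmp.name)
```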

fd 3 is the last entry in the lsof output for that process:

xymongen  14749               apache  cwd       DIR              253,0         6  134320195 /xymon/server/data/acks
xymongen  14749               apache  rtd       DIR                8,2       269         64 /
xymongen  14749               apache  txt       REG              253,0   1106256  135222190 /xymon/server/server/bin/xymongen
xymongen  14749               apache  mem       REG                8,6    155784    4448319 /usr/lib64/libselinux.so.1
xymongen  14749               apache  mem       REG                8,6    109976    4873245 /usr/lib64/libresolv-2.17.so
xymongen  14749               apache  mem       REG                8,6     15688    4259351 /usr/lib64/libkeyutils.so.1.5
xymongen  14749               apache  mem       REG                8,6     67104    4471490 /usr/lib64/libkrb5support.so.0.1
xymongen  14749               apache  mem       REG                8,6    142144    4873243 /usr/lib64/libpthread-2.17.so
xymongen  14749               apache  mem       REG                8,6     90632    4195838 /usr/lib64/libz.so.1.2.7
xymongen  14749               apache  mem       REG                8,6     19248    4358022 /usr/lib64/libdl-2.17.so
xymongen  14749               apache  mem       REG                8,6    210824    4471445 /usr/lib64/libk5crypto.so.3.1
xymongen  14749               apache  mem       REG                8,6     15920    4939663 /usr/lib64/libcom_err.so.2.1
xymongen  14749               apache  mem       REG                8,6    967840    4259800 /usr/lib64/libkrb5.so.3.3
xymongen  14749               apache  mem       REG                8,6    320400    4256684 /usr/lib64/libgssapi_krb5.so.2.2
xymongen  14749               apache  mem       REG                8,6   2156272    4262067 /usr/lib64/libc-2.17.so
xymongen  14749               apache  mem       REG                8,6    402384    4259730 /usr/lib64/libpcre.so.1.2.0
xymongen  14749               apache  mem       REG                8,6   2521008    4256674 /usr/lib64/libcrypto.so.1.0.2k
xymongen  14749               apache  mem       REG                8,6    470360    4195836 /usr/lib64/libssl.so.1.0.2k
xymongen  14749               apache  mem       REG                8,6    163312    4448246 /usr/lib64/ld-2.17.so
xymongen  14749               apache    0r     FIFO                0,8       0t0  404824379 pipe
xymongen  14749               apache    1w     FIFO                0,8       0t0  404824380 pipe
xymongen  14749               apache    2w     FIFO                0,8       0t0  404824381 pipe
xymongen  14749               apache    3r      REG              253,0       524   67195718 /xymon/server/data/hist/accessntg.sslcert
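
To confirm that the process really is sitting at end-of-file, the current offset of fd 3 can be compared with the 524-byte size reported above. On Linux the offset is the pos field in /proc/<pid>/fdinfo/<fd>. A minimal sketch with made-up helper names, assuming that /proc layout:

```python
def parse_fdinfo(text: str) -> dict:
    """Parse the 'key:<tab>value' lines of /proc/<pid>/fdinfo/<fd>."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fd_offset(pid: int, fd: int) -> int:
    """Return the current file offset of an open descriptor."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as fh:
        return int(parse_fdinfo(fh.read())["pos"])
```

If fd_offset(14749, 3) returns 524, the whole file has been consumed and the repeated zero-length reads are simply reads at EOF.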

Every process in the list above has the same file open as fd 3. Are they locking each other out or, more to the point, should they be?
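
On the locking question: as far as plain reads go, Linux does not lock a regular file just because several processes have it open read-only. Each open() gets its own offset, and file locks are advisory and have to be requested explicitly (flock or fcntl). I don't know whether xymongen takes such a lock on history files, so this sketch only illustrates the general behaviour, with a temporary file standing in for the real one:

```python
import os
import tempfile


def independent_reads(path: str, bufsize: int = 4096) -> bool:
    """Open the same file twice and check both readers see the full data."""
    a = os.open(path, os.O_RDONLY)
    b = os.open(path, os.O_RDONLY)
    try:
        first = os.read(a, bufsize)   # advances only a's offset
        second = os.read(b, bufsize)  # b still starts at offset 0
        return len(first) > 0 and first == second
    finally:
        os.close(a)
        os.close(b)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example history line\n")
print(independent_reads(tmp.name))  # True
os.unlink(tmp.name)
```

Any explicit locks held on the system would be listed in /proc/locks, which can be matched against the inode number from the lsof output (67195718).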

Any ideas on where to look or what to do next?

Thanks

David Logan
Senior Systems Administrator
Data Centre Services
Department of Corporate and Digital Development | Northern Territory Government
GPO Box 2391, Darwin, NT 0801, Australia

