[Xymon] xymongen hanging
Jeremy Laidman
jeremy at laidman.org
Tue Oct 18 00:42:19 CEST 2022
Yep, the fact that the username is apache tells me that these weren't
started by crontab or tasks.cfg, but by a user clicking on Reports >
Availability Report and Reports > Snapshot Report in the Xymon menu.
fd 3 is a file with event history. The snapshot and availability reports
look through all of the history files for any events that occurred during
the report timeframe. So this is normal, unless a process stays on the
same file for more than a moment. Did you run lsof on any other processes
to see what files were open on fd 3? If it's the same file for all of
them, this might suggest a filesystem problem.
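
One way to run that comparison across every xymongen process is sketched
below, assuming a Linux /proc filesystem. The names fd_status and
sample_xymongen are made up for illustration and are not part of Xymon.
The first prints the file behind a descriptor together with its current
read offset (the pos: field of /proc/PID/fdinfo/FD); the second repeats
that for all xymongen processes a few times so that changes between
samples are visible.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Xymon. Linux-only: they read /proc.

# Print "<pid> <file behind fd> pos=<offset>" for one process.
# The descriptor defaults to 3, the one seen in the strace output.
fd_status() {
    pid="$1"
    fd="${2:-3}"
    target=$(readlink "/proc/$pid/fd/$fd") || return 1
    pos=$(awk '$1 == "pos:" { print $2 }' "/proc/$pid/fdinfo/$fd")
    printf '%s %s pos=%s\n' "$pid" "$target" "$pos"
}

# Sample every process named exactly "xymongen" several times.
# Arguments: number of rounds (default 3), seconds between rounds (default 5).
sample_xymongen() {
    rounds="${1:-3}"
    delay="${2:-5}"
    i=0
    while [ "$i" -lt "$rounds" ]; do
        for p in $(pgrep -x xymongen); do
            fd_status "$p" 3
        done
        echo "--"
        i=$((i + 1))
        if [ "$i" -lt "$rounds" ]; then
            sleep "$delay"
        fi
    done
}
```

Run it as root (or as apache), for example "sample_xymongen 3 10". A
healthy report run should show the file name changing between rounds. A
process whose offset already equals the file size and never moves would
be consistent with read() returning 0 over and over.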
As these processes are owned by apache, it's worth taking a look at the
Apache logs around the time the processes were launched. You might be able
to get a more accurate start time from /proc/14749 than the output of "ps".
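
To line a process up against the Apache logs, something like the
following may help. It is only a sketch: proc_start_epoch and
apache_minute are illustrative names, GNU stat and GNU date are assumed,
and the timestamp on the /proc/PID directory is taken as a stand-in for
the start time, which is usually but not always right.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Xymon. Assumes Linux with GNU stat and GNU date.

# Epoch seconds of the last-modified time on /proc/<pid>. On Linux this
# usually matches the moment the process started, but it is not guaranteed.
proc_start_epoch() {
    stat -c %Y "/proc/$1"
}

# Format an epoch time as day/month/year:hour:minute in local time, the
# layout used inside the [...] timestamp of Apache's common log format.
apache_minute() {
    LC_ALL=C date -d "@$1" '+%d/%b/%Y:%H:%M'
}
```

A possible use, where the log path is a guess at a stock RHEL-style
layout and should be changed to wherever your Apache writes its access
log:

    grep "$(apache_minute "$(proc_start_epoch 14749)")" /var/log/httpd/access_log

"ps -o lstart= -p 14749" prints a full start timestamp that can be used
to cross-check the result.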
The missing dollar sign is peculiar. But I wonder if that's just what "ps"
does. Or bash. What does the output of "strings /proc/14749/cmdline" look
like?
The $XYMONGENSNAPOPTS comes from the script snapshot.sh. Mine definitely
has a dollar sign in there.
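
The raw argument vector can also be listed one argument per line, which
takes the display habits of ps and the shell out of the picture. A small
Linux-only sketch, with show_args being a made-up name:

```shell
#!/bin/sh
# Hypothetical helper, not part of Xymon. Linux-only: it reads /proc.

# Print each argument of a running process on its own line.
# /proc/<pid>/cmdline holds the arguments separated by NUL bytes, so
# translating NUL to newline shows exactly where one argument ends.
show_args() {
    tr '\0' '\n' < "/proc/$1/cmdline"
}
```

If "show_args 14749" prints XYMONGENSNAPOPTS as an argument of its own,
the process really was handed the variable's name rather than its value.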
J
On Tue, 18 Oct 2022 at 08:45, David Logan <David.Logan at nt.gov.au> wrote:
> Thanks Jeremy,
>
> Yes, I saw that, but I'm somewhat confused. In tasks.cfg, xymongen is set
> to run every minute (I think this is the distribution copy), and it
> probably does, as our graphs are up to date. On Sunday morning it starts
> about 17 processes to do snapshots and reports. The crontabs are empty and
> I cannot find where these are started from. The biggest problem is that
> they take massive amounts of CPU while sitting at the fread of fd 3. I
> cannot work out what is holding them up. The whole thing should be over in
> about an hour, but it can take up to 72 hours to process the whole show.
>
> I can also see what is possibly an error in the process, as I don't think
> there is a $ in front of a variable, and I'm wondering if this is the root
> cause.
>
> Thanks
>
> David
>
> David Logan
> Senior Systems Administrator
> Data Centre Services
> Department of Corporate and Digital Development | Northern Territory
> Government
> GPO Box 2391, Darwin, NT 0801, Australia
>
> p: +61 8 8999 6968
> m: +61 458 631 117
> e: david.logan at nt.gov.au
> w: www.nt.gov.au
>
> DCS Midrange Ticketing System
> New and existing tickets: http://dcscentral.nt.gov.au/ or
> dcs_service at nt.gov.au
> Escalations: (08) 8999 7654
>
> *From:* Jeremy Laidman <jeremy at laidman.org>
> *Sent:* Monday, 17 October 2022 3:42 PM
> *To:* David Logan <David.Logan at nt.gov.au>
> *Cc:* xymon at xymon.com
> *Subject:* Re: [Xymon] xymongen hanging
>
>
>
> Hi David
>
>
>
> The "snapshot.cgi" runs from the web interface, and creates a snapshot
> report. The script snapshot.sh runs snapshot.cgi, and this in turn runs
> xymongen with "--snapshot=..." as an argument.
>
>
>
> Similarly, the "report.cgi" runs from the web interface, and creates an
> availability report, using "--reportopts=..." as an argument.
>
>
>
> Also, take a look at the xymonreports.sh script. At the top (of my copy)
> of this script there are instructions on creating a crontab entry to run
> the script so as to generate daily, weekly and monthly reports. These would
> generate xymongen processes with "--reportopts=..." as an argument.
>
>
>
> See "man snapshot" and "man report" for more info.
>
>
>
> Cheers
>
> Jeremy
>
>
>
> On Mon, 17 Oct 2022 at 15:57, David Logan <David.Logan at nt.gov.au> wrote:
>
> Hi Folks,
>
> Just wondering if anybody has any experience with xymongen hanging. I have
> a large number of xymongen processes being kicked off sometime over the
> weekend. Unfortunately they are owned by apache and have a PPID of 1, so I
> can't tell how they were started. I'm presuming xymoncmd, but I can't see
> anything in the crontab for xymon or in tasks.cfg that would kick off the
> snapshot and reporting processes.
>
> These then sit for a very long time (> 24 hrs) while trying to read a data
> file from a specific server.
>
> apache 14749 1 44 Oct16 ? 10:28:39 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/14748-1665896723
> apache 14867 1 43 Oct16 ? 10:26:32 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/14866-1665896747
> apache 15107 1 43 Oct16 ? 10:26:05 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15106-1665896768
> apache 15118 1 43 Oct16 ? 10:25:58 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15117-1665896774
> apache 15125 1 43 Oct16 ? 10:25:12 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15124-1665896783
> apache 15238 1 43 Oct16 ? 10:23:26 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15237-1665896797
> apache 15269 1 43 Oct16 ? 10:25:31 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15268-1665896804
> apache 15349 1 43 Oct16 ? 10:22:20 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15348-1665896807
> apache 15382 1 43 Oct16 ? 10:23:40 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15381-1665896828
> apache 15398 1 43 Oct16 ? 10:25:13 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15397-1665896834
> apache 15400 1 43 Oct16 ? 10:22:59 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15399-1665896837
> apache 15757 1 43 Oct16 ? 10:24:48 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15756-1665896864
> apache 15842 1 43 Oct16 ? 10:22:32 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15841-1665896873
> apache 15964 1 43 Oct16 ? 10:24:21 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15963-1665896897
> apache 15996 1 43 Oct16 ? 10:22:25 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15995-1665896912
> apache 16133 1 43 Oct16 ? 10:22:07 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16132-1665896933
> apache 16149 1 43 Oct16 ? 10:23:37 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16148-1665896954
> apache 16215 1 43 Oct16 ? 10:23:45 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16214-1665896972
>
> An strace of the first PID is as follows (they are all the same). It is
> looping on reads of file descriptor 3:
>
> [root@dcslmonitor 15238]# strace -f -p 14749
> Process 14749 attached
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
>
> The lsof output for the same process shows what fd 3 is (last line):
>
> xymongen 14749 apache cwd DIR  253,0       6 134320195 /xymon/server/data/acks
> xymongen 14749 apache rtd DIR    8,2     269        64 /
> xymongen 14749 apache txt REG  253,0 1106256 135222190 /xymon/server/server/bin/xymongen
> xymongen 14749 apache mem REG    8,6  155784   4448319 /usr/lib64/libselinux.so.1
> xymongen 14749 apache mem REG    8,6  109976   4873245 /usr/lib64/libresolv-2.17.so
> xymongen 14749 apache mem REG    8,6   15688   4259351 /usr/lib64/libkeyutils.so.1.5
> xymongen 14749 apache mem REG    8,6   67104   4471490 /usr/lib64/libkrb5support.so.0.1
> xymongen 14749 apache mem REG    8,6  142144   4873243 /usr/lib64/libpthread-2.17.so
> xymongen 14749 apache mem REG    8,6   90632   4195838 /usr/lib64/libz.so.1.2.7
> xymongen 14749 apache mem REG    8,6   19248   4358022 /usr/lib64/libdl-2.17.so
> xymongen 14749 apache mem REG    8,6  210824   4471445 /usr/lib64/libk5crypto.so.3.1
> xymongen 14749 apache mem REG    8,6   15920   4939663 /usr/lib64/libcom_err.so.2.1
> xymongen 14749 apache mem REG    8,6  967840   4259800 /usr/lib64/libkrb5.so.3.3
> xymongen 14749 apache mem REG    8,6  320400   4256684 /usr/lib64/libgssapi_krb5.so.2.2
> xymongen 14749 apache mem REG    8,6 2156272   4262067 /usr/lib64/libc-2.17.so
> xymongen 14749 apache mem REG    8,6  402384   4259730 /usr/lib64/libpcre.so.1.2.0
> xymongen 14749 apache mem REG    8,6 2521008   4256674 /usr/lib64/libcrypto.so.1.0.2k
> xymongen 14749 apache mem REG    8,6  470360   4195836 /usr/lib64/libssl.so.1.0.2k
> xymongen 14749 apache mem REG    8,6  163312   4448246 /usr/lib64/ld-2.17.so
> xymongen 14749 apache 0r  FIFO   0,8     0t0 404824379 pipe
> xymongen 14749 apache 1w  FIFO   0,8     0t0 404824380 pipe
> xymongen 14749 apache 2w  FIFO   0,8     0t0 404824381 pipe
> xymongen 14749 apache 3r  REG  253,0     524  67195718 /xymon/server/data/hist/accessntg.sslcert
>
> Every process in the list above has the same file open as fd 3. Are they
> locking each other out or, more to the point, should they be?
>
> Any ideas on where to look or what to do next?
>
> Thanks
>
>
>
> *David Logan*
>