[Xymon] FW: Troubleshooting Purple CONN and HTTP Tests in Xymon 4.3.10
Don Kuhlman
Don.Kuhlman at schawk.com
Fri Nov 9 17:56:29 CET 2012
Hi folks. Here's an update on this issue. Since putting the MAXTIME value into the tasks.cfg file on Tuesday, the purple storms have stopped.
I don't know where to check for logs to see if Xymon has been killing the xymonnet process or not, but at least our Purple storms have stopped.
Regards,
Don K
From: Larry Barber <lebarber at gmail.com>
Date: Tue, 6 Nov 2012 17:17:54 -0600
To: Don Kuhlman <don.kuhlman at schawk.com>
Cc: Xymon Email List <xymon at xymon.com>
Subject: Re: [Xymon] FW: Troubleshooting Purple CONN and HTTP Tests in Xymon 4.3.10
Did your purples clear up? It can take a couple of minutes sometimes, depending on how often you regen your web pages.
A quick hack to keep them from coming back would be to add a MAXTIME to the xymonnet stanza in tasks.cfg.
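A minimal sketch of such a stanza, assuming roughly the stock Xymon 4.3.x `[xymonnet]` task. The ENVFILE path follows the /usr/lib64/xymon layout used elsewhere in this thread (an assumption), and the 10m limit is an arbitrary example, not a recommended value:

```
[xymonnet]
	ENVFILE /usr/lib64/xymon/server/etc/xymonserver.cfg
	NEEDS xymond
	CMD xymonnet --report --ping --checkresponse
	LOGFILE $XYMONSERVERLOGS/xymonnet.log
	INTERVAL 5m
	# Example value only: xymonlaunch kills a run that exceeds this
	MAXTIME 10m
```

Keep MAXTIME comfortably longer than a normal xymonnet run; otherwise healthy runs get killed as well.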
I'm not really sure what else to tell you. If the process hangs again you might try to "kill -6" it and send the resulting core dump to Henrik.
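A minimal shell sketch of that step, assuming procps `pgrep` is available and that it is run as root or as the xymon user. `kill -ABRT` sends the same signal as `kill -6`. Whether a core file actually appears depends on the core-size limit xymonnet inherited from xymonlaunch and on the kernel's `core_pattern` setting, so treat that as something to check. The function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers; assumes procps pgrep.

# Print the PID of the oldest process named exactly "xymonnet", if any.
oldest_xymonnet_pid() {
    pgrep -o -x xymonnet
}

# Send SIGABRT (signal 6, same as "kill -6") to the given PID so the
# process aborts and, if core dumps are enabled for it, writes a core file.
abort_for_core() {
    if [ -z "$1" ]; then
        echo "no PID given (is xymonnet running?)" >&2
        return 1
    fi
    kill -ABRT "$1"
}

# Typical use:
#   abort_for_core "$(oldest_xymonnet_pid)"
```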
Thanks,
Larry Barber
On Tue, Nov 6, 2012 at 9:15 AM, Don Kuhlman <Don.Kuhlman at schawk.com> wrote:
Hi Larry/all. Sorry I didn't post the last reply to the list.
Update – Larry suggested looking for a hung xymonnet process – found one. Killed that.
Changed tasks.cfg to add --debug and am now getting log updates in xymonnet.log
Looked for another xymonnet process and don't see any.
The web pages still show CONN and HTTP as purple, and the XYMONNET status is purple too.
Thanks for your help Larry.
Any further suggestions as to what to look for in the log or elsewhere that may indicate the problem?
Don K
From: Larry Barber <lebarber at gmail.com>
Date: Tue, 6 Nov 2012 08:47:55 -0600
To: Don Kuhlman <don.kuhlman at schawk.com>
Cc: Xymon Email List <xymon at xymon.com>
Subject: Re: [Xymon] FW: Troubleshooting Purple CONN and HTTP Tests in Xymon 4.3.10
Did you check to see if a xymonnet process is/was still running? If a process gets hung for some reason, xymonlaunch won't start a new one. I had this happen to me once, but only once. There is also a --debug flag for xymonnet; it produces a _lot_ of output, but it might give you some idea of what is going on.
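A quick way to answer that first question, sketched for a Linux box with procps `pgrep` and `ps`; the function name is hypothetical. It prints the PID, elapsed time and command line of each xymonnet process, so a run that has been going far longer than the test interval stands out:

```shell
#!/bin/sh
# Hypothetical helper; assumes procps pgrep/ps (Linux).
# Prints PID, elapsed time ([[dd-]hh:]mm:ss) and command line of every
# running xymonnet, or a short message (and status 1) if there is none.
xymonnet_running() {
    pids=$(pgrep -x xymonnet) || {
        echo "no xymonnet process running"
        return 1
    }
    # pgrep prints one PID per line; the unquoted echo joins them with
    # spaces, and tr turns that into the comma list ps -p expects.
    ps -o pid=,etime=,args= -p "$(echo $pids | tr ' ' ',')"
}
```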
Thanks,
Larry Barber
On Tue, Nov 6, 2012 at 8:02 AM, Don Kuhlman <Don.Kuhlman at schawk.com> wrote:
Thanks Larry. Looks like everything went purple again at 6:45 this morning. The logs still show 0 bytes.
Any other suggestions for trying to figure this out?
Regards,
Don
From: Larry Barber <lebarber at gmail.com>
Date: Mon, 5 Nov 2012 17:19:53 -0600
To: Don Kuhlman <don.kuhlman at schawk.com>
Subject: Re: [Xymon] FW: Troubleshooting Purple CONN and HTTP Tests in Xymon 4.3.10
Xymonnet tends to be pretty quiet unless something goes wrong. You won't be able to tell for sure until you get one of your purple storms.
Alerts are handled by a different module. Look in tasks.cfg to find it.
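One way to find it is to list every task together with its CMD line. A sketch using awk; the function name is hypothetical, and the default path matches the install shown in this thread (an assumption, so pass another path if yours differs):

```shell
#!/bin/sh
# Hypothetical helper: print each task in tasks.cfg with its CMD line, and
# flag tasks marked DISABLED.
list_tasks() {
    awk '
        /^[ \t]*#/ { next }              # skip comment lines
        /^\[/      { task = $1; next }   # remember the current [stanza]
        $1 == "CMD" {
            sub(/^[ \t]*CMD[ \t]+/, "")  # keep only the command itself
            print task, $0
        }
        $1 == "DISABLED" { print task, "(DISABLED)" }
    ' "${1:-/usr/lib64/xymon/server/etc/tasks.cfg}"
}
```

In a stock 4.3.x tasks.cfg, the alerting task is typically the one whose CMD runs xymond_alert through xymond_channel.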
Thanks,
Larry Barber
On Mon, Nov 5, 2012 at 3:53 PM, Don Kuhlman <Don.Kuhlman at schawk.com> wrote:
Hi Larry/all. I've noticed that the xymonnet.log and xymonnet-again.log files are staying at 0 bytes. Does that seem to be indicating a problem?
(and Xymon hasn't gone purple all day, but I'm still not sending any email alerts to anyone).
-rw-rw-rw- 1 xymon xymon 0 Nov 5 15:05 /var/log/xymon/xymonnet-again.log
-rw-rw-rw- 1 xymon xymon 0 Nov 5 15:07 /var/log/xymon/xymonnet.log
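To see at a glance which server logs are still empty, a small sketch; the function name is hypothetical and /var/log/xymon is the default log directory mentioned in this thread. An empty log is not necessarily a fault, since xymonnet writes little unless something goes wrong:

```shell
#!/bin/sh
# Hypothetical helper: list zero-byte *.log files in the Xymon log
# directory (default /var/log/xymon; pass another directory as $1).
empty_logs() {
    find "${1:-/var/log/xymon}" -maxdepth 1 -type f -name '*.log' -size 0 -print
}
```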
Thanks
Don K
From: Larry Barber <lebarber at gmail.com>
Date: Mon, 5 Nov 2012 11:19:32 -0600
To: Don Kuhlman <don.kuhlman at schawk.com>
Cc: Xymon Email List <xymon at xymon.com>
Subject: Re: [Xymon] FW: Troubleshooting Purple CONN and HTTP Tests in Xymon 4.3.10
All the server side Xymon logs are in /var/log/xymon by default. Since you say that you are getting purple storms for conn and http tests, this suggests that the problem is likely with your xymonnet process. Check the xymonnet log, and when you see the purples check to see if there is a xymonnet instance running. If this instance has been running for more than a few minutes, kill it. If the xymonnet process is hanging, you might want to set the MAXTIME parameter on the xymonnet process in tasks.cfg. Doesn't really fix the problem, but it will at least stop things from going purple.
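To make "running for more than a few minutes" concrete, here is a sketch that converts the `etime` field from `ps` ([[dd-]hh:]mm:ss) to seconds and prints the PIDs of xymonnet processes older than a limit. It parses `etime` rather than using `etimes` because the latter may be missing from older procps releases. It assumes procps `ps -C` (Linux); the function names and the 600-second limit are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers; assumes procps ps (Linux) for "ps -C".

# Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    echo "$1" | awk '{
        days = 0; t = $1
        if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, p, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
        print days * 86400 + secs
    }'
}

# Print the PID of every xymonnet process that has run longer than $1 seconds.
stale_xymonnet_pids() {
    limit=$1
    ps -C xymonnet -o pid=,etime= | while read -r pid etime; do
        if [ "$(etime_to_seconds "$etime")" -gt "$limit" ]; then
            echo "$pid"
        fi
    done
}

# Example: kill any xymonnet run older than 10 minutes.
#   for p in $(stale_xymonnet_pids 600); do kill "$p"; done
```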
Thanks,
Larry Barber
On Mon, Nov 5, 2012 at 10:01 AM, Don Kuhlman <Don.Kuhlman at schawk.com> wrote:
Update to this. While googling further, I saw a thread titled "[hobbit] stale alerts". This mentioned that there could be an external script that I created which may cause issues for xymon when it runs. I do have a diskstat.sh script that may be causing problems. For now, I'm setting it to DISABLED in the tasks.cfg file.
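Marking a task DISABLED keeps its stanza in tasks.cfg for later. A sketch in which the `[diskstat]` stanza name, the script path and the interval are all hypothetical stand-ins for whatever the local stanza actually contains:

```
[diskstat]
	# DISABLED stops xymonlaunch from running this task
	DISABLED
	ENVFILE /usr/lib64/xymon/server/etc/xymonserver.cfg
	CMD /path/to/diskstat.sh
	INTERVAL 5m
```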
Is there a way to see log information in xymon to try and verify something like this?
Thanks
Don K
From: Don Kuhlman <don.kuhlman at schawk.com>
Date: Mon, 5 Nov 2012 08:34:29 -0600
To: Xymon Email List <xymon at xymon.com>
Subject: Troubleshooting Purple CONN and HTTP Tests in Xymon 4.3.10
Hi folks. We've been running xymon for about 10 months now. It's been fine all this time.
However last week around Wednesday we started getting purple storms on the CONN and HTTP tests for all our hosts.
I stop Xymon and restart it, or reboot the server (Linux 5.x) and then it comes back ok.
This also happened Thursday, and then again Saturday around 2PM cst.
Does anyone have a link or pointer to which logs, on the server or in Xymon, would show what is causing the CONN and HTTP tests to start failing at random like this, or where to start troubleshooting?
Can I use xymonlaunch --debug like this to see what is happening?
/usr/lib64/xymon/server/bin/xymonlaunch --debug --config=/usr/lib64/xymon/server/etc/tasks.cfg --env=/usr/lib64/etc/xymonserver.cfg
While searching the Xymon forum and message boards, I saw suggestions that it may be disk space or inodes, but we seem to be OK there:
df -i
Filesystem      Inodes  IUsed   IFree IUse% Mounted on
/dev/sda2      3899392 204731 3694661    6% /
tmpfs           490139      6  490133    1% /dev/shm
/dev/sda1        32768     51   32717    1% /boot

df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda2       61312028 5748784  52448700  10% /
tmpfs            1960556     188   1960368   1% /dev/shm
/dev/sda1         516040   87716    402112  18% /boot
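Checks like the two df runs above can be scripted so they are easy to rerun the next time the tests go purple. A sketch assuming GNU df (for `-P` combined with `-i`); the function name and the 90% threshold are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read POSIX-format df output (df -P or df -P -i) on
# stdin and print "<kind> <mount point> <use%>" for every filesystem whose
# usage is at or above the threshold. Assumes mount points contain no spaces.
flag_usage() {
    kind=$1
    threshold=$2
    awk -v kind="$kind" -v t="$threshold" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            # Some filesystems report "-" for inode usage; skip them.
            if (pct != "-" && pct + 0 >= t) print kind, $6, $5
        }'
}

# Typical use:
#   df -P    | flag_usage blocks 90
#   df -P -i | flag_usage inodes 90
```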
DNS also seems fine.
Thanks
Don K
_______________________________________________
Xymon mailing list
Xymon at xymon.com
http://lists.xymon.com/mailman/listinfo/xymon