[Xymon] FW: Troubleshooting Purple CONN and HTTP Tests in Xymon 4.3.10

Don Kuhlman Don.Kuhlman at schawk.com
Thu Nov 8 15:04:03 CET 2012


Hi Larry/all. An update on the hanging xymonnet process: I added a MAXTIME value to the xymonnet stanza in tasks.cfg, and there have been no purple storms since yesterday. Does Xymon write to a log file when it kills the process after MAXTIME is exceeded?

As a side note, the xymonnet timing output shows ping times of about 4.5, which doesn't seem too large to me. Here's a snippet from the xymonnet status page:

PING test completed (67 hosts)           1352382608.209002          4.540616
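If you want to keep watching that figure instead of eyeballing it, the shell function below is a minimal sketch that pulls the last field from the PING timing line and compares it with a threshold. It assumes, without confirmation from this thread, that the last column is the elapsed time in seconds; `ping_phase_slow` is a hypothetical name, not a Xymon tool.

```shell
# Hypothetical helper: check the duration reported on xymonnet's PING timing line.
# ASSUMPTION: the last field is the elapsed time in seconds, e.g.
#   PING test completed (67 hosts)   1352382608.209002   4.540616
ping_phase_slow() {
    # $1 = threshold in seconds; the timing line is read from stdin.
    # Prints "ok <secs>" or "slow <secs>" and exits non-zero when slow.
    awk -v max="$1" '
        /PING test completed/ {
            if ($NF + 0 > max + 0) { print "slow " $NF; exit 1 }
            print "ok " $NF
        }'
}
```

Piping the line from the status page above into `ping_phase_slow 30` prints `ok 4.540616`; the non-zero exit on "slow" makes it usable from cron or another check script.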

Thanks again for the help Larry!

Don


From: Larry Barber <lebarber at gmail.com>
Date: Tue, 6 Nov 2012 17:17:54 -0600
To: Don Kuhlman <don.kuhlman at schawk.com>
Cc: Xymon Email List <xymon at xymon.com>
Subject: Re: [Xymon] FW: Troubleshooting Purple CONN and HTTP Tests in Xymon 4.3.10

Did your purples clear up? It can take a couple of minutes sometimes, depending on how often you regenerate your web pages.

A quick hack to keep them from coming back would be to add a MAXTIME to the xymonnet stanza in tasks.cfg.
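After editing tasks.cfg, it is worth confirming that the xymonnet stanza really carries a MAXTIME line. The sketch below assumes the usual tasks.cfg layout (a `[name]` header followed by indented `KEY value` lines); `show_maxtime` is a hypothetical helper, and the `MAXTIME 10m` value in the test data is illustrative only, so check the tasks.cfg man page for the time formats your version accepts.

```shell
# Hypothetical helper: print the MAXTIME setting of one [stanza] in tasks.cfg,
# or report that none is set (non-zero exit in that case).
show_maxtime() {
    # $1 = path to tasks.cfg, $2 = stanza name (e.g. xymonnet)
    awk -v want="[$2]" '
        /^\[/                    { insec = ($1 == want); next }
        insec && $1 == "MAXTIME" { print $1, $2; found = 1 }
        END { if (!found) { print "no MAXTIME in " want; exit 1 } }
    ' "$1"
}
```

Usage, with the tasks.cfg path quoted later in this thread: `show_maxtime /usr/lib64/xymon/server/etc/tasks.cfg xymonnet`.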

I'm not really sure what else to tell you. If the process hangs again, you might try "kill -6" on it and send the resulting core dump to Henrik.
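For reference, `kill -6` sends SIGABRT, whose default action is to terminate the process and write a core dump. Whether a core file actually appears depends on the core-size limit (`ulimit -c`) in the environment the process was started from and on the system's core-file settings, so check those first. `abort_for_core` is a hypothetical wrapper that only sends the signal after confirming the PID exists.

```shell
# Hypothetical wrapper around "kill -6" (SIGABRT) for a hung process.
# A core file is only written if core dumps are allowed for that process.
abort_for_core() {
    target="$1"
    # kill -0 sends no signal; it only tests that the PID exists and is signalable.
    if ! kill -0 "$target" 2>/dev/null; then
        echo "no such process: $target" >&2
        return 1
    fi
    kill -6 "$target"
}
```

Usage: `abort_for_core 12345`, where 12345 stands for the PID of the hung xymonnet process.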

Thanks,
Larry Barber

On Tue, Nov 6, 2012 at 9:15 AM, Don Kuhlman <Don.Kuhlman at schawk.com> wrote:
Hi Larry/all. Sorry I didn't post the last reply to the list.

Update: Larry suggested looking for a hung xymonnet process. I found one and killed it.
I changed tasks.cfg to add --debug and am now getting log updates in xymonnet.log.
I looked for another xymonnet process and don't see one.
The web pages are still showing purple for CONN and HTTP, and the XYMONNET status is also purple.

Thanks for your help Larry.

Any further suggestions as to what to look for in the log or elsewhere that may indicate the problem?

Don K

From: Larry Barber <lebarber at gmail.com>
Date: Tue, 6 Nov 2012 08:47:55 -0600
To: Don Kuhlman <don.kuhlman at schawk.com>
Cc: Xymon Email List <xymon at xymon.com>
Subject: Re: [Xymon] FW: Troubleshooting Purple CONN and HTTP Tests in Xymon 4.3.10

Did you check whether a xymonnet process is/was still running? If a process hangs for some reason, xymonlaunch won't start a new one. I had this happen to me once, but only once. There is also a --debug flag for xymonnet; it produces a _lot_ of output, but it might give you some idea of what is going on.
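One way to answer "is a xymonnet process still running, and for how long?" is to look at each process's elapsed time. The filter below is a minimal sketch: it reads `pid elapsed-seconds command` lines, such as those produced by procps `ps -eo pid=,etimes=,comm=`, and prints the PIDs of matching processes older than a limit. `stale_pids` is a hypothetical name, and older procps releases may not support the `etimes` field (`etime`, formatted as `[[dd-]hh:]mm:ss`, is the older alternative and would need converting first).

```shell
# Hypothetical filter: print PIDs of processes named $1 that have been
# running longer than $2 seconds. Input lines: "pid elapsed-seconds command".
stale_pids() {
    awk -v name="$1" -v max="$2" '$3 == name && $2 + 0 > max + 0 { print $1 }'
}
```

Usage: `ps -eo pid=,etimes=,comm= | stale_pids xymonnet 600`. Keeping the filter separate from `ps` makes it easy to test and to adapt to other `ps` variants.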

Thanks,
Larry Barber

On Tue, Nov 6, 2012 at 8:02 AM, Don Kuhlman <Don.Kuhlman at schawk.com> wrote:
Thanks Larry. Looks like everything went purple again at 6:45 this morning.  The logs still show 0 bytes.
Any other suggestions for trying to figure this out?

Regards,

Don

From: Larry Barber <lebarber at gmail.com>
Date: Mon, 5 Nov 2012 17:19:53 -0600
To: Don Kuhlman <don.kuhlman at schawk.com>
Subject: Re: [Xymon] FW: Troubleshooting Purple CONN and HTTP Tests in Xymon 4.3.10

Xymonnet tends to be pretty quiet unless something goes wrong. You won't be able to tell for sure until you get one of your purple storms.

Alerts are handled by a different module. Look in tasks.cfg to find it.
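To see which program each task runs, it helps to list every tasks.cfg stanza with its CMD line. The sketch below assumes the usual `[name]` / indented `CMD ...` layout and is a hypothetical helper, not a Xymon tool. In a stock installation the alert stanza's command typically runs `xymond_alert` through `xymond_channel`, but check your own file.

```shell
# Hypothetical helper: print "[stanza]: command" for every CMD line in tasks.cfg.
task_cmds() {
    # $1 = path to tasks.cfg
    awk '
        /^\[/ { sec = $1; next }
        sec != "" && $1 == "CMD" {
            line = $0
            sub(/^[ \t]*CMD[ \t]+/, "", line)   # strip leading "CMD" keyword
            print sec ": " line
        }
    ' "$1"
}
```

Usage: `task_cmds /usr/lib64/xymon/server/etc/tasks.cfg | grep -i alert`.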

Thanks,
Larry Barber

On Mon, Nov 5, 2012 at 3:53 PM, Don Kuhlman <Don.Kuhlman at schawk.com> wrote:
Hi Larry/all. I've noticed that the xymonnet.log and xymonnet-again.log files are staying at 0 bytes. Does that indicate a problem?
(and Xymon hasn't gone purple all day, but I'm still not sending any email alerts to anyone).

-rw-rw-rw- 1 xymon xymon        0 Nov  5 15:05 /var/log/xymon/xymonnet-again.log
-rw-rw-rw- 1 xymon xymon        0 Nov  5 15:07 /var/log/xymon/xymonnet.log
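To list every empty log at once instead of checking files one by one, a `find` over the log directory is enough. The sketch assumes the default /var/log/xymon location mentioned in this thread and a `*.log` naming pattern; `empty_logs` is a hypothetical name. Keep in mind that xymonnet logs little unless something goes wrong, so an empty xymonnet.log is not by itself an error.

```shell
# Hypothetical helper: list zero-byte *.log files in a Xymon log directory.
empty_logs() {
    # $1 = log directory (defaults to /var/log/xymon)
    find "${1:-/var/log/xymon}" -type f -name '*.log' -size 0 -print
}
```

Usage: `empty_logs` for the default directory, or `empty_logs /some/other/logdir`.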

Thanks

Don K



From: Larry Barber <lebarber at gmail.com>
Date: Mon, 5 Nov 2012 11:19:32 -0600
To: Don Kuhlman <don.kuhlman at schawk.com>
Cc: Xymon Email List <xymon at xymon.com>
Subject: Re: [Xymon] FW: Troubleshooting Purple CONN and HTTP Tests in Xymon 4.3.10

All the server-side Xymon logs are in /var/log/xymon by default. Since you say that you are getting purple storms for conn and http tests, the problem is likely with your xymonnet process. Check the xymonnet log, and when you see the purples, check whether a xymonnet instance is running. If this instance has been running for more than a few minutes, kill it. If the xymonnet process is hanging, you might want to set the MAXTIME parameter on the xymonnet process in tasks.cfg. It doesn't really fix the problem, but it will at least stop things from going purple.

Thanks,
Larry Barber

On Mon, Nov 5, 2012 at 10:01 AM, Don Kuhlman <Don.Kuhlman at schawk.com> wrote:
An update to this: while googling further, I found a thread titled "[hobbit] stale alerts". It mentioned that an external script I created could cause problems for Xymon when it runs. I do have a diskstat.sh script that may be causing problems, so for now I'm setting it to DISABLED in tasks.cfg.
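After marking a task DISABLED, a quick way to confirm which stanzas are switched off is to list every stanza that contains a DISABLED line. This is a minimal sketch assuming the usual tasks.cfg layout; `disabled_tasks`, the `[diskstat]` stanza name, and the script path in the test data are all hypothetical.

```shell
# Hypothetical helper: print the [stanza] header of every task marked DISABLED.
disabled_tasks() {
    # $1 = path to tasks.cfg
    awk '
        /^\[/                         { sec = $1; next }
        sec != "" && $1 == "DISABLED" { print sec }
    ' "$1"
}
```

Usage: `disabled_tasks /usr/lib64/xymon/server/etc/tasks.cfg`.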

Is there a way to see log information in xymon to try and verify something like this?

Thanks

Don K

From: Don Kuhlman <don.kuhlman at schawk.com>
Date: Mon, 5 Nov 2012 08:34:29 -0600
To: Xymon Email List <xymon at xymon.com>
Subject: Troubleshooting Purple CONN and HTTP Tests in Xymon 4.3.10

Hi folks.  We've been running xymon for about 10 months now. It's been fine all this time.

However, last week, around Wednesday, we started getting purple storms on the CONN and HTTP tests for all our hosts.
When I stop and restart Xymon, or reboot the server (Linux 5.x), it comes back OK.
This also happened Thursday, and then again Saturday around 2 PM CST.

Does anyone have a link or source describing which logs to look at, on the server or in Xymon, to see what may be causing the CONN and HTTP tests to start failing randomly like this, or where to start troubleshooting?

Can I use xymonlaunch --debug like this to see what is happening?
        /usr/lib64/xymon/server/bin/xymonlaunch --debug --config=/usr/lib64/xymon/server/etc/tasks.cfg --env=/usr/lib64/etc/xymonserver.cfg



While searching the Xymon forums and message boards, I saw some posts saying it may be disk space or inodes, but we seem to be OK there:
df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda2            3899392  204731 3694661    6% /
tmpfs                 490139       6  490133    1% /dev/shm
/dev/sda1              32768      51   32717    1% /boot

df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2             61312028   5748784  52448700  10% /
tmpfs                  1960556       188   1960368   1% /dev/shm
/dev/sda1               516040     87716    402112  18% /boot
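Rather than reading the df and df -i tables by eye, a small filter can flag any filesystem above a chosen percentage. The sketch assumes single-line, POSIX-style output (as from `df -P` and `df -Pi`), where field 5 is the Use% or IUse% column and field 6 the mount point; `over_threshold` is a hypothetical name. Fed the `df` output above with a 15% threshold, it reports only `/boot 18%`.

```shell
# Hypothetical filter: print "mountpoint pct%" for filesystems above $1 percent.
# Works on "df -P" (Use%) and "df -Pi" (IUse%) output; the header line is skipped.
over_threshold() {
    awk -v max="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 > max + 0) print $6, $5
    }'
}
```

Usage: `df -P | over_threshold 90` for space and `df -Pi | over_threshold 90` for inodes.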

DNS also seems fine.

Thanks

Don K

_______________________________________________
Xymon mailing list
Xymon at xymon.com
http://lists.xymon.com/mailman/listinfo/xymon



