[Xymon] duration of MSG red status

Nicole Beck nskyrca at syr.edu
Mon Nov 3 21:16:48 CET 2014


Hi Jeremy,

I got 7, one every 5 minutes.

ALERTREPEAT is set to 30 in hobbitserver.cfg.

Our hobbit-alerts.cfg file has “DURATION>1m REPEAT=5m” for the msgs test for that machine.

As far as I could tell, the messages status is yellow and it is staying yellow, not flapping.  When I click on history in the GUI, it shows that it was yellow for 35 minutes.

It looks like it’s the same message that we keep getting an alert for.  We had an incident on Friday, where we got 7 email alerts.  Below are examples of the portion of the email that showed the yellow alert.  The timestamp in the log is 21:00:16 for all of the alerts, so it’s the same message.

Email alert 1:

yellow System logs at Fri Oct 31 21:01:10 EDT 2014 <pre> </pre>



&yellow Warnings in <a href="/xymon-cgi/bb-hostsvc.sh?CLIENT=bbgroupa-web4.syr.edu&SECTION=msgs:/usr/local/blackboard/logs/tomcat/activemq.txt">/usr/local/blackboard/logs/tomcat/activemq.txt</a>

<pre>

&yellow WARN 2014-10-31 21:00:16,480 ActiveMQ NIO Worker 30057 org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://128.230.126.194:49464 failed: java.io.EOFException </pre>

Email alert 2

yellow System logs at Fri Oct 31 21:06:10 EDT 2014 <pre> </pre>



&yellow Warnings in <a href="/xymon-cgi/bb-hostsvc.sh?CLIENT=bbgroupa-web4.syr.edu&SECTION=msgs:/usr/local/blackboard/logs/tomcat/activemq.txt">/usr/local/blackboard/logs/tomcat/activemq.txt</a>

<pre>

&yellow WARN 2014-10-31 21:00:16,480 ActiveMQ NIO Worker 30057 org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://128.230.126.194:49464 failed: java.io.EOFException </pre>


Email alert 7

yellow System logs at Fri Oct 31 21:31:11 EDT 2014 <pre> </pre>



&yellow Warnings in <a href="/xymon-cgi/bb-hostsvc.sh?CLIENT=bbgroupa-web4.syr.edu&SECTION=msgs:/usr/local/blackboard/logs/tomcat/activemq.txt">/usr/local/blackboard/logs/tomcat/activemq.txt</a>

<pre>

&yellow WARN 2014-10-31 21:00:16,480 ActiveMQ NIO Worker 30057 org.apache.activemq.broker.TransportConnection.Transport - Transport Connection to: tcp://128.230.126.194:49464 failed: java.io.EOFException </pre>


The Hobbit acknowledge code that appears in the subject of the emails is the same in every one of them.  Maybe we are getting multiple email messages because we did not acknowledge the alert. But if the string does not appear again in the file in the next cycle, shouldn’t it turn back to green?
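
One possible reading of the numbers above, sketched in Python: if an alert is re-sent every REPEAT interval for as long as the status stays non-green, then a 5-minute repeat over the 35 minutes of yellow shown in the history would give seven emails. The function name, and the choice to measure the 35 minutes from the first alert, are assumptions made for illustration; this is not Xymon's own code.

```python
from datetime import datetime, timedelta


def repeat_alert_times(first_alert, repeat, non_green_for):
    """List the times alerts would go out, assuming one alert every
    `repeat` for as long as the status stays non-green.

    Assumption: `non_green_for` is measured from the first alert.
    """
    end = first_alert + non_green_for
    times = []
    t = first_alert
    while t < end:
        times.append(t)
        t += repeat
    return times


# Values taken from the incident: first email at 21:01:10,
# REPEAT=5m in hobbit-alerts.cfg, 35 minutes of yellow in the history.
first = datetime(2014, 10, 31, 21, 1, 10)
alerts = repeat_alert_times(first, timedelta(minutes=5), timedelta(minutes=35))

for t in alerts:
    print(t.strftime("%H:%M:%S"))
print(len(alerts), "alerts")
```

Under these assumptions the sketch lists seven times, 21:01:10 through 21:31:10, which lines up with the seven emails to within a second.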

When it happens again, I will try to look at the “client data available” link.

I hope this helps.

Nicole


From: Jeremy Laidman [mailto:jlaidman at rebel-it.com.au]
Sent: Tuesday, October 28, 2014 9:37 PM
To: Nicole Beck
Cc: Bill Arlofski; xymon at xymon.com
Subject: Re: [Xymon] duration of MSG red status

Nicole

On 29 October 2014 05:16, Nicole Beck <nskyrca at syr.edu> wrote:
What I’m seeing is that I get an alert for my trigger string (which has a timestamp on it), and then I keep getting alerts for the same trigger string (with the same timestamp) for the next 30 minutes.

How often do you get the repeated alerts?  Or how many in that 30 minutes?

I’m not sure if anything else was appended to the log file in that 30 minutes. I stop getting the alerts after 30 minutes and don’t have to wait until the log is rotated for the alert to clear.

Do you have ALERTREPEAT defined in xymonserver.cfg?  The default is 30 minutes, but you may have it set lower than that.

Similarly, do you have "REPEAT" defined in alerts.cfg for the rule matching these alerts?  (The "REPEAT" value in alerts.cfg defaults to the setting of ALERTREPEAT.)

Is your message status (red?) staying non-green for the 30 minutes, or non-green for only a short time, or flapping like red/green/red/green?

Messages get to Xymon via the client data.  So during an "event" you can click on the "Client data available" link at the bottom of your "msgs" page for the host, and it should show you all of the client data, and you can search for the logfile name to see what log lines the client sent to the server.  Or you can click on the logfile name on the "msgs" page for a modified client data report showing just the log lines for that logfile.

What I'm trying to understand is whether you are getting the same messages sent multiple times from the client causing multiple events, or whether the one event is generating multiple alerts.
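
A rough way to check this from the alert emails alone, sketched in Python: pull the log timestamp out of the line quoted in each alert and see whether they all agree. If every alert carries the same timestamp, the alerts are repeats of one log line and not new matches. The helper name and the regular expression are illustrative assumptions, and the sample line is the one quoted in the alerts in this thread.

```python
import re

# Matches timestamps of the form "2014-10-31 21:00:16,480".
LOG_TIMESTAMP = re.compile(r"\b(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\b")


def distinct_log_timestamps(lines):
    """Return the set of log timestamps found across the given lines."""
    found = set()
    for line in lines:
        match = LOG_TIMESTAMP.search(line)
        if match:
            found.add(match.group(1))
    return found


line = (
    "WARN 2014-10-31 21:00:16,480 ActiveMQ NIO Worker 30057 "
    "org.apache.activemq.broker.TransportConnection.Transport - "
    "Transport Connection to: tcp://128.230.126.194:49464 failed: "
    "java.io.EOFException"
)

# The same line appeared in alert emails 1, 2 and 7.
alert_log_lines = [line, line, line]

stamps = distinct_log_timestamps(alert_log_lines)
if len(stamps) == 1:
    print("All alerts quote the same log line:", stamps.pop())
else:
    print("Alerts quote different log lines:", sorted(stamps))
```

This only compares what the emails contain; the client data report remains the place to see what the client actually sent on each cycle.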

From what I can tell, a red "msgs" status will stay red for only one 5-minute client cycle.  The next time the client sends its client data report, if the logfile in question has no new matching lines, it will actively generate a green status.

J
