[Xymon] Mismatched colors / confusing alert when test and xymongen both running every 60 seconds

Betsy Schwartz betsy.schwartz at gmail.com
Wed Oct 3 04:51:40 CEST 2012


We are running Xymon 4.3.7 on a somewhat slow Linux box (we hope to
migrate this week).
I've experimented with running xymongen more frequently; Xymon fell
over when I ran it every 20 seconds, so it is currently running once a
minute.

We have a custom test, GGAdmin, that is ALSO running once a minute
and seeing fairly frequent status changes.
Yesterday, while it was changing from red to green, the team was paged
with this alert message:

Sent:     Mon 10/1/2012 4:18 PM
Subject: Xymon [539804] db4.example.com:GGAdmin CRITICAL (RED)

message content included:
green Mon Oct  1 16:16:24 2012 <h3 style="color:cyan">Golden Gate
Status </h3>  &green All Golden Gate Monitors are Green
<snip>

So that's a red alert page, sent at 4:18, but the contents of the email
indicated green at 4:16.
I looked at the logs on the client, which is logging verbosely.
Everything on the client side was correct: it sent red status with red
content, and green status with green content.

I sent these messages to the xymon server:

/export/home/xymon/client/bin/bb 10.100.5.42 'status
db4.example.com.GGAdmin red Mon Oct  1 16:14:07 2012      <...snip...>
&red    GGPROC1   status is  STOPPED
/export/home/xymon/client/bin/bb 10.100.5.42 'status
db4.example.com.GGAdmin green Mon Oct  1 16:15:17 2012  <....snip...>
&green All Golden Gate Monitors are Green
/export/home/xymon/client/bin/bb 10.100.5.42 'status
db4.example.com.GGAdmin green Mon Oct  1 16:16:24 2012   <....snip...>
&green All Golden Gate Monitors are Green

My theory is that we sent status messages so frequently that between
the time xymon saw the red dot and grabbed the contents for the
message, the color had changed, so the email content came from a later
message than the subject. Does this sound possible?
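
To spot this kind of mismatch in saved alerts, something like the
following could be used. It is only a rough sketch: the function names
(subject_color, body_color, check_alert) are made up for illustration and
are not part of Xymon, and it assumes the subject ends with the color in
parentheses and the body starts with the color word, as in the alert
quoted above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Xymon: compare the color named in an
# alert subject with the color word that starts the status text in the body.

# Extract the color from a subject such as
# "Xymon [539804] db4.example.com:GGAdmin CRITICAL (RED)".
subject_color() {
    printf '%s\n' "$1" \
        | sed -n 's/.*(\([A-Za-z]*\))[[:space:]]*$/\1/p' \
        | tr 'A-Z' 'a-z'
}

# Extract the first word of the first body line, e.g. "green" in
# "green Mon Oct  1 16:16:24 2012 ...".
body_color() {
    printf '%s\n' "$1" | head -n 1 | awk '{ print tolower($1) }'
}

# Print "match" or "mismatch" for a subject/body pair.
check_alert() {
    if [ "$(subject_color "$1")" = "$(body_color "$2")" ]; then
        echo match
    else
        echo mismatch
    fi
}
```

Run against the page above, check_alert with the 4:18 subject and the
"green Mon Oct  1 16:16:24 2012" body would report a mismatch. It only
detects the symptom; it says nothing about where in the server the
subject and the content got out of step.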

We're going to be moving to a faster server, and I cut the test
frequency back to once every 5 minutes for now, but our new grandboss
is pushing us to get the alert time as low as possible (and making
unfavorable comparisons to Nagios...), so I'd like to get us running at
as fast a cycle speed as we can handle.

Thanks for any thoughts.


