[Xymon] Spurious purple messages

Colin Coe colin.coe at gmail.com
Sat Sep 19 08:47:53 CEST 2015


Hi all

I ended up resolving this by stopping the Xymon service, removing all
files in $XYMONTMP and then starting xymon again.
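
For readers with the same symptom, the cleanup step described above could be
sketched roughly as follows. This is a minimal sketch, not an official Xymon
procedure. The helper name clear_tmp_files and its dry_run flag are
hypothetical, and it assumes "all files" means the regular files directly
under the directory. Run it only while the Xymon service is stopped, using
whatever stop/start commands your installation provides.

```python
import os
from pathlib import Path


def clear_tmp_files(tmp_dir, dry_run=True):
    """Remove files directly under tmp_dir and return their names.

    Hypothetical helper for illustration. With dry_run=True nothing is
    deleted; the names that would be removed are only collected.
    """
    removed = []
    for entry in sorted(Path(tmp_dir).iterdir()):
        # Subdirectories are left alone; only files (or links to files) go.
        if entry.is_file():
            if not dry_run:
                entry.unlink()
            removed.append(entry.name)
    return removed


# XYMONTMP is the variable named in the message above. Whether it is set in
# the shell running this script is an assumption, so fall back to doing nothing.
tmp_dir = os.environ.get("XYMONTMP")
if tmp_dir:
    for name in clear_tmp_files(tmp_dir, dry_run=True):
        print("would remove:", name)
```

The dry run is the default so the list of files can be reviewed first; pass
dry_run=False once the service is stopped and the list looks right.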

Thanks all for the suggestions

CC

On Thu, Sep 17, 2015 at 6:28 AM, Colin Coe <colin.coe at gmail.com> wrote:
> Glauber, I can confirm there are no cron jobs or similar that alter the time.
>
> Phil, I can confirm that it is a false positive.
>
> I figure there must be some stale data somewhere but I've not found
> it.   What process sends the notifications?  Where does this process
> get its data?
>
> Thanks all
>
> On Wed, Sep 16, 2015 at 10:01 PM, Ribeiro, Glauber
> <glauber.ribeiro at experian.com> wrote:
>> Sorry, I wasn't clear. I was wondering if there could be some process set up in cron to adjust the time, which could be causing this (bumping the server time once a day). Just hypothetical, unlikely.
>>
>> g
>>
>> -----Original Message-----
>> From: Colin Coe [mailto:colin.coe at gmail.com]
>> Sent: Wednesday, September 16, 2015 01:26
>> To: Ribeiro, Glauber
>> Cc: Vernon Everett; xymon at xymon.com
>> Subject: Re: [Xymon] Spurious purple messages
>>
>> Hi all
>>
>> The date/time is set correctly:
>> ---
>> timedatectl
>>       Local time: Wed 2015-09-16 14:23:45 AWST
>>   Universal time: Wed 2015-09-16 06:23:45 UTC
>>         RTC time: Wed 2015-09-16 06:23:42
>>         Timezone: Australia/Perth (AWST, +0800)
>>      NTP enabled: yes
>> NTP synchronized: yes
>>  RTC in local TZ: no
>>       DST active: n/a
>> ---
>>
>> fping responds with "host is alive", ping responds with "normal" ping
>> successful output.
>>
>>
>> Does anyone else have any ideas on this? I really don't want to have to
>> blow this server away and start again...
>>
>> Thanks
>>
>> On Tue, Sep 15, 2015 at 11:44 PM, Ribeiro, Glauber
>> <glauber.ribeiro at experian.com> wrote:
>>> Could it be something with the clock on the xymon server? Maybe some cron process to synchronize to a time server?
>>>
>>> -----Original Message-----
>>> From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of Colin Coe
>>> Sent: Monday, September 14, 2015 22:29
>>> To: Vernon Everett
>>> Cc: xymon at xymon.com
>>> Subject: Re: [Xymon] Spurious purple messages
>>>
>>> Hi Vernon,
>>>
>>> Yep, very interesting.  The purple messages come through every day at
>>> about the same time, give or take a minute or so.
>>>
>>> Yep, pings work and the normal "main view" and "all non-green view" work fine.
>>>
>>> The logs look fine.  I'd really like to get to the bottom of this...
>>>
>>> Thanks
>>>
>>> CC
>>>
>>> On Tue, Sep 15, 2015 at 10:06 AM, Vernon Everett
>>> <everett.vernon at gmail.com> wrote:
>>>> That's interesting.
>>>> No idea what it means, or where to go from here, but it's certainly
>>>> interesting.
>>>>
>>>> Does it happen the exact same time every day?
>>>> Have you tried a ping from the Xymon host to the client at or around the
>>>> time of the issue? See if there's any oddities?
>>>>
>>>> Is there anything in the logs?
>>>>
>>>>
>>>> On 14 September 2015 at 15:17, Colin Coe <colin.coe at gmail.com> wrote:
>>>>>
>>>>> OK, looking at this again.  The main view looks fine, but the 'conn'
>>>>> test on every host is a yellow circle with a question mark (unknown)
>>>>> in the snapshot report view since September 4, 2015 at 13:32:42.
>>>>>
>>>>> September 4, 2015 at 13:32:41 and earlier look fine.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Sat, Sep 12, 2015 at 5:48 PM, Vernon Everett
>>>>> <everett.vernon at gmail.com> wrote:
>>>>> > Good to know it's not just me that fights with SELinux. :-)
>>>>> >
>>>>> > Now that it works, what does the snapshot report reveal at the time the
>>>>> > purple alerts go out?
>>>>> >
>>>>> > Purples require a "no report" for 30 minutes to trigger.
>>>>> > You might want to check all your logs at around 30-35 minutes before the
>>>>> > emails.
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > On 11 September 2015 at 18:13, Colin Coe <colin.coe at gmail.com> wrote:
>>>>> >>
>>>>> >> Almost...
>>>>> >>
>>>>> >> Turned out to be SELinux, my old nemesis.  :)
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Tue, Sep 8, 2015 at 5:37 PM, Vernon Everett
>>>>> >> <everett.vernon at gmail.com> wrote:
>>>>> >> > That might be a permissions thing.
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > On 8 September 2015 at 19:15, Colin Coe <colin.coe at gmail.com> wrote:
>>>>> >> >>
>>>>> >> >> Hi Vernon
>>>>> >> >>
>>>>> >> >> Thanks for the really good info.  The message serial numbers are
>>>>> >> >> different every day but the messages are sent at the same time
>>>>> >> >> (13:45) daily for all tests on all hosts.
>>>>> >> >>
>>>>> >> >> The network is not congested nor is the SAN under any kind of
>>>>> >> >> pressure.
>>>>> >> >>
>>>>> >> >> Interestingly, trying to do the snapshot report gave me "Cannot
>>>>> >> >> create output directory".
>>>>> >> >>
>>>>> >> >> Thanks again
>>>>> >> >>
>>>>> >> >> CC
>>>>> >> >>
>>>>> >> >> On Tue, Sep 8, 2015 at 3:56 PM, Vernon Everett
>>>>> >> >> <everett.vernon at gmail.com> wrote:
>>>>> >> >> > Hi Colin
>>>>> >> >> >
>>>>> >> >> > What do the client hosts share in common?
>>>>> >> >> > In the past I have seen a client overloading their storage
>>>>> >> >> > system, overflowing buffers and exceeding the storage array's
>>>>> >> >> > ability to process IO requests. Of course this caused general
>>>>> >> >> > disk latency, which slowed things down to the point of a purple
>>>>> >> >> > flood.
>>>>> >> >> > There was no simple solution to that one, except to buy more
>>>>> >> >> > storage, which they did.
>>>>> >> >> >
>>>>> >> >> > Also, check the "serial numbers" on the messages. Is this a repeat
>>>>> >> >> > of an older message, in which case Xymon might have something
>>>>> >> >> > fishy going on, or are they new messages every day, as in it
>>>>> >> >> > really thinks there is a problem?
>>>>> >> >> >
>>>>> >> >> > Xymon only updates pages every 2 and 5 minutes, depending on the
>>>>> >> >> > page you are looking at, meaning you could wait up to 7 minutes
>>>>> >> >> > for the real status to appear.
>>>>> >> >> > A purple takes 30 minutes to trigger.
>>>>> >> >> > With some unfortunate and highly improbable timing on whatever is
>>>>> >> >> > triggering these events, it's possible you might not see the
>>>>> >> >> > purple.
>>>>> >> >> > Have you pulled up a "snapshot report" for the exact time of the
>>>>> >> >> > messages?
>>>>> >> >> >
>>>>> >> >> > Something else unlikely, but possible, is the network.
>>>>> >> >> > The conn test uses ping, which is ICMP.
>>>>> >> >> > The Xymon agent sends using TCP.
>>>>> >> >> > Is there anything interesting happening on the network at the
>>>>> >> >> > time?
>>>>> >> >> >
>>>>> >> >> > Regards
>>>>> >> >> > Vernon
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> > On 8 September 2015 at 11:39, Colin Coe <colin.coe at gmail.com>
>>>>> >> >> > wrote:
>>>>> >> >> >>
>>>>> >> >> >> Hi all
>>>>> >> >> >>
>>>>> >> >> >> Since Friday September 4, I've started receiving "stopped
>>>>> >> >> >> reporting (PURPLE)" messages for all tests on all hosts from one
>>>>> >> >> >> of our Xymon servers.
>>>>> >> >> >>
>>>>> >> >> >> The host status, as shown in the Main View, is green for all
>>>>> >> >> >> hosts and tests.  No purple at all.
>>>>> >> >> >>
>>>>> >> >> >> The "stopped reporting (PURPLE)" messages are being sent at the
>>>>> >> >> >> same time every day, 1:45PM.
>>>>> >> >> >>
>>>>> >> >> >> Any advice on how I should track this down?
>>>>> >> >> >>
>>>>> >> >> >> Thanks
>>>>> >> >> >> _______________________________________________
>>>>> >> >> >> Xymon mailing list
>>>>> >> >> >> Xymon at xymon.com
>>>>> >> >> >> http://lists.xymon.com/mailman/listinfo/xymon
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> > --
>>>>> >> >> > "Accept the challenges so that you can feel the exhilaration of
>>>>> >> >> > victory"
>>>>> >> >> > - General George Patton