[Xymon] Spurious purple messages
Colin Coe
colin.coe at gmail.com
Tue Sep 15 05:29:20 CEST 2015
Hi Vernon,
Yep, very interesting. The purple messages come through every day at
about the same time, give or take a minute or so.
Yep, pings work, and the normal "main view" and "all non-green view" work fine.
The logs look fine. I'd really like to get to the bottom of this...
Thanks
CC
On Tue, Sep 15, 2015 at 10:06 AM, Vernon Everett
<everett.vernon at gmail.com> wrote:
> That's interesting.
> No idea what it means, or where to go from here, but it's certainly
> interesting.
>
> Does it happen the exact same time every day?
> Have you tried a ping from the Xymon host to the client at or around the
> time of the issue? See if there's any oddities?
>
> Is there anything in the logs?
>
>
> On 14 September 2015 at 15:17, Colin Coe <colin.coe at gmail.com> wrote:
>>
>> OK, looking at this again. The main view looks fine, but the 'conn'
>> test on every host is a yellow circle with a question mark (unknown)
>> in the snapshot report view since September 4, 2015 at 13:32:42.
>>
>> September 4, 2015 at 13:32:41 and earlier look fine.
>>
>> Thanks
>>
>> On Sat, Sep 12, 2015 at 5:48 PM, Vernon Everett
>> <everett.vernon at gmail.com> wrote:
>> > Good to know it's not just me that fights with SELinux. :-)
>> >
>> > Now that it works, what does the snapshot report reveal at the time the
>> > purple alerts go out?
>> >
>> > Purples require a "no report" for 30 minutes to trigger.
>> > You might want to check all your logs at around 30-35 minutes before the
>> > emails.
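
The window suggested above can be worked out mechanically. This is only a rough sketch: the 13:45 alert time comes from this thread, while the function names and the assumption that each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp are illustrative and should be checked against the actual log files.

```python
from datetime import datetime, timedelta


def purple_window(alert_time, min_age=30, max_age=35):
    """Return (start, end) of the period to inspect before a purple alert."""
    start = alert_time - timedelta(minutes=max_age)
    end = alert_time - timedelta(minutes=min_age)
    return start, end


def lines_in_window(lines, start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """Yield lines whose leading timestamp falls within [start, end].

    Assumption: every relevant line begins with a 19-character timestamp
    matching `fmt`. Lines that do not parse are skipped.
    """
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], fmt)
        except ValueError:
            continue
        if start <= stamp <= end:
            yield line


# Alerts in this thread go out at 13:45, first seen on September 4, 2015.
alert = datetime(2015, 9, 4, 13, 45)
start, end = purple_window(alert)
print(start.time(), end.time())  # 13:10:00 13:15:00
```

Feeding the lines of a server log through lines_in_window() with that start and end would narrow the search to the five minutes in which the last good report should have arrived.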
>> >
>> >
>> >
>> >
>> > On 11 September 2015 at 18:13, Colin Coe <colin.coe at gmail.com> wrote:
>> >>
>> >> Almost...
>> >>
>> >> Turned out to be SELinux, my old nemesis. :)
>> >>
>> >>
>> >>
>> >> On Tue, Sep 8, 2015 at 5:37 PM, Vernon Everett
>> >> <everett.vernon at gmail.com>
>> >> wrote:
>> >> > That might be a permissions thing.
>> >> >
>> >> >
>> >> >
>> >> > On 8 September 2015 at 19:15, Colin Coe <colin.coe at gmail.com> wrote:
>> >> >>
>> >> >> Hi Vernon
>> >> >>
>> >> >> Thanks for the really good info. The message serial numbers are
>> >> >> different every day but the messages are sent at the same time
>> >> >> (13:45) daily for all tests on all hosts.
>> >> >>
>> >> >> The network is not congested nor is the SAN under any kind of
>> >> >> pressure.
>> >> >>
>> >> >> Interestingly, trying to do the snapshot report gave me "Cannot
>> >> >> create output directory".
>> >> >>
>> >> >> Thanks again
>> >> >>
>> >> >> CC
>> >> >>
>> >> >> On Tue, Sep 8, 2015 at 3:56 PM, Vernon Everett
>> >> >> <everett.vernon at gmail.com>
>> >> >> wrote:
>> >> >> > Hi Colin
>> >> >> >
>> >> >> > What do the client hosts share in common?
>> >> >> > In the past, I have seen a client overload their storage system,
>> >> >> > overflowing buffers and exceeding the storage array's ability to
>> >> >> > process IO requests. Of course, this caused general disk latency,
>> >> >> > which slowed things down to the point of a purple flood.
>> >> >> > There was no simple solution to that one, except to buy more
>> >> >> > storage, which they did.
>> >> >> >
>> >> >> > Also, check the "serial numbers" on the messages. Is this a repeat
>> >> >> > of an older message, in which case Xymon might have something fishy
>> >> >> > going on, or are they new messages every day, as in it really
>> >> >> > thinks there is a problem?
>> >> >> >
>> >> >> > Xymon only updates pages every 2 or 5 minutes, depending on the page
>> >> >> > you are looking at, meaning you could wait up to 7 minutes for the
>> >> >> > real status to appear.
>> >> >> > A purple takes 30 minutes to trigger.
>> >> >> > With some unfortunate and highly improbable timing on whatever is
>> >> >> > triggering these events, it's possible you might not see the purple.
>> >> >> > Have you pulled up a "snapshot report" for the exact time of the
>> >> >> > messages?
>> >> >> >
>> >> >> > Something else unlikely, but possible, is the network.
>> >> >> > The conn test uses ping, which is ICMP.
>> >> >> > The Xymon agent sends using TCP.
>> >> >> > Is there anything interesting happening on the network at the time?
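
Because the agent reports over TCP while the conn test relies on ping, a successful ping does not prove that the agent's path to the server is working. A minimal sketch of a TCP-level check follows. It assumes the Xymon server listens on its default port, 1984, and the function name and example hostname are hypothetical.

```python
import socket


def tcp_port_open(host, port=1984, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: 1984 is the port the Xymon server listens on; adjust it
    if the installation uses a non-default port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example (hypothetical hostname), run from a client around the alert time:
#     tcp_port_open("xymon.example.com")
```

Running this from a client at about the time the purple messages are generated, and comparing the result with a ping at the same moment, would show whether the two paths behave differently.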
>> >> >> >
>> >> >> > Regards
>> >> >> > Vernon
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On 8 September 2015 at 11:39, Colin Coe <colin.coe at gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi all
>> >> >> >>
>> >> >> >> Since Friday September 4, I've started receiving "stopped reporting
>> >> >> >> (PURPLE)" messages for all tests on all hosts from one of our Xymon
>> >> >> >> servers.
>> >> >> >>
>> >> >> >> The host status, as shown in the Main View, is green for all hosts
>> >> >> >> and tests. No purple at all.
>> >> >> >>
>> >> >> >> The "stopped reporting (PURPLE)" messages are being sent at the same
>> >> >> >> time every day, 1:45PM.
>> >> >> >>
>> >> >> >> Any advice on how I should track this down?
>> >> >> >>
>> >> >> >> Thanks
>> >> >> >> _______________________________________________
>> >> >> >> Xymon mailing list
>> >> >> >> Xymon at xymon.com
>> >> >> >> http://lists.xymon.com/mailman/listinfo/xymon
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > "Accept the challenges so that you can feel the exhilaration of
>> >> >> > victory"
>> >> >> > - General George Patton