[Xymon] Possible Bug in Xymon Related to Purple Statuses

Matt Vander Werf matt1299 at gmail.com
Thu Dec 3 18:43:15 CET 2015


Hello,

I am having an issue with Xymon where instead of tests going purple when
the client stops reporting, the tests are going clear.

I noticed this with a host that had all its tests go clear instead of
purple. It turns out the network interface on the machine had completely
died, and this had happened a week earlier! We never noticed because the
tests for the machine went clear instead of purple.

This seems to be an issue only with a certain group of machines. For this
group, we have the ping test disabled with the 'noping' option on every
host. They are all behind a firewall with private IP addresses, so the
Xymon server cannot reach them, but they can still send client data out to
the Xymon server.

Turns out, ever since we started using the 'noping' option for all of them,
none of the machines have ever gone purple...

I tested this by stopping the xymon-client service on one of the machines
in question, and sure enough, after the STATUSLIFETIME time limit, all the
tests for that host went clear, instead of going purple.


I looked through the different logs (I already had most set in debug mode
for a different reason), and I didn't see much that would explain this (but
I could have missed something).

I did notice in the xymond log file that, according to xymond, they should
have been going purple and not clear.

Here's an excerpt from that log file (this is the machine which I stopped
the service on):

9680 2015-12-02 09:57:48.040111 -> check_purple_status
9680 2015-12-02 09:57:48.047630 Purple log from <HOST> memory
9680 2015-12-02 09:57:48.047674 ->handle_status
9680 2015-12-02 09:57:48.047676  modifyonly = 0, changed = 0
9680 2015-12-02 09:57:48.047680  - sum: 0, synced: 0, oldcolor: 0, newcolor: 1, modifychanged: 0
9680 2015-12-02 09:57:48.047682 posting to stachg channel: host=<HOST>, test=memory
9680 2015-12-02 09:57:48.047684 -> posttochannel
9680 2015-12-02 09:57:48.047697 Posting message 14359 to 1 readers
9680 2015-12-02 09:57:48.047703 <- posttochannel
9680 2015-12-02 09:57:48.047705 posting to status channel
9680 2015-12-02 09:57:48.047706 -> posttochannel
9680 2015-12-02 09:57:48.047712 Posting message 72429 to 2 readers
9680 2015-12-02 09:57:48.047726 <- posttochannel
9680 2015-12-02 09:57:48.047727 <-handle_status

The same sequence showed up for every test on this machine.
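
In case it helps anyone check their own logs, here is a rough Python sketch that pulls the old/new color codes for each purple check out of xymond debug output like the excerpt above. The function name and regular expressions are my own and only assume the line layout shown in the excerpt. I have not mapped the numeric codes to color names here, since that mapping would have to come from the Xymon source.

```python
import re

# Matches "Purple log from <host> <test>" as seen in the debug excerpt.
PURPLE_RE = re.compile(r"Purple log from (\S+) (\S+)")
# Matches the "oldcolor: N, newcolor: M" fields logged by handle_status.
TRANSITION_RE = re.compile(r"oldcolor:\s*(\d+),\s*newcolor:\s*(\d+)")


def purple_transitions(lines):
    """Yield (host, test, oldcolor, newcolor) for each purple check found.

    Hypothetical helper: it pairs each "Purple log from" line with the
    next line that carries oldcolor/newcolor fields.
    """
    current = None
    for line in lines:
        match = PURPLE_RE.search(line)
        if match:
            current = (match.group(1), match.group(2))
            continue
        match = TRANSITION_RE.search(line)
        if match and current is not None:
            yield current + (int(match.group(1)), int(match.group(2)))
            current = None


# Sample lines copied from the excerpt (unwrapped, as in the real log file).
SAMPLE = [
    "9680 2015-12-02 09:57:48.047630 Purple log from <HOST> memory",
    "9680 2015-12-02 09:57:48.047674 ->handle_status",
    "9680 2015-12-02 09:57:48.047676  modifyonly = 0, changed = 0",
    "9680 2015-12-02 09:57:48.047680  - sum: 0, synced: 0, oldcolor: 0, "
    "newcolor: 1, modifychanged: 0",
]

for host, test, old, new in purple_transitions(SAMPLE):
    print(f"{host} {test}: color code {old} -> {new}")
```

To run it against a real log, pass an open file object instead of SAMPLE, for example `purple_transitions(open(path))` with `path` pointing at wherever your xymond log lives.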

And here's the event log for the same machine:

Wed Dec 2 09:57:48 2015  <HOST>  cpu     green -> clear
Wed Dec 2 09:57:48 2015  <HOST>  disk    green -> clear
Wed Dec 2 09:57:48 2015  <HOST>  inode   green -> clear
Wed Dec 2 09:57:48 2015  <HOST>  memory  green -> clear


Any thoughts as to what's going on? Looks like a bug to me...

Thanks!!

--
Matt Vander Werf