[Xymon] dropping/making blue checks not persistent when restarting

Japheth Cleaver cleaver at terabithia.org
Mon May 22 18:46:39 CEST 2017


On 5/22/2017 1:55 AM, Sven Schuster wrote:
> Sorry, I should have been a bit more precise in this regard:
> - tests are disabled via enable/disable from the 
> Administration menu for some period of time, e.g. 2 hours, without 
> "until OK" checked. It doesn't matter whether you're blueing out a green 
> (e.g. planned downtime) or red test. The problem remains the same.
> - the restart is done to make changes visible immediately for checking 
> the change after applying it
> - dropped tests belong to checks (or hosts) which don't exist anymore, so 
> there won't be any data coming in for the checks/hosts dropped
> Yes, when waiting for some time before restarting after disabling or 
> dropping a check, that change will "survive" the restart. As pointed 
> out in Jeremy Laidman's post, this indeed seems to be due to the 
> checkpoint interval, which is 600 seconds in the local configuration.
>
> Kind regards,
> Sven
> *Sent:* Friday, 19 May 2017 at 16:02
> *From:* "Root, Paul T" <Paul.Root at CenturyLink.com>
> *To:* "'Sven Schuster'" <Schuster.Sven at gmx.de>, "xymon at xymon.com" 
> <xymon at xymon.com>
> *Subject:* RE: [Xymon] dropping/making blue checks not persistent when 
> restarting
>
> So, there’s a couple things here.
>
> First, how are you disabling (bluing out) a test (you call check)? Are 
> you checking the “until OK” or are you providing a time limit for the 
> disable? Also, if the test is green why would you want it disabled?
>
> Second, why are you restarting xymon after a config change? All 
> configuration files are re-read (except local-client.cfg) every 5 minutes.
>
> Next, you say dropped tests reappear. Well of course. If the client is 
> providing the test to the server, the server is going to display it. 
> If you don’t want a test in xymon, it has to be disabled at the source.
>
> I don’t understand your second paragraph. Are you saying that you 
> disable a test and then wait 5-10 minutes and the disabled test will 
> remain blue after restarting xymon?
>
> *From:* Xymon [mailto:xymon-bounces at xymon.com] *On Behalf Of* Sven Schuster
> *Sent:* Friday, May 19, 2017 7:55 AM
> *To:* xymon at xymon.com
> *Subject:* [Xymon] dropping/making blue checks not persistent when 
> restarting
>
> Hello everybody,
>
> recently I've been seeing a strange issue on xymon server. When I make 
> a check blue and shortly after xymon gets restarted due to 
> configuration updates, that blue check will be green again afterwards. 
> The same thing happens when a check is dropped and xymon gets 
> restarted directly after that: the dropped check reappears.
>
> If you wait some amount of time before restarting, say 5-10 minutes, 
> the problem won't appear and everything will be fine. I also sync'ed 
> on the server directly after making a check blue and before restarting 
> (to avoid data not being written to disk for some strange reason), 
> which unfortunately did not help.
>
> Environment is xymon 4.3.27 on Debian jessie. Xymon has since been 
> updated to 4.3.28 because of this problem, but the problem appears 
> in 4.3.28, too. This server was upgraded from Debian wheezy 
> to jessie a few weeks ago. On wheezy xymon 4.3.27 was in use but 
> didn't show this behaviour.
>
> Did anybody notice such an odd behaviour or maybe have any thoughts 
> regarding possible causes?
>
> Thanks in advance,
>
> Sven
>

Hi Sven,

This behavior seems to point to the checkpoint file not being written 
out properly on shutdown, especially since it works fine during the 
normal checkpointing process (e.g., waiting 600 seconds before the 
restart). That could be a latent bug (or at least a missing error 
message).

Can you set xymond to --debug mode (or send it a USR2 signal) and then 
shut down/restart the process after this change? If shutting down, can 
you take a quick look at the checkpoint file to see whether it was 
updated at the moment of shutdown? Depending on the host in question, 
you can also search for the test that should "no longer be there" (it's 
just a simple text file format).
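
If it helps, the two checks above (was the file touched at shutdown, and 
is the test still in it) could be scripted roughly as below. This is only 
a sketch under assumptions: CHECKPOINT_PATH, the host name and the test 
name are hypothetical placeholders, so substitute whatever checkpoint 
file your xymond is actually configured to write. It does not parse the 
checkpoint format at all; it only does a plain substring match per line.

```python
import os
import time

# Hypothetical placeholders: replace with your own values.
CHECKPOINT_PATH = "/path/to/your/xymond-checkpoint-file"
HOSTNAME = "myhost.example.com"
TESTNAME = "mytest"


def checkpoint_age_seconds(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def find_status_lines(path, hostname, testname):
    """Return the lines that mention both hostname and testname.

    Assumes only that the file is line-oriented text; the match is a
    plain substring test, not a parse of the checkpoint format.
    """
    matches = []
    with open(path, "r", errors="replace") as fh:
        for line in fh:
            if hostname in line and testname in line:
                matches.append(line.rstrip("\n"))
    return matches


if __name__ == "__main__":
    age = checkpoint_age_seconds(CHECKPOINT_PATH)
    print("checkpoint last written %.0f seconds ago" % age)
    hits = find_status_lines(CHECKPOINT_PATH, HOSTNAME, TESTNAME)
    print("%d line(s) mention %s / %s" % (len(hits), HOSTNAME, TESTNAME))
    for line in hits:
        # Truncate long lines so the output stays readable.
        print(line[:200])
```

Run it right after stopping xymond: a small age means the file was 
rewritten at shutdown, and any remaining matching lines are worth 
comparing against what the web display showed before the restart.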

The same routine is called at shutdown as during the periodic interval 
checkpointing, except that at shutdown we wait synchronously for it to 
complete, precisely to avoid this type of concern. That doesn't mean 
there isn't still an issue there.

Regards,

-jc
