[Xymon] best (or any) way to remember disabled tests on the main page?

John Thurston john.thurston at alaska.gov
Wed Jul 30 19:14:27 CEST 2014


On 7/30/2014 8:50 AM, oliver wrote:
>>> Ideally, I'd like to see the name of the server group ("prod" in the
>>> example) change to blue from white on the main view to remind me
>>> there's an ignored test.  But I don't want the "main view" colour to
>>> change from green
>>
>>
>> Don't disable the test. Acknowledge the alert.
>
> Let me explain the situation a little more clearly.
>
> We have tons of servers deployed in pairs.  Each pair consists of an
> active box and a standby box and it doesn't technically matter which
> one of the two is active.  For consistency reasons, we like to keep it
> so the "first" box is active whenever possible.
>
> If the first box fails over, for whatever reason, it generates a red
> alarm on Xymon saying it's no longer active and (after checking
> everything out) we ask someone on the night-shift to fail back over
> during off-hours.  At this point, we don't want the main Xymon view to
> be red so we "ignore" the test.  However, since the main view is now
> green, the techs sometimes forget that there's anything to do and it
> remains failed over until someone drills down and sees it.

This comes back around to something I regularly tell our staff:
"Xymon (and Big Brother before that) is not a task list. It is an 
alerting system. Using it as a task list is an abuse of the tool and 
reduces its ability to meet its fundamental business goal."

We have task-list and problem-tracking processes in place, so we don't 
need to use Xymon to meet this need. Your business needs and available 
tools may be different, but I urge you to consider finding a better tool 
than Xymon for managing task lists.

> I was trying to get to a state where they would know that there's a
> disabled/ignored/ack'd box from the front page to eliminate the "I
> missed the email" excuses

You could define a 'combo' test which alarms when fewer than two of the 
underlying tests are green. This 'combo' test could be rigged to 
propagate to the non-green screen while suppressing the propagation of 
the underlying tests.
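
A rough, untested sketch of that idea follows. The host names (box1a, 
box1b, pair1), the IP addresses and the test name "active" are all made 
up for illustration. It assumes the usual combo.cfg behaviour, where a 
test in a non-error colour evaluates to 1 and a test in an error colour 
to 0, and it assumes the NOPROPRED tag in hosts.cfg to keep the 
underlying red from propagating upward. Check the combo.cfg and 
hosts.cfg man pages before relying on any of it.

```
# combo.cfg -- hypothetical pair; combo goes red unless both halves are OK
pair1.active = ((box1a.active + box1b.active) >= 2)

# hosts.cfg -- hypothetical entries
# pair1 is a placeholder host that only carries the combo status
0.0.0.0     pair1   # noconn
# keep a red "active" status on either box off the upper-level pages
10.0.0.11   box1a   # NOPROPRED:active
10.0.0.12   box1b   # NOPROPRED:active
```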

You could then rig the underlying tests to send automated email alerts 
to the folks who should fix the broken half of the pair. Look at 
combo.cfg and alerts.cfg for options to aggregate test results and 
time/escalate automated email alerts.
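
As an equally hedged illustration of the alerting side, an alerts.cfg 
rule along these lines could mail the night shift once the underlying 
test has been red for a while and then repeat the reminder. The hosts, 
test name and address are hypothetical, and the time values (taken here 
to be minutes) are placeholders to tune against the alerts.cfg man page.

```
# alerts.cfg -- hypothetical rule for the underlying tests
HOST=box1a,box1b SERVICE=active
    # mail after 30 minutes of red, then remind every 240 minutes
    MAIL nightshift@example.com COLOR=red DURATION>30 REPEAT=240
```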

-- 
    Do things because you should, not just because you can.

John Thurston    907-465-8591
John.Thurston at alaska.gov
Enterprise Technology Services
Department of Administration
State of Alaska


