[Xymon] Xymon Dependencies configuration.

me at tdiehl.org me at tdiehl.org
Sat Jun 6 21:36:07 CEST 2020


On Thu, 4 Jun 2020, Ralph M wrote:

> On Thu, Jun 4, 2020 at 3:36 PM <me at tdiehl.org> wrote:
>
>> Hi,
>>
>> On Thu, 4 Jun 2020, Adam Thorn wrote:
>>
>>> On 03/06/2020 22:49, me at tdiehl.org wrote:
>>>>  Hi,
>>>>
>>>>  I am trying to configure xymon dependencies so that if the core router
>>>>  is down, my xymon server only pages me for the core router.
>>>>
>>>>  In reading the man page it says to do something like the following:
>>>>
>>>>  1.2.3.4 cg1.example.com # noconn https://cg1.example.com
>>>>  depends=(http:router.example.com/conn)
>>>>
>>>>  The above works for a single service, but the above host, for example,
>>>>  has http and sslcert. How can I tell xymon that if router.example.com is
>>>>  down, all of the other services for a host should go clear?
>>>>
>>>>  I tried setting the service to a *, but that does not work. I also tried
>>>>  listing services separated with either a comma or a pipe, but no joy.
>>>
>>> "man hosts.cfg" suggests that the syntax you want is
>>>
>>> depends=(testA:host1/test1,host2/test2),(testB:host3/test3)
>>>
>>> so for your example,
>>>
>>> depends=(http:router.example.com/conn),(sslcert:router.example.com/conn)
>>
>> That does not work for the sslcert test but does work for things like ssh.
>> Which now makes sense given the info below.
>>
>>>
>>> As the man page says, "depends" only applies to tests performed by
>>> xymonnet. Wildcards do not appear to be supported, but protocols.cfg will
>>> show you most of the tests that xymonnet might perform.
>>
>> Ok, that explains why neither the conn nor the sslcert test will go
>> clear.
>> Neither test is listed in protocols.cfg. Given that both of these tests are
>> network type tests it seems odd that they cannot be made to go clear on
>> failure of another network test. I guess I do not really understand how
>> Xymon works.
>>
>> I was really hoping to be able to get a single alert when the router went
>> down. It does not happen real often but it is a pita to get several hundred
>> text messages for what is really a single failure.
>>
>> Does anyone have a solution for these kinds of failures?
>>
>
> You could write an external script to connect to the router and "do stuff"
> if the connection fails.
>
> For example, if you're checking the router every 5 minutes, when it fails
> you could send a "disable" message to Xymon for the list of things behind
> the router, with a 10 minute lifetime.  That'll turn off alerts for all
> those devices.  As long as the router continues to fail, keep on sending
> disables with 10 min lifetime, essentially extending the original
> lifetime.  Once the router recovers, the disable message will expire up to
> 10 mins later and those devices will alert or not depending on their next
> status.
>
> I don't have such a script, but it feels like it ought to be fairly trivial
> to implement.

In preparation for writing a script to do what I need, I have been playing with
xymon commands.

If I send the following to xymon it appears to be ignoring the lifetime parameter:
/usr/bin/xymon 127.0.0.1 "status+10m EMD1-2,example,com.conn clear `date` test message"

The above command sends a status message to xymon, but it only stays clear for approx.
30 seconds. If I am reading the man page correctly, it should stay clear for 10 minutes.
Does anyone know what I am missing?
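
For reference, the approach Ralph describes above might look roughly like the
sketch below. It is only an outline under stated assumptions: the function names
(build_disable_msg, disable_hosts), the ping-based router check, and the host
list are made up for illustration, and I am assuming the "disable" message takes
"*" as the test name and a duration such as "10m", as I read it in the xymon(1)
man page.

```shell
#!/bin/sh
# Sketch only: function names, the host list and the ping check are assumptions.

XYMON="${XYMON:-/usr/bin/xymon}"   # path to the xymon client binary
XYMSRV="${XYMSRV:-127.0.0.1}"      # address of the Xymon server

# Build a "disable" message covering all tests ("*") on one host.
# Dots in the host name are sent to Xymon as commas.
build_disable_msg() {
    host=$(printf '%s' "$1" | tr '.' ',')
    printf 'disable %s.* %s %s' "$host" "$2" "$3"
}

# Send one disable message per host; the first argument is the duration.
disable_hosts() {
    duration=$1
    shift
    for h in "$@"; do
        "$XYMON" "$XYMSRV" "$(build_disable_msg "$h" "$duration" "router down")"
    done
}

# Possible use from cron every 5 minutes (hypothetical host list):
#   ping -c 3 router.example.com >/dev/null 2>&1 \
#       || disable_hosts 10m cg1.example.com
```

As long as the router check keeps failing, each run sends a fresh 10 minute
disable, which matches the "keep extending the lifetime" idea above.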

Regards,

-- 
Tom			me at tdiehl.org

