[Xymon] Modernizing the DNS check

Wed Jan 8 03:44:46 CET 2014

On Jan 7, 2014, at 19:23, Jeremy Laidman <jlaidman at rebel-it.com.au> wrote:

> Mark
> 
> I think more DNS checks would be really useful for many, but I would say that we'd be going down a rabbit hole chasing this.  The DNS check you've described is worthwhile to do for many people (myself included), but is only one of many that would need to be done to ensure that a name or domain is resolvable.
> 
> For example, should the same checks be done for the parent zone(s)?

Why are you monitoring something so far out of your control? I don't monitor the ROOT servers; I trust my friends at Verisign & co to handle that for me.

>  Should we check the WHOIS record for impending zone expiry date?

No, and doing so is ignorant; you can easily get banned from WHOIS lookups for abusing it. Use the registrar's APIs.

>  Should we check that there is more than one NS record?  Should we check that the NS records don't all point at the same IP addresses?

The question here is "Are the publicly accessible NS servers in a consistent functional state?". The goal is not to validate the data.

>  For high-turn-over (e.g. dynamic) zones, the master nameservers might only rarely be in sync,

If you expect it to rarely be in sync, why would you try to monitor for that?

> or the serial number might typically change before all of the SOA lookups are complete.

Of course you'd expect the race condition where the check happens while a change is happening. Waiting for another check is a reasonable way to avoid a false positive.
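
A rough sketch of the "wait for another check" idea, in Python: only go red once the mismatch has been seen on consecutive runs. The class name, the threshold of 2, and the choice to show yellow on the first mismatch are all assumptions for illustration, not anything Xymon ships.

```python
class MismatchDebouncer:
    """Report red only after `threshold` consecutive out-of-sync checks."""

    def __init__(self, threshold=2):
        self.threshold = threshold   # hypothetical default: two bad runs in a row
        self.consecutive = 0         # out-of-sync runs seen back to back

    def update(self, in_sync):
        """Record one check result and return the color to report."""
        if in_sync:
            self.consecutive = 0
            return "green"
        self.consecutive += 1
        # A single mismatch may just be a zone transfer in flight,
        # so hold at yellow until the next run confirms it.
        return "red" if self.consecutive >= self.threshold else "yellow"
```

A real ext script would need to persist the counter between runs (for example in a small state file), since each run is a fresh process.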

>  What about when there's a stealth master that can't be queried?

I'm not monitoring from an untrusted network; I own these NS servers and can certainly get to my stealth master from my monitoring infrastructure. Also, the theme is "Are the publicly accessible NS servers in a consistent functional state?"

>  What about reporting on slave zones that are about to expire?

I could see that as useful, but when the query starts failing it will go red. This would be really easy to do though...

>  Or zones that have semantic errors such as MX records that refer to CNAME records, or host records with underlines, or CNAME loops?

Again, we're not validating the data, just making sure it can be served correctly, which mostly amounts to no errors and the serials not being out of whack. This isn't the proper place for those kinds of checks.

>  Should we be checking DNSSEC signatures?

No. I wouldn't trust Xymon's implementation of that anyway; that's best handled by your OS's DNS stack. The check will fail if the signature is incorrect because the entire lookup will fail.

> 
> Hmm, that list turned into a bit of a rant, really.  Sorry.  You can probably guess that I think about this stuff a fair bit, and many of the things I've listed are more "niche" than others, but still.

I'd say most are niche :-/

> 
> For each possible test anyone might want to include, each installation might need different ways of reporting and/or recording statistics, and so it would get complex very quickly.  Do you report a yellow if only 3 out of 4 NS servers are the same, or 7 out of 8?

If any NS is not at the same serial, there should be concern. You have no control over which NS the client chooses. (side note: 7 NS is the max recommended by RFC 1912 anyway)
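
To make the "any NS out of sync is a concern" rule concrete, here is a minimal Python sketch. It assumes the serials have already been collected, one SOA query per nameserver, for instance from lines of `dig +short SOA` output. The function names and the input dictionary are hypothetical, and the lookup itself is left out.

```python
def parse_soa_serial(dig_short_line):
    """Extract the serial from one line of `dig +short SOA` output.

    Expected fields: mname rname serial refresh retry expire minimum.
    """
    fields = dig_short_line.split()
    if len(fields) != 7:
        raise ValueError("unexpected SOA line: %r" % dig_short_line)
    return int(fields[2])


def check_serials(serials):
    """Map {nameserver: serial or None} to a (color, detail) pair.

    None means the nameserver did not answer the SOA query.
    """
    if not serials:
        return "red", "no nameservers found"
    failed = sorted(ns for ns, s in serials.items() if s is None)
    if failed:
        return "red", "no SOA answer from: " + ", ".join(failed)
    distinct = set(serials.values())
    if len(distinct) > 1:
        # Any disagreement is red; list every NS so a human can sort it out.
        lines = ["%s %d" % (ns, s) for ns, s in sorted(serials.items())]
        return "red", "serials out of sync:\n" + "\n".join(lines)
    return "green", "all %d nameservers at serial %d" % (len(serials), distinct.pop())
```

There is deliberately no yellow and no "3 out of 4" threshold here: one disagreeing or silent NS is enough to go red.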

>  If the master's serial number somehow goes backwards, do we show seven servers wrong or is it just one?

An alert will happen because they're not in sync anyway. This is a problem for a human familiar with the environment to figure out once they've been informed.

>  Do you assume that the master is in the MNAME field, or would you get the option to override?

"Are the publicly accessible NS servers in a consistent functional state?"

>  If two hosts have different values for the MNAME field, which do you consider master?  Or in this case, do you care?

How is this even happening? This is not a multi-master infrastructure. If the MNAME is different, the serial most certainly is as well, or you've picked up AXFR errors in the logs, etc.

>  Also, which host(s) would you report the status against?
>  Do you have to create hosts.cfg entries for every NS, and then maintain that list by tracking the NS records as they change over time, or do you create a pseudo-host for each domain, or some of each?

I don't care. I'd probably end up doing "127.0.0.1 foo.com # noping fancydnscheck"

The error is telling me there's something wrong with the infrastructure and most likely will tell you which NS is the problem. I'm not interested in tying the event to a specific NS server hosts.cfg entry in Xymon because it's possible that there isn't one.
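
For the pseudo-host approach, an ext script only has to build one status message per domain. A small Python sketch of that message follows; the helper name is hypothetical, and reusing "fancydnscheck" as the column name is an assumption carried over from the hosts.cfg line above.

```python
def xymon_status(hostname, testname, color, text):
    """Build a Xymon "status" message for a pseudo-host.

    Xymon expects dots in the hostname to be written as commas.
    """
    return "status %s.%s %s %s" % (
        hostname.replace(".", ","), testname, color, text)
```

The resulting string would normally be handed to the xymon client command from the ext script (conventionally `$XYMON $XYMSRV "..."`). The detail text is where the misbehaving NS gets named, so no per-NS hosts.cfg entry is needed.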

> 
> Whoops, there I go ranting again, sorry.
> 
> Such complexity and flexibility is better implemented outside Xymon, to keep the Xymon core as simple and easy to maintain as possible.
> 
> I think the best solution is for each installer to decide on their own detection and reporting requirements, and create or install ext scripts to suit each case.  In fact, I'm surprised there aren't any on Xymonton.org already, but that's where I would expect such code to reside.  I'd be happy to assist with developing ext scripts for enhanced DNS checks.
> 
> J
> 
> 
> 
> On 8 January 2014 07:56, Mark Felder <feld at feld.me> wrote:
> Is there any hope of enhancing the DNS check capability beyond its
> current functionality? It would be nice if it could detect all the NS
> for the domain you're monitoring to compare the SOA serial of all the NS
> servers and go red if they're not in sync.