TIL: DNS Search Domains (Day 31)


What Are Search Domains?

Search domains are DNS suffixes that are automatically appended to unqualified hostnames to help resolve local network resources. When you type server1 instead of server1.home.network, your system appends each search domain in turn and tries the resulting name too.

The Problem

When combined with wildcard DNS records (*.domain.tld), search domains can cause external domains to incorrectly resolve to internal IPs.

I needed the pods in my clusters to resolve DNS through my self-hosted resolver, AdGuard Home.

After a few attempts I settled on modifying the CoreDNS config so that it forwards queries to AdGuard:

forward . 192.168.50.120

This worked initially, but I immediately noticed that Argo was "broken": everything was stuck in Unknown. Then an error came up showing it couldn't reach GitHub (somehow github.com was being resolved to my load balancer, which isn't right).

# from argo 
Failed to load target state: failed to generate manifest for source 1 of 1: rpc error: 
code = Unknown desc = failed to list refs: dial tcp .....: connect: connection timed out

Some Diagnostics

Test DNS Resolution

k run dnstest --image=nicolaka/netshoot -it --rm --restart=Never -- nslookup github.com/mrdvince
Server:        10.96.0.10
Address:    10.96.0.10#53
Non-authoritative answer:
Name:    github.com/mrdvince.home.mrdvince.me 
Address: 192.168.50.10

Notice the name github.com/mrdvince.home.mrdvince.me in the answer. My reaction: "What even is this? How did this come to be?"

I tried the same kind of lookup on the k8s node itself, and it resolved just fine:

nslookup github.com
Server:        192.168.50.120
Address:    192.168.50.120#53
Non-authoritative answer:
Name:    github.com
Address: 140.82.121.4

The next thing was to check what the pod's resolv.conf contains:

k run dnstest --image=nicolaka/netshoot -it --rm --restart=Never -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local home.mrdvince.me
nameserver 10.96.0.10
options ndots:5
pod "dnstest" deleted

It turns out that, because of ndots:5, any name with fewer than 5 dots is treated as unqualified, so the resolver tries it with each search domain appended before trying it as-is.
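To make the ordering concrete, here is a minimal Python sketch of how a glibc-style stub resolver builds its list of names to try. The function name candidate_names is made up for illustration, and the behaviour is a simplification (other resolver implementations differ in the details); the search list and ndots value are the ones from the pod's resolv.conf above.

```python
def candidate_names(name, search, ndots):
    """Return the names a glibc-style stub resolver would try, in order.

    Simplified model: real resolvers have more edge cases.
    """
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, try the name as-is last.
    return expanded + [absolute]


search = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "home.mrdvince.me",
]

for candidate in candidate_names("github.com", search, ndots=5):
    print(candidate)
```

With ndots:5, github.com (one dot) is only tried as-is after all four search-domain variants, and github.com.home.mrdvince.me. is the fourth name on the list. Writing the name with a trailing dot (github.com.) skips the search list entirely in this model.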

For these unqualified names, the pod's resolver works through every search domain, including home.mrdvince.me. Because I have a wildcard record for that domain, github.com.home.mrdvince.me matches the wildcard, and my home DNS answers with an internally set IP before the real github.com is ever tried.

So where is this search domain coming from? It turns out it's coming from OPNsense, and there's no way to disable it. (The default is to use the domain name of this system as the default domain name provided by DHCP; you can specify a different one, but you can't fully get rid of it.) The nodes pick it up over DHCP, and Kubernetes appends the node's search domains to each pod's resolv.conf.
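The interaction with the wildcard can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: the lookup function models a resolver that knows only the wildcard for home.mrdvince.me and one public record, using the IPs seen in the nslookup output earlier, and the candidate list is written out by hand in the order the pod's resolver tries the names.

```python
# Hypothetical stand-ins for the records in this post.
WILDCARD_SUFFIX = ".home.mrdvince.me."  # models the *.home.mrdvince.me record
WILDCARD_IP = "192.168.50.10"
PUBLIC_RECORDS = {"github.com.": "140.82.121.4"}


def lookup(fqdn):
    """Answer one query the way the upstream resolver in this sketch would."""
    if fqdn.endswith(WILDCARD_SUFFIX):
        return WILDCARD_IP  # any name under the wildcard gets an answer
    return PUBLIC_RECORDS.get(fqdn)  # None stands in for NXDOMAIN


def first_answer(candidates):
    """Return the first candidate that resolves, as a stub resolver would."""
    for fqdn in candidates:
        ip = lookup(fqdn)
        if ip is not None:
            return fqdn, ip
    return None


# Names in the order the pod's resolver tries them for "github.com".
candidates = [
    "github.com.default.svc.cluster.local.",
    "github.com.svc.cluster.local.",
    "github.com.cluster.local.",
    "github.com.home.mrdvince.me.",
    "github.com.",
]

print(first_answer(candidates))
```

The three cluster.local variants return nothing, the fourth candidate hits the wildcard, and the search stops there, so the real github.com. entry at the end of the list is never queried.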

Solutions

  • Use specific DNS records instead of wildcards
  • Prioritize external DNS servers (e.g., forward . 1.1.1.1 local_dns_server)
  • Use more specific wildcard patterns
  • Change the system domain to something non-conflicting

The eventual fix, while I redo the DNS records and replace the record additions with external dns, was to prioritize external DNS and fall back to the internal resolver.

The quick and dirty fix:

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
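    # List the external resolver first to avoid search domain problems.
    # Caveat: forward picks upstreams at random by default; strict ordering
    # needs a "policy sequential" block.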
    forward . 1.1.1.1 192.168.50.120
    cache 30
    loop
    reload
    loadbalance
}

So it looks like the meme was right: it's always DNS after all.