Feb 26
Automated Split DNS for a Kubernetes cluster
by Ryan Welch · 8 Min Read
I self-host a number of services and want them to have clean domain names that work both inside and outside my network. On the LAN, requests should stay local and resolve to the internal load balancer. Away from home, the same hostnames should resolve publicly and route back through a VPS. I also want DNS records managed automatically from Kubernetes, with control over which services are internal-only, external, or both.
The approach I’ve settled on uses CoreDNS, etcd, and ExternalDNS for the internal side, with a wildcard DNS zone and a Tailscale-connected VPS for external access. The external tunnel setup is covered in another article; this one focuses on the DNS side.
Before getting started, you’ll need a Kubernetes cluster with MetalLB (or another bare-metal load balancer) to assign IPs to your ingress services, and a domain you control. In the examples below I’m using Helm and Flux to manage deployments, but this is optional. For the external side, you’ll need a way of routing external traffic to your cluster; as mentioned, I’m using a VPN tunnel, which I covered in another article.
Split DNS returns different IP addresses for the same domain depending on where the query comes from. Internal queries return the cluster’s MetalLB load balancer IP. External queries resolve via a public wildcard DNS zone pointing to a VPS, which forwards traffic through Tailscale to the nginx ingress in the cluster.
This keeps internal traffic local and avoids unnecessary internet round trips, while the same hostnames work everywhere.
Three components work together to automatically manage internal DNS.
CoreDNS handles DNS resolution with two data sources: the etcd plugin for dynamic records from Kubernetes and static zone files as a fallback for things like printers and NAS devices. Queries are answered from whatever’s currently in etcd; no restarts or cache invalidation are needed.
etcd is a distributed key-value store that acts as the DNS database. Records are stored under /skydns, updated by ExternalDNS, and queried by CoreDNS in real time.
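For reference, the etcd plugin uses the SkyDNS storage layout: the domain’s labels are reversed into a key path under /skydns, and the value is a small JSON object describing the record. A quick sketch of that mapping (the helper function is mine, not part of any tool):

```python
import json

def skydns_key(domain: str, prefix: str = "/skydns") -> str:
    """Reverse the domain labels into an etcd key path, SkyDNS-style."""
    labels = domain.rstrip(".").split(".")
    return prefix + "/" + "/".join(reversed(labels))

# home.example.com is stored under /skydns/com/example/home
key = skydns_key("home.example.com")
print(key)  # -> /skydns/com/example/home

# The value is JSON with the target and optional TTL (illustrative values)
value = json.dumps({"host": "10.0.100.1", "ttl": 300})
```

This is why the `etcdctl get /skydns --prefix` check shown later lists one key per published hostname.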
ExternalDNS watches Kubernetes resources (Ingress, Service, DNSEndpoint) and automatically publishes DNS records to etcd based on annotations.
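The DNSEndpoint CRD is useful for records that don’t map to any Ingress or Service. A hypothetical example (the name and IP are mine):

```yaml
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: router
  namespace: networking
spec:
  endpoints:
    - dnsName: router.example.internal
      recordType: A
      recordTTL: 300
      targets:
        - 10.0.0.1
```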
External access uses a simpler, manually configured setup. A wildcard DNS zone (e.g., *.example.com) at your DNS provider points to the public IP of a VPS. The VPS runs Tailscale and forwards incoming traffic to the external nginx ingress in the cluster. The Tailscale operator exposes that ingress with a fixed Tailscale IP, so the VPS always knows where to send traffic.
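Concretely, the zone at the provider needs just one record; a sketch, with a placeholder IP from the documentation range standing in for the VPS address:

```
*.example.com.   300   IN   A   203.0.113.10   ; VPS public IP
```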
Using a wildcard record has a privacy benefit: it avoids leaking which subdomains actually exist. Individual DNS records for each service would expose the names and number of services you’re running.
This only needs to be set up once. For details on the VPS and Tailscale forwarding, see my other article here.
Note
The external DNS configuration is currently manual. I plan to automate this in future so that external DNS records are also managed from Kubernetes, similar to how the internal side works today.
The cluster runs two separate nginx ingress instances: one for internal traffic and one for external. Each has its own ingress class, so you control where a service is reachable at the manifest level.
internal-nginx routes traffic from the LAN via the internal MetalLB IP. nginx (external) routes traffic arriving from the VPS via Tailscale, reachable from the internet through the wildcard DNS zone.
To set this up, deploy two separate Helm releases of the ingress-nginx chart, each with a different ingressClass name and its own MetalLB service IP. The internal instance gets a LAN-routable IP, and the external instance gets a separate IP that the Tailscale operator exposes.
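As a sketch, the two releases differ mainly in ingress class and service IP. The IPs and names below are examples; the ingress-nginx chart also wants a distinct controllerValue per instance so the two controllers don’t claim each other’s Ingresses:

```yaml
# values for the internal release
controller:
  ingressClassResource:
    name: internal-nginx
    controllerValue: k8s.io/internal-nginx
  service:
    annotations:
      metallb.universe.tf/loadBalancerIPs: 10.0.100.1 # LAN-routable IP
---
# values for the external release
controller:
  ingressClassResource:
    name: nginx
    controllerValue: k8s.io/ingress-nginx
  service:
    annotations:
      metallb.universe.tf/loadBalancerIPs: 10.0.100.2 # exposed via the Tailscale operator
```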
ExternalDNS watches both ingress classes and publishes records for all services to etcd, so internal clients can resolve hostnames for services on either ingress. However, the external wildcard DNS only forwards traffic to the external nginx ingress, so services using only internal-nginx are unreachable from outside the network. The external boundary is enforced by the architecture itself rather than just DNS.
A service can use one or both ingress classes. Something like a personal dashboard might only use internal-nginx, making it LAN-only. A service you want accessible remotely would use nginx or both.
Here’s a real-world example using Home Assistant. The main web UI is exposed externally so it’s accessible from anywhere, while a code-server sidecar for editing configuration files is kept internal-only:
```yaml
ingress:
  main:
    className: nginx # external ingress
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - host: home.example.com
        paths:
          - path: /
            pathType: Prefix
            service:
              identifier: main
              port: http
  code:
    className: internal-nginx # internal only
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - host: ha-config.example.internal
        paths:
          - path: /
            pathType: Prefix
            service:
              identifier: code
              port: http
```

The className is all that determines visibility. home.example.com uses the nginx ingress class, so it’s reachable both internally (via CoreDNS) and externally (via the wildcard DNS and VPS). ha-config.example.internal uses internal-nginx, so it’s only resolvable on the LAN.
ExternalDNS picks up both hostnames and publishes them to etcd automatically. No manual DNS changes needed when deploying or updating the service.
You’ll need a simple etcd deployment; a single node is fine for this. The main thing is that it’s reachable from both ExternalDNS (to write records) and CoreDNS (to read them). If you don’t already have etcd running in your cluster, the etcd-io Helm chart ↗ or a simple StatefulSet will do the job. Just note the service IP or endpoint, since both ExternalDNS and CoreDNS need it.
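If you go the StatefulSet route, a minimal single-node sketch could look like this. The image tag and namespace are examples, and there’s no persistence or auth configured here, so treat the stored records as disposable:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd-dns
  namespace: networking
spec:
  serviceName: etcd-dns
  replicas: 1
  selector:
    matchLabels:
      app: etcd-dns
  template:
    metadata:
      labels:
        app: etcd-dns
    spec:
      containers:
        - name: etcd
          image: quay.io/coreos/etcd:v3.5.17 # example tag
          command:
            - etcd
            - --data-dir=/var/lib/etcd
            - --listen-client-urls=http://0.0.0.0:2379
            - --advertise-client-urls=http://etcd-dns.networking.svc.cluster.local:2379
          ports:
            - name: client
              containerPort: 2379
```

Losing this etcd is low-stakes: ExternalDNS will repopulate the records from the cluster state on its next sync.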
Deploy and configure ExternalDNS to handle the internal DNS side. Here’s an example HelmRelease configuration using Flux:
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: &app external-dns-etcd
  namespace: networking
spec:
  interval: 15m
  chart:
    spec:
      chart: external-dns
      version: 1.20.0
      sourceRef:
        kind: HelmRepository
        name: external-dns
        namespace: flux-system
  maxHistory: 2
  install:
    createNamespace: true
    remediation:
      retries: 3
  upgrade:
    cleanupOnFail: true
    remediation:
      retries: 3
  uninstall:
    keepHistory: false
  values:
    fullnameOverride: *app
    sources: ["ingress", "crd", "service"]
    provider:
      name: coredns
    env:
      - name: ETCD_URLS
        value: "http://10.0.0.10:2379" # etcd IP
    extraArgs:
      - --crd-source-apiversion=externaldns.k8s.io/v1alpha1
      - --crd-source-kind=DNSEndpoint
      - --ignore-ingress-tls-spec
      - --ingress-class=internal-nginx
      - --ingress-class=nginx
    policy: upsert-only
    txtOwnerId: "internal-dns-coredns"
    domainFilters:
      - example.internal
      - example.com
```

The important bits: sources tells ExternalDNS to watch Ingress, Service, and DNSEndpoint resources. The provider is set to CoreDNS, which uses etcd as its backend. The upsert-only policy prevents accidental deletions, and domainFilters restricts which domains ExternalDNS will manage. Both ingress classes are listed so records are published for internal and external services alike.
Refer to the ExternalDNS documentation ↗ for more detail, but this should work for most setups with etcd and ingresses in your cluster.
The CoreDNS Corefile configuration:
```
.:53 {
    cache 3600
    forward . 1.1.1.1 8.8.8.8 {
        policy random
    }
    log
    errors
    whoami
}

example.internal:53 {
    etcd {
        path /skydns
        endpoint http://10.0.0.10:2379
        fallthrough
    }
    forward . 127.0.0.1:5553
    log
    errors
}

example.internal:5553 {
    file /etc/coredns/zones/db.example.internal
    log
    errors
}

example.com:53 {
    etcd {
        path /skydns
        endpoint http://10.0.0.10:2379
        fallthrough
    }
    log
    errors
}
```

The root zone (:53) forwards any unmatched queries to public DNS, so CoreDNS acts as a full resolver for your LAN. The example.internal zone checks etcd first for dynamic records; if nothing matches, the fallthrough directive passes the query to a second zone on port 5553 that serves a static zone file. You can’t use both a zone file and the etcd plugin in the same block, which is why the static records are served on a separate port that the main zone forwards to. The example.com zone just queries etcd for any dynamic records ExternalDNS has published.
The static zone file (db.example.internal) is for devices that aren’t managed by Kubernetes but still need DNS entries on the LAN, though you may choose to manage these entries elsewhere:
```
$ORIGIN example.internal.
$TTL 1h

@   IN SOA ns.example.internal. admin.example.internal. (
        2020010511 ; Serial
        1d         ; Refresh
        2h         ; Retry
        4w         ; Expire
        1h )       ; Minimum TTL

@   IN A  10.0.100.0
@   IN NS ns.example.internal.

ns.example.internal.   IN A 10.0.0.1
nas.example.internal.  IN A 10.0.0.2
```

Update the IPs and hostnames to match your network. The serial number should be incremented whenever you change the file; CoreDNS uses it to detect updates.
Finally, update your router’s DHCP settings to hand out the CoreDNS IP as the DNS server for clients on your LAN. Make sure your clients aren’t overriding the DHCP-provided DNS (some devices and browsers do this by default with DNS-over-HTTPS).
Once everything is deployed, you can verify the internal side with dig:
```shell
# Query CoreDNS directly for a dynamic record
dig home.example.com @<coredns-ip>
# Should return the internal MetalLB IP, e.g. 10.0.100.x

# Query a static record from the zone file
dig nas.example.internal @<coredns-ip>
# Should return 10.0.0.2
```

You can also check what ExternalDNS has written to etcd directly:

```shell
etcdctl get /skydns --prefix --keys-only
```

If records are showing up in etcd but not resolving through CoreDNS, double-check that the etcd endpoint is correct in your Corefile and that the fallthrough directive is present.
This setup gives me automatic internal DNS for anything running in the cluster, with split resolution so the same hostnames work on the LAN and over the internet. The internal side is fully automated: deploy a service with the right ingress class and the DNS entries are created. The external side uses a wildcard record pointing at a VPS, which keeps things simple and avoids leaking subdomain names.
There are a few things I’d like to improve in future: automating the external DNS so it’s managed from Kubernetes the same way the internal side is, making the etcd endpoint discoverable rather than hardcoded, and migrating from Ingress to Gateway API routes. I’ll cover those in follow-up posts.
Last edited Apr 12