Inconsistent local DNS with systemd-resolved


Hello! Long time Debian user trying out Fedora for the first time, made the switch over the weekend. Everything is working great except for this local DNS issue I can’t quite figure out. Maybe I am just not understanding the process involved with systemd-resolved.

I have an internal DNS server running Pi-Hole (172.21.11.11), which has a number of local DNS entries for some virtual servers in a homelab. The entries are using an internal-only subdomain on a public domain that I own through Cloudflare (let’s just call it internal.mydomain.com, which is obviously not the real domain). I have also added these entries to my pfSense firewall (172.21.1.1), which acts as a backup DNS. The DNS server addresses are handed out by DHCP.

All the entries I have manually added resolve correctly, though some of them would only work after adding them to the pfSense box. Either way, these entries work now.

However, let’s say I spin up a new VM for testing, and I don’t want to add a DNS entry. It’s a test SQL server, so we’ll call it testsql. If I run ping testsql from Windows, MacOS, or any of my other Debian servers, it resolves to the correct IP. It does not work on Fedora.

dig testsql times out querying 127.0.0.53, which I understand to be the system querying its own cache, and resolvectl query testsql shows testsql: 'testsql' not found.

But if I run dig testsql @172.21.11.11, the query returns successfully. I can also see the query in the Pi-Hole logs.

Curiously, if I run dig testsql.internal.mydomain.com, it shows that it is checking the Cloudflare nameservers, but does not return a valid IP.

After an indeterminate amount of time, pinging testsql will eventually show PING testsql.internal.mydomain.com (172.21.100.45) indicating it is resolving and also using the DNS domain. dig still times out even after this starts resolving successfully.

In summary: my hard-coded entries in Pi-Hole / pfSense work with no issues. Hostnames without hard-coded entries can be resolved by every system except my Fedora machine, which fails until it eventually starts working for no apparent reason. I'm trying to figure out why internal DNS is resolving inconsistently.

Hmm. It worked this time after running those commands.

~$ resolvectl flush-caches
~$ resolvectl reset-server-features
~$ resolvectl query testsql.internal.mydomain.com
testsql.internal.mydomain.com: 172.21.100.45    -- link: enp34s0

-- Information acquired via protocol DNS in 2.2ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network
~$ resolvectl query testsql
testsql: 172.21.100.45                         -- link: enp34s0
         (testsql.internal.mydomain.com)

-- Information acquired via protocol DNS in 2.3ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network
~$ resolvectl status --no-pager
Global
         Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
  resolv.conf mode: stub

Link 2 (enp34s0)
    Current Scopes: DNS LLMNR/IPv4
         Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 172.21.11.11
       DNS Servers: 172.21.11.11 172.21.1.1
        DNS Domain: internal.mydomain.com
     Default Route: yes

dig also correctly resolved an IP address for testsql.internal.mydomain.com, which it did not when I was testing this before. (Simply using dig testsql still does not, but this is fine).

If you specify two DNS servers, it is undefined which server will be queried for a given name lookup. You should not consider one of them a backup of the other, but rather a kind of load sharing. That also implies that the local server you create should forward requests for non-local names to some public server.
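Since either server may end up answering, it can help to ask each one directly and compare. A rough sketch of that check follows; the function name compare_servers is made up for illustration, and it assumes dig (from bind-utils) is installed:

```shell
# compare_servers NAME SERVER...
# Ask every listed server for NAME directly (bypassing 127.0.0.53) and
# print one "server<TAB>answer" line per server.
# compare_servers is a hypothetical helper name, not an existing tool.
compare_servers() {
    name="$1"
    shift
    for server in "$@"; do
        # +short prints only the answer data; +time/+tries keep a dead
        # server from stalling the loop for long.
        answer=$(dig +short +time=2 +tries=1 "$name" "@$server" | paste -sd' ' -)
        printf '%s\t%s\n' "$server" "${answer:-<no answer>}"
    done
}

# Example with the addresses from this thread:
#   compare_servers testsql.internal.mydomain.com 172.21.11.11 172.21.1.1
```

If the two lines differ, the servers are not interchangeable, and lookups will appear to work or fail depending on which one happens to be asked.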

For a single local network, there are also LLMNR and mDNS, which can resolve names without accessing any DNS server. LLMNR should work when looking up names without any domain suffix, and mDNS works with the “.local” suffix.


Both DNS servers are forwarding external DNS requests to public servers so that is not a problem.

My other non-Linux machines generally treat 172.21.11.11 as the primary DNS and 172.21.1.1 as a secondary backup, verified by looking at query logs between the two. This is actually how I want it, and was under the impression that this is how it works, but apparently this is not the case, so thank you for that explanation.

It’s not the end of the world if I have to type the fully qualified name for servers. I’m mostly just confused as to why the behavior seems different. I was under the impression that the dns domain setting (which is being handed out via DHCP) should allow me to, for example, run ping testsql and for it to know that what I am actually asking for is to ping testsql.internal.mydomain.com. Maybe I am wrong about that (wouldn’t be the first time), but this is how it is working in MacOS for me. But the big difference is that the output of dig testsql on MacOS resolves the IP, while on Fedora it times out. I can also manually edit resolv.conf to point directly at the DNS server rather than the local cache and the dig command on Fedora also resolves without issue.

My assumption, therefore, was that something was broken with systemd-resolved. If nothing else, I was very confused at first as to why dig was querying 127.0.0.53 instead of one of my actual DNS servers. An entire evening of prodding at this didn’t really give me enough answers.

But based on these responses it seems like systemd-resolved is more or less behaving as expected, I was just misunderstanding how it works, would that be fair to say?

If you ping testsql, it might first try LLMNR, and only if nothing responds, which can take quite a while, continue on to apply the domain names. You could try disabling LLMNR by creating a new file named /etc/systemd/resolved.conf.d/disable-llmnr.conf with the following contents:

[Resolve]
LLMNR=no
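
For anyone who prefers to do this from a terminal, something like the following should work. It is only a sketch: the function name and its optional directory argument are made up for illustration, while the file name and contents are the ones given above.

```shell
# write_llmnr_dropin [DIR]
# Create the drop-in described above. DIR defaults to the standard
# systemd-resolved drop-in directory; passing another directory is only
# meant for trying the function out without root.
# (write_llmnr_dropin is a hypothetical helper name.)
write_llmnr_dropin() {
    dir="${1:-/etc/systemd/resolved.conf.d}"
    mkdir -p "$dir" || return 1
    printf '[Resolve]\nLLMNR=no\n' > "$dir/disable-llmnr.conf"
}

# Typical use, as root:
#   write_llmnr_dropin && systemctl restart systemd-resolved
#   resolvectl status --no-pager   # then check the LLMNR setting on the Protocols lines
```

The restart is needed because systemd-resolved only reads its configuration at startup.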

This is systemd’s misfeature of offering LLMNR responses on the standard DNS port 53, which dig is using. Note that dig does not append the search domain from /etc/resolv.conf by default; you would have to use dig +search testsql. But as Villy already identified, systemd-resolved does not even send the name testsql to the DNS servers, because for names without a dot it tries only multicast resolution instead. LLMNR=no or ResolveUnicastSingleLabel=yes in resolved.conf should allow it. If you use Apple devices, you probably want mDNS rather than LLMNR (which is usually a Windows thing) anyway.

Use host testsql instead of dig to get the domain appended by default. dig is in general a better tool, it just does not append your domain by default, so give it the full name instead. Appending the domain is done only by the glibc resolver code, or emulated in (some) DNS tools, but not in dig.
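
To make the "appending" step concrete, here is a deliberately simplified sketch of what the resolver library does with a short name. The function name qualify_name is invented for this example, and the logic is an assumption-laden simplification: it only uses the first domain on the search line and ignores the ndots option, whereas the real resolver tries every search domain in turn.

```shell
# qualify_name NAME [RESOLV_CONF]
# Print NAME unchanged if it already contains a dot; otherwise append the
# first domain found on the "search" line of RESOLV_CONF.
# Simplified on purpose: no ndots handling, first search domain only.
# (qualify_name is a hypothetical helper name.)
qualify_name() {
    name="$1"
    conf="${2:-/etc/resolv.conf}"
    case "$name" in
        *.*) printf '%s\n' "$name"; return 0 ;;
    esac
    domain=$(awk '$1 == "search" { print $2; exit }' "$conf" 2>/dev/null)
    if [ -n "$domain" ]; then
        printf '%s.%s\n' "$name" "$domain"
    else
        printf '%s\n' "$name"
    fi
}

# Example: hand dig the expanded name instead of the bare label.
#   dig "$(qualify_name testsql)"
```

This is roughly the expansion that host and dig +search perform for you, and that plain dig skips.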

Querying the local cache at address 127.0.0.53 is okay and desired; it uses a cache on your system to make repeated queries much faster. The problem is with the LLMNR implementation, and that the systemd people consider it to be working well although it has the confusing behaviour you have seen.

It would be better if you could forward your internal domain from pfSense to Pi-Hole, so that your internal data exists just once but is offered identically by all servers. Unbound can certainly do that, and dnsmasq too. I am not sure what the UI offers. I found dnsmasq advanced options, which in dnsmasq config syntax would look like server=/internal.mydomain.com/172.21.11.11. That sends internal.mydomain.com and all its child names to server 172.21.11.11, so it can work as the primary source for your internal domain. If the primary server fails, the internal zone fails completely. I think they call this a domain override, which can be different from your normal forwarder or resolver.

The problem is more that Fedora turns it on by default. It should be used only if you are on a network which uses it, and having it enabled on a public network can be a security problem. According to https://techcommunity.microsoft.com/blog/networkingblog/aligning-on-mdns-ramping-down-netbios-name-resolution-and-llmnr/3290816, Microsoft is deprecating LLMNR on Windows and plans to use mDNS instead. The advantage of mDNS is that you append the .local domain to your lookup to indicate that this name should be resolved by mDNS and that no query should be sent to an external DNS server.


Yes and no. The main problem is not that LLMNR is enabled by default, but how. Just sharing my opinion.

The problem is that it intervenes even with DNS-only tools like those in the bind-utils package, which read /etc/resolv.conf and use the server(s) specified there, thinking they are speaking the DNS protocol only. On Windows, programs do not use port 53 (domain) to communicate over localhost AFAIK; they use only the API. We have the getaddrinfo() API on Linux as well, which could serve LLMNR if desired. On the Fedora 40 live ISO, dig NS org will tell you that org does not exist, but dig NS fedoraproject.org gives you the address list without error. On Fedora 41, this was fixed for selected record types used only in DNS, like DNSKEY, NS or DS. But even now, in the default configuration, dig org will never try the name org. on DNS, because systemd-resolved will stop after appending the domain and trying a multicast LLMNR search for a computer named “org”. The problem has arisen because the (old) interface for using only DNS was expanded to use additional protocols, which may conflict with DNS responses, as in this case.

Yes, mDNS is better because it makes things obvious by having the well-defined .local domain, although .local is still sometimes used as an internal site-only zone. Avahi’s nss-mdns plugin implements an API for mDNS resolution which does not collide with DNS. LLMNR could have been implemented in a similar way, but the systemd people made other choices. But yes, the protocol-independent getaddrinfo() API is old, synchronous and not very flexible. systemd-resolved implements its own nss-resolve plugin, which could provide LLMNR or mDNS support without breaking DNS. I think the API used should determine which protocols are used for name resolution, not only the queried name.

Another problem is that both LLMNR and mDNS should ideally be enabled only on trusted networks and disabled on public networks with untrusted actors. But then the question is how to identify the network we are on in a good enough, and especially user-friendly, way.


I can confirm that using ResolveUnicastSingleLabel=yes now provides my expected behavior:

~$ dig testsql

; <<>> DiG 9.18.35 <<>> testsql
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4882
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;testsql.			IN	A

;; ANSWER SECTION:
testsql.		0	IN	A	172.21.100.45

;; Query time: 2 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Tue Apr 22 17:03:30 PDT 2025
;; MSG SIZE  rcvd: 52

I also included the LLMNR=no option, but I’m not 100% sure that did anything (it did not resolve the issue on its own, but I’m also not using anything that would/should be using LLMNR anyway).

EDIT: After a little more testing I confirmed that this only works with both LLMNR=no and ResolveUnicastSingleLabel=yes options set.
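
In case it helps anyone landing here later, this is a small sketch for double-checking that both settings really are present in the resolved configuration. The function name is invented, the default paths are the usual systemd locations (an assumption, adjust if yours differ), and it only looks for the literal lines, so it will not notice a later file overriding an earlier one.

```shell
# check_resolved_opts [FILE...]
# Report whether each of the two settings from this thread appears as a
# whole line in any of the given files. Returns 0 only if both are found.
# (check_resolved_opts is a hypothetical helper name.)
check_resolved_opts() {
    # Default to the main config file plus any drop-ins.
    [ "$#" -gt 0 ] || set -- /etc/systemd/resolved.conf /etc/systemd/resolved.conf.d/*.conf
    rc=0
    for opt in 'LLMNR=no' 'ResolveUnicastSingleLabel=yes'; do
        # -x: whole-line match, so commented-out lines such as "#LLMNR=yes"
        # are ignored; -s: stay quiet about files that do not exist.
        if grep -qsx "[[:space:]]*$opt[[:space:]]*" "$@"; then
            echo "ok:      $opt"
        else
            echo "missing: $opt"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   check_resolved_opts && echo "both options are set"
```

Remember that systemd-resolved has to be restarted after editing the files before dig testsql will reflect the change.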

Thank you all very much for your support.