DNS resolution broken

Hi,

For some time now I have been having intermittent DNS issues with some sites. It always seems to affect the same domains and never others.

One that is regularly affected is mirrors.fedoraproject.org. It is somehow related to systemd-resolved, and I am not sure how to debug that.

Querying through systemd-resolved:

# dig @127.0.0.1 mirrors.fedoraproject.org

; <<>> DiG 9.16.11-RedHat-9.16.11-5.fc34 <<>> @127.0.0.1 mirrors.fedoraproject.org
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

Querying the upstream DNS directly:

# dig @192.168.1.2 mirrors.fedoraproject.org

; <<>> DiG 9.16.11-RedHat-9.16.11-5.fc34 <<>> mirrors.fedoraproject.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37827
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 14, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;mirrors.fedoraproject.org.	IN	A

;; ANSWER SECTION:
mirrors.fedoraproject.org. 297	IN	CNAME	wildcard.fedoraproject.org.
wildcard.fedoraproject.org. 57	IN	A	185.141.165.254
wildcard.fedoraproject.org. 57	IN	A	18.133.140.134
wildcard.fedoraproject.org. 57	IN	A	209.132.190.2
wildcard.fedoraproject.org. 57	IN	A	18.159.254.57
wildcard.fedoraproject.org. 57	IN	A	38.145.60.20
wildcard.fedoraproject.org. 57	IN	A	38.145.60.21
wildcard.fedoraproject.org. 57	IN	A	67.219.144.68
wildcard.fedoraproject.org. 57	IN	A	18.185.136.17
wildcard.fedoraproject.org. 57	IN	A	140.211.169.206
wildcard.fedoraproject.org. 57	IN	A	152.19.134.142
wildcard.fedoraproject.org. 57	IN	A	85.236.55.6
wildcard.fedoraproject.org. 57	IN	A	152.19.134.198
wildcard.fedoraproject.org. 57	IN	A	8.43.85.67

;; Query time: 25 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mi Jun 23 12:11:12 CEST 2021
;; MSG SIZE  rcvd: 285

config:

# resolvectl 
Global
       Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (wlp2s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.1.2
       DNS Servers: 192.168.1.2 2a02:908:1570:5b60:d63f:cbff:fe8d:4c20

Link 11 (enp62s0u1)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Workarounds:

  1. Enabling DNS over HTTPS in the Firefox settings, to be able to use a browser normally (otherwise I could not even access this page, as id.fedoraproject.org would not resolve for me most of the time either).
  2. Editing /etc/resolv.conf and changing the nameserver makes most software work, though interestingly not everything. E.g. curl seems to somehow still use systemd-resolved, as it keeps insisting that names are not resolvable. Ideas?

Any ideas how to debug this / dig deeper? I would like to understand what exactly the issue is, not just have workarounds :slight_smile:

Thanks in advance!


Try to use a public DNS provider:

sudo nmcli connection show
sudo nmcli connection modify id CON_NAME \
    ipv4.ignore-auto-dns yes \
    ipv6.ignore-auto-dns yes \
    ipv4.dns 8.8.8.8,8.8.4.4
sudo nmcli connection up id CON_NAME

Enable DoT if the issue persists:

sudo mkdir -p /etc/systemd/resolved.conf.d
sudo tee /etc/systemd/resolved.conf.d/00-custom.conf << EOF
[Resolve]
DNSOverTLS=yes
EOF
sudo systemctl restart systemd-resolved.service

I believe this is normal behaviour.
dig queries a DNS server.
Your DNS server is 192.168.1.2, while 127.0.0.1 is the loopback interface of your client.
If you point dig at the client itself, it can’t get information that only a DNS server could provide.

In some distributions every client also runs a local server, most of the time dnsmasq, which usually listens on 127.0.0.1.

But that has changed with Fedora. It now uses systemd-resolved, as you said yourself.

And this one listens on 127.0.0.53.

However, there also might be a local dns cache, like nscd, that caches dns queries for a period of time.

As said, the local client now uses systemd-resolved.
It has another config file in /etc/systemd/resolved.conf ...
So this local “DNS provider” listens on 127.0.0.53, not 127.0.0.1, and that is also the nameserver listed in /etc/resolv.conf.

Long story short, if you want to query your “client’s local dns server”, query this:
dig @127.0.0.53 mirrors.fedoraproject.org

The reason behind this: the loopback interface is not a single IP but a whole network, 127.0.0.0/8.
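
To compare the local stub resolver with the upstream server in one go, something like the following could help. It is only a sketch: it assumes dig is installed, the function names dig_status and compare_servers are made up for illustration, and the timeout values are arbitrary.

```shell
# Print the DNS response status found in dig output read from stdin.
# Prints TIMEOUT when dig reported that no server could be reached,
# and UNKNOWN when neither pattern is present.
dig_status() {
    awk '
        /connection timed out/ { print "TIMEOUT"; found = 1; exit }
        /status: / {
            sub(/.*status: /, "")   # drop everything up to the status value
            sub(/,.*/, "")          # drop the trailing ", id: ..." part
            print; found = 1; exit
        }
        END { if (!found) print "UNKNOWN" }
    '
}

# Query one name against several servers and print one status line per server.
# Usage: compare_servers NAME SERVER...
compare_servers() {
    local name=$1
    shift
    local server
    for server in "$@"; do
        printf '%-40s %s\n' "$server" \
            "$(dig +tries=1 +time=3 "@$server" "$name" 2>&1 | dig_status)"
    done
}
```

For example, `compare_servers mirrors.fedoraproject.org 127.0.0.53 192.168.1.2` prints one status per server. A SERVFAIL or timeout from 127.0.0.53 next to a NOERROR from 192.168.1.2 would point at the local resolver rather than the upstream server.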


Hi,

thanks for the pointer!

So I just had the issue again. And it is reproducible with dig:

$ dig mirrors.fedoraproject.org

; <<>> DiG 9.16.16-RH <<>> mirrors.fedoraproject.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 42685
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;mirrors.fedoraproject.org.	IN	A

;; Query time: 180 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Do Jun 24 19:01:25 CEST 2021
;; MSG SIZE  rcvd: 54

The answer states:

;; SERVER: 127.0.0.53#53(127.0.0.53)

But asking systemd-resolved directly via @127.0.0.53 works:

$ dig @127.0.0.53 mirrors.fedoraproject.org

; <<>> DiG 9.16.16-RH <<>> @127.0.0.53 mirrors.fedoraproject.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8918
;; flags: qr rd ra; QUERY: 1, ANSWER: 14, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;mirrors.fedoraproject.org.	IN	A

;; ANSWER SECTION:
mirrors.fedoraproject.org. 300	IN	CNAME	wildcard.fedoraproject.org.
wildcard.fedoraproject.org. 21	IN	A	18.133.140.134
wildcard.fedoraproject.org. 21	IN	A	67.219.144.68
wildcard.fedoraproject.org. 21	IN	A	38.145.60.20
wildcard.fedoraproject.org. 21	IN	A	8.43.85.67
wildcard.fedoraproject.org. 21	IN	A	152.19.134.142
wildcard.fedoraproject.org. 21	IN	A	209.132.190.2
wildcard.fedoraproject.org. 21	IN	A	18.185.136.17
wildcard.fedoraproject.org. 21	IN	A	140.211.169.206
wildcard.fedoraproject.org. 21	IN	A	18.159.254.57
wildcard.fedoraproject.org. 21	IN	A	185.141.165.254
wildcard.fedoraproject.org. 21	IN	A	85.236.55.6
wildcard.fedoraproject.org. 21	IN	A	152.19.134.198
wildcard.fedoraproject.org. 21	IN	A	38.145.60.21

;; Query time: 71 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Do Jun 24 19:01:33 CEST 2021
;; MSG SIZE  rcvd: 285

The manual of dig states:

If no server argument is provided, dig consults /etc/resolv.conf

As resolv.conf lists 127.0.0.53, how come the results differ?

$ cat /etc/resolv.conf 
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
search local

Running dig mirrors.fedoraproject.org now works as well. I do not know whether this is due to the randomness of the issue or a result of running the command several times, so that it always works after several attempts; from my previous experience, though, that was not the case.


Post the updated diagnostics:

resolvectl --no-pager status; resolvectl query openwrt.org

Okay, my previous post was not quite right: the @127.0.0.53 query seems to have worked only due to the randomness of the issue.

The issue just occurred again. Neither dig mirrors.fedoraproject.org nor dig @127.0.0.53 mirrors.fedoraproject.org worked. I tried both multiple times.

Some more diagnostics:

$ resolvectl query openwrt.org
openwrt.org: 139.59.209.225                    -- link: wlp2s0
             2a03:b0c0:3:d0::1af1:1            -- link: wlp2s0

-- Information acquired via protocol DNS in 113.9ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network
$ resolvectl query mirrors.fedoraproject.org
mirrors.fedoraproject.org: resolve call failed: Received invalid reply
$ resolvectl --no-pager status
Global
       Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (wlp2s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.1.2
       DNS Servers: 192.168.1.2 2a02:908:1570:5b60:d63f:cbff:fe8d:4c20
        DNS Domain: local

Link 3 (virbr0)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

You have an IPv6 address in your DNS config.
Is this intentional?

This might be the reason, in case that address does not answer DNS queries.
The resolver may query the two addresses in round-robin fashion.

So it happened again. And I looked into the IPv6 resolver thing:

$ resolvectl query mirrors.fedoraproject.org
mirrors.fedoraproject.org: resolve call failed: Received invalid reply
$ dig @2a03:b0c0:3:d0::1af1:1 mirrors.fedoraproject.org

; <<>> DiG 9.16.18-RH <<>> @2a03:b0c0:3:d0::1af1:1 mirrors.fedoraproject.org
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

dig @192.168.1.2 worked.

The IPv6 address comes from my ISP’s router and I cannot disable it; it is some IPv6 autoconfiguration thing. So I overrode the DNS setting locally on my laptop:

$ resolvectl --no-pager status
Global
       Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (wlp2s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.1.2
       DNS Servers: 192.168.1.2

But I am having the issue again:

$ resolvectl query mirrors.fedoraproject.org
mirrors.fedoraproject.org: resolve call failed: Received invalid reply
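
One way to dig into the “Received invalid reply” error might be to raise the log level of systemd-resolved while reproducing the failure. A rough sketch, assuming a systemd version whose resolvectl supports the log-level verb and that the level in use before was the default, info. The function names are made up for illustration.

```shell
# Switch systemd-resolved to debug logging (needs root).
resolved_debug_on() {
    sudo resolvectl log-level debug
}

# Switch back to the assumed default level once done.
resolved_debug_off() {
    sudo resolvectl log-level info
}

# Follow new journal entries of the resolver until interrupted with Ctrl-C.
resolved_follow() {
    journalctl --unit=systemd-resolved.service --follow --since=now
}
```

Run resolved_debug_on, start resolved_follow in one terminal, repeat the failing resolvectl query in another, and call resolved_debug_off afterwards. The debug output should show which upstream server was asked and how its reply was handled.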

Does the issue persist if you replace the local resolver with a public one?

“systemd-resolved and VPNs” is a great resource for configuring your resolver based on your VPN use case.

The issue is (or was) not related to VPN.

By the way, the issue hasn’t happened for several weeks now, so I guess it solved itself :thinking:

Thank you, @vgaetera!

Saving to 00-custom.conf, etc, as you said fixed it for me, but, after restarting, it took some fiddling to make it work again. I’m on my way to Fedora 35. We’ll see if it’s stably fixed then.

For me, I had to wait quite a while (5 to 10 min?) before I could resolve a domain name. Seems to be a per-boot wait? Like the cache has to re-populate?

Follow-up: I’m now on Fedora 35, KDE spin.
I still have to restart systemd’s resolver and then wait a few minutes before domains are resolved. I’m thinking about maybe forcing DefaultRoute in a config file.

Other things I did:

  • GUI Network settings – cleared out DNS setting to not compete with /etc/systemd/resolved.conf.d/00-custom.conf
  • /etc/NetworkManager/NetworkManager.conf:
     [main]
     # Done to ensure only systemd-resolved sets DNS:
     dns=none
  • I embellished upon @vgaetera’s config:
   [Resolve]
   # Added this line (kept on its own line, since systemd config files do not support trailing comments):
   DNS=1.1.1.1 1.0.0.1
   DNSOverTLS=yes

Also, I’ve cast a skeptical eye upon the files in /run/NetworkManager/devices.

In summary, there are waaaaay too many things that could possibly affect DNS configuration: NetworkManager config, systemd resolver config, the GUI’s settings…and in quite a few directories: /run, /usr/lib, /etc…man!

My current setup isolates systemd-resolved from NetworkManager:
https://discussion.fedoraproject.org/t/epiphany-gnome-web-webkit-intermittently-breaks/67882/4?u=vgaetera
This way NetworkManager no longer affects DNS settings.


Thank you for getting back, @vgaetera.

After over two days of troubleshooting, I just could not get systemd-resolved to work, but I now have a working set-up that appears stable from boot to boot.

Oddities:

  1. In theory, the only time one needs to reboot Linux is for a kernel update; otherwise, restarting services should be sufficient. However, for this adventure in troubleshooting, I had to reboot to see changes. Perhaps something left in /run is to blame for things “remembering” old configuration within a given boot, and that gets cleared during boot-up.
  2. Things got worse before they got better. Before getting a working setup, I lost the ability to ping even IPs, whereas before, I could ping IPs but not domains. That’s in the rear-view mirror now and was a problem experienced before realizing the need to reboot.
  3. Programs exist with logic that actually checks whether /etc/resolv.conf is a symlink and, if so, to what file, and changes behavior based on that observation. So the contents could be identical, but because it’s a regular file vs a symlink hither vs a symlink thither, the behavior may be different.
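
To see at a glance which of those cases applies on a given machine, a small check along these lines might help. It is a sketch only; the function name describe_resolv_conf is made up for illustration.

```shell
# Report whether a resolv.conf path is a symlink (and to what),
# a regular file, or missing. Defaults to /etc/resolv.conf.
describe_resolv_conf() {
    local path=${1:-/etc/resolv.conf}
    if [ -L "$path" ]; then
        # Print the stored link target without resolving it further.
        printf 'symlink -> %s\n' "$(readlink "$path")"
    elif [ -f "$path" ]; then
        printf 'regular file\n'
    else
        printf 'missing\n'
    fi
}
```

Calling describe_resolv_conf without arguments inspects /etc/resolv.conf. Running it before and after a change shows whether some tool has replaced the symlink with a regular file or the other way round.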

Also, if at any point you cannot find a config file responsible for a choice of DNS or other network setting, check your GUI settings. KDE user here. Still, files that potentially affect this are spread far and wide: /usr/lib, /etc and /run. Within each, I’d look for NetworkManager and systemd/network.

Working setup:

  • Disabled systemd-resolved with systemctl disable --now systemd-resolved.service
  • Not much in /etc/NetworkManager/NetworkManager.conf: (excluding comments)
    [main]
    [logging]
    
    • Note I got rid of my dns=none line. I didn’t even replace dns=none with dns=default because the man page did not make me confident in any of the choices I had for dns=.
  • New file that I made myself: before, the /etc/systemd/network dir was empty, so I braved a new direction when I created this file:
    /etc/systemd/network/enp0s20u2.network:
    [Match]
    Name=enp0s20u2
    
    [Network]
    DHCP=yes
    DefaultRouteOnDevice=true
    DNSDefaultRoute=true
    DNS=1.0.0.1 1.1.1.1
    
  • Deleted /etc/resolv.conf. Between NetworkManager and my VPN software, that file is created and managed automatically. Note I had to delete the existing symlink there for NetworkManager to do its part.
  • Asked my VPN software to manage DNSes.
  • There’s nothing specified through the GUI. (DNS servers, “IPv4 is required for this connection”, etc.)

How it works

  1. System boots up and NetworkManager creates a shiny new /etc/resolv.conf using my DNS choices specified in /etc/systemd/network/enp0s20u2.network.
  2. I connect to VPN and my VPN software overwrites /etc/resolv.conf with its choice of DNS. BTW, I can only ping that DNS when connected to the VPN. But I connect to the VPN using the VPN server’s domain name, not IP, so I need those initial “bootstrap” DNSes.

How I got there

Erring on the side of caution, just to get a working Internet connection, I used the GUI to manually specify an IP, subnet mask and gateway, as well as DNSes. I also checked “IPv4 is required for this connection,” suspecting that IPv6 support might be buggy.

And my aforementioned .network file was as follows:

[Match]
Name=enp0s20u2

[Network]
DHCP=ipv4
DefaultRouteOnDevice=true
DNSDefaultRoute=true
DNS=1.0.0.1
DNS=1.1.1.1

Then I rebooted and was able to connect to VPN. I then ran an update and noticed these potentially relevant packages in the list:

  • systemd-resolved
  • kernel (and friends)
  • systemd
  • systemd-networkd

I don’t know if this made a difference, but, at some point, I waited until I was logged on before inserting my USB Ethernet adapter.

After rebooting with the new kernel and updated packages, I used the GUI to go back to “Automatic” (dynamic) IP, cleared the other GUI settings out and continued experimenting, trying to get systemd-resolved working (with reboots). I could not. I then set things up as described above and rebooted. Things are now working stably.

Firefox has its own DNS cache, so I guess I’m not hurting too much for not having systemd-resolved.