Recurring DNS timeouts on bridge br0 after upgrade

After upgrading to F39, I noticed that dnf update was taking several minutes to query and download packages, and then found that domain name resolution in general periodically takes minutes or fails altogether. DNS works briefly but then fails again, and the failures coincide with some unusual entries in the systemd-resolved journal.

For more context: this is a server that ran F33 continuously for several years without a reboot, mainly as a host for multiple guest VMs (KVM, libvirt). With F33 long past EOL, I finally decided to upgrade the host using dnf-plugin-system-upgrade, which I did in stages: first from F33 to F35, then to F37, and then to F39 (and yes, I know this is pretty much asking for trouble!)

The host has a bridge interface br0 for the guests to connect out via the host’s eno1 interface.

It seems like there are multiple conflicting config files somewhere causing problems with DNS. In the resolved log below, every few minutes a new “br0: Bus client set DNS server list to:” entry appears, sometimes with just the IPv4 address and sometimes with two additional IPv6 addresses. Strangely (and maybe purely coincidentally), DNS only works when the IPv6 addresses are present; otherwise it hangs for about 60 seconds and then fails, for example:

$ resolvectl query example.org --no-pager
example.org: resolve call failed: Connection timed out
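
When it flips into the failing state, I can catch the per-link server list and the resolved journal at the same moment; a quick sketch with the same tools already used above:

$ resolvectl dns br0                          # current DNS server list for the bridge link
$ journalctl -fu systemd-resolved.service     # follow resolved's log to see the next change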

Here are the bizarre logs:

root@quine:~# journalctl --no-pager -b -u systemd-resolved.service
Jan 11 21:59:47 quine systemd[1]: Starting systemd-resolved.service - Network Name Resolution...
Jan 11 21:59:47 quine systemd-resolved[2544]: Positive Trust Anchors:
Jan 11 21:59:47 quine systemd-resolved[2544]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Jan 11 21:59:47 quine systemd-resolved[2544]: Negative trust anchors: home.arpa 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.arpa 21.172.in-addr.arpa 22.172.in-addr.arpa 23.172.in-addr.arpa 24.172.in-addr.arpa 25.172.in-addr.arpa 26.172.in-addr.arpa 27.172.in-addr.arpa 28.172.in-addr.arpa 29.172.in-addr.arpa 30.172.in-addr.arpa 31.172.in-addr.arpa 168.192.in-addr.arpa d.f.ip6.arpa corp home internal intranet lan local private test
Jan 11 21:59:47 quine systemd-resolved[2544]: Using system hostname 'quine'.
Jan 11 21:59:47 quine systemd[1]: Started systemd-resolved.service - Network Name Resolution.
Jan 11 21:59:52 quine systemd-resolved[2544]: br0: Bus client set default route setting: yes
Jan 11 21:59:52 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 2001:5a8::11, 2001:5a8::33
Jan 11 21:59:52 quine systemd-resolved[2544]: br0: Bus client set search domain list to: Home
Jan 11 21:59:52 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1
Jan 11 22:02:14 quine systemd-resolved[2544]: Clock change detected. Flushing caches.
Jan 11 22:06:14 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1, 2001:5a8::11, 2001:5a8::33
Jan 11 22:10:23 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1
Jan 11 22:16:14 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1, 2001:5a8::11, 2001:5a8::33
Jan 11 22:16:32 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1
Jan 11 22:26:16 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1, 2001:5a8::11, 2001:5a8::33
Jan 11 22:27:23 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1
Jan 11 22:36:16 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1, 2001:5a8::11, 2001:5a8::33
Jan 11 22:41:54 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1
Jan 11 22:46:17 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1, 2001:5a8::11, 2001:5a8::33
Jan 11 22:47:18 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1
Jan 11 22:56:18 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1, 2001:5a8::11, 2001:5a8::33
Jan 11 23:03:13 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1
Jan 11 23:06:19 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1, 2001:5a8::11, 2001:5a8::33
Jan 11 23:11:42 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1
Jan 11 23:16:20 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1, 2001:5a8::11, 2001:5a8::33
Jan 11 23:18:55 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1
Jan 11 23:26:20 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1, 2001:5a8::11, 2001:5a8::33
Jan 11 23:29:04 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1
Jan 11 23:36:21 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1, 2001:5a8::11, 2001:5a8::33
Jan 11 23:38:56 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1
Jan 11 23:46:22 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1, 2001:5a8::11, 2001:5a8::33
Jan 11 23:50:19 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1
Jan 11 23:56:23 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1, 2001:5a8::11, 2001:5a8::33
Jan 11 23:58:14 quine systemd-resolved[2544]: br0: Bus client set DNS server list to: 192.168.42.1

Additional diagnostics:

$ resolvectl --no-pager status
Global
         Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
  resolv.conf mode: stub

Link 2 (eno1)
    Current Scopes: none
         Protocols: -DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 3 (eno2)
    Current Scopes: none
         Protocols: -DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 4 (br0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.42.1
       DNS Servers: 192.168.42.1 2001:5a8::11 2001:5a8::33
        DNS Domain: Home

Link 5 (virbr0)
    Current Scopes: none
         Protocols: -DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported

Any ideas where to look for the conflicting config? The host and all of the guest VMs are basically unusable because anything requiring a domain name lookup will only work half the time. Thanks y’all!

If the bridge using eno1 is properly configured, and the VM guests are configured with IP addresses that match the LAN attached to eno1, this may work. However, each VM would need to present its own details to the LAN, because from the LAN side it would be one MAC with several IPs, and the bridge config would need to be correct for each VM. I suspect that each VM would need its own bridge device configured on the host and attached to that interface so communications are not misdirected. If all guests share a single bridge device and IP, then packets can easily be directed to the wrong host.

Each bridge device should then have its own apparent IP, similar to the way an interface could be configured with multiple alias IPs in the past, so communications could be shared.

In other words, dedicate br0 to VM0, br1 to VM1, and so on. All can use the same physical interface, but through discrete virtual bridge devices.

Interesting, I haven’t actually tested for possible contention between multiple VMs using the bridge simultaneously. I will look into that, but first I need to figure out why DNS is failing even on the host. This had been working in the original F33 install for years, but something got mucked up in the staged upgrades to F39.
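
For what it’s worth, when I test that I plan to list the guest ports attached to the bridge and each guest’s NIC with something like the following (the guest name is just a placeholder):

# list interfaces enslaved to each bridge (e.g. vnet0 on br0)
bridge link show
# show which bridge/network each NIC of a given guest uses
virsh domiflist <guest-name>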

Thank you for the additional info.
Please, on the host, show us the output of both ip address and ip route.
Let’s get DNS working there before mucking with the VMs.

root@quine:~# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br0 state UP group default qlen 1000
    link/ether 3c:ec:ef:6b:ce:48 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:ec:ef:6b:ce:49 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 3c:ec:ef:6b:ce:48 brd ff:ff:ff:ff:ff:ff
    inet 192.168.42.201/24 brd 192.168.42.255 scope global dynamic noprefixroute br0
       valid_lft 61987sec preferred_lft 61987sec
    inet6 2001:5a8:6f3:c500:fd14:98aa:bd84:dea8/64 scope global deprecated dynamic noprefixroute 
       valid_lft 596sec preferred_lft 0sec
    inet6 2001:5a8:6f4:2d00:26de:e6bd:bc50:45c5/64 scope global deprecated dynamic noprefixroute 
       valid_lft 1197sec preferred_lft 0sec
    inet6 2001:5a8:6f4:9700:fa39:2a89:20e9:bfa5/64 scope global deprecated dynamic noprefixroute 
       valid_lft 1797sec preferred_lft 0sec
    inet6 2001:5a8:6f4:ff00:3749:f958:b159:b58a/64 scope global deprecated dynamic noprefixroute 
       valid_lft 2399sec preferred_lft 0sec
    inet6 2001:5a8:6f5:6d00:a1ba:37cd:fedc:fe28/64 scope global deprecated dynamic noprefixroute 
       valid_lft 2999sec preferred_lft 0sec
    inet6 2001:5a8:6f5:d200:5e40:d787:5b45:5f4c/64 scope global deprecated dynamic noprefixroute 
       valid_lft 3600sec preferred_lft 0sec
    inet6 2001:5a8:6f6:4d00:9fa1:2745:499:fd72/64 scope global deprecated dynamic noprefixroute 
       valid_lft 4201sec preferred_lft 0sec
    inet6 2001:5a8:6f6:bf00:a243:fa7f:e2a5:1c53/64 scope global deprecated dynamic noprefixroute 
       valid_lft 4801sec preferred_lft 0sec
    inet6 2001:5a8:6f7:2b00:5af8:734e:12d3:4a8/64 scope global deprecated dynamic noprefixroute 
       valid_lft 5402sec preferred_lft 0sec
    inet6 2001:5a8:6f7:8f00:63:897f:2cc5:16e2/64 scope global deprecated dynamic noprefixroute 
       valid_lft 6004sec preferred_lft 0sec
    inet6 2001:5a8:6f7:fe00:1d66:bbba:1570:bc8c/64 scope global deprecated dynamic noprefixroute 
       valid_lft 6605sec preferred_lft 0sec
    inet6 2001:5a8:6f8:7100:4afb:b91a:380d:a7fb/64 scope global dynamic noprefixroute 
       valid_lft 6833sec preferred_lft 531sec
    inet6 fe80::7814:2b22:1a5a:ec3f/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
5: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:ff:84:9c brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
root@quine:~# ip route
default via 192.168.42.1 dev br0 proto dhcp src 192.168.42.201 metric 425 
192.168.42.0/24 dev br0 proto kernel scope link src 192.168.42.201 metric 425 
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown 

Thanks for looking into this, any pointers are greatly appreciated!

Maybe NetworkManager is involved somehow, since I originally set up scripts in /etc/sysconfig/network-scripts. That directory now has a readme pointing me to run this command:

$ nmcli -f name,uuid,filename connection
NAME               UUID                                  FILENAME
br0                9ace389a-6984-430e-a38d-c5b88608e9f9  /etc/NetworkManager/system-connections/br0.nmconnection
lo                 727c8679-f73e-4dab-af00-2afec80598ca  /run/NetworkManager/system-connections/lo.nmconnection
virbr0             c68b5ea8-b8e3-406a-9eb1-24ed15126e4d  /run/NetworkManager/system-connections/virbr0.nmconnection
vnet0              47570fee-f237-45c5-9031-a95ee84f4187  /run/NetworkManager/system-connections/vnet0.nmconnection
bridge-slave-eno1  b13ad45f-6880-4aef-92d8-75e2ede92209  /etc/NetworkManager/system-connections/bridge-slave-eno1.nmconnection
eno1               136807e8-d481-301b-baa8-d10594220518  /etc/NetworkManager/system-connections/eno1.nmconnection
eno2               dd16bb55-41d6-467c-9a64-550bcc90894e  /etc/NetworkManager/system-connections/eno2.nmconnection

The original cfg files were rewritten in keyfile format; here’s what br0 and eno1 contain:

root@quine:~# cat /etc/NetworkManager/system-connections/br0.nmconnection
[connection]
id=br0
uuid=9ace389a-6984-430e-a38d-c5b88608e9f9
type=bridge
interface-name=br0
zone=public

[ethernet]

[bridge]
stp=false

[ipv4]
method=auto

[ipv6]
addr-gen-mode=stable-privacy
method=auto

[proxy]
root@quine:~# cat /etc/NetworkManager/system-connections/eno1.nmconnection
[connection]
id=eno1
uuid=136807e8-d481-301b-baa8-d10594220518
type=ethernet
autoconnect-priority=-999
interface-name=eno1

[ethernet]

[ipv4]
method=auto

[ipv6]
addr-gen-mode=stable-privacy
ip6-privacy=0
method=auto

[proxy]

What happens if you disable IPv6 on enp2s0 (eno1) and/or br0? There is no routing shown for IPv6.

I see many inet6 addresses (14) assigned, none with more than 6833 sec valid lifetime, and all but the newest already deprecated with a 0 sec preferred lifetime. Maybe the valid lifetime is counting down from 2 hours?

In any case it seems a new inet6 address is being assigned about every 10 minutes.

Possibly the host (and/or VMs) is capable of IPv6 but the router is not, so nothing sent via IPv6 can leave the host.
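
If you want to see exactly what the router is (or isn’t) advertising, you could capture router advertisements on br0. A rough sketch, assuming tcpdump is installed (the ip6[40] == 134 match is the usual way to select RA packets when there are no extension headers):

# print incoming IPv6 router advertisements on the bridge
tcpdump -i br0 -vv 'icmp6 and ip6[40] == 134'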

Ok, I am also suspicious of the multiple IPv6 addresses, and the upstream router is consumer grade. What would be the best way to disable IPv6 on that interface? (The only way I know of is the “nuke from orbit” method of turning it off on the kernel command line.)

I’ll try this tomorrow when my brain doesn’t feel like mashed potatoes :wink:

Thanks again for the help!

nmcli connection modify br0 ipv6.method disabled

From the GNOME Settings network panel you could also disable IPv6 on that interface.
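
Note that nmcli connection modify only changes the saved profile; it takes effect the next time the connection is activated, so after the command above, something like:

# re-activate br0 so the change is applied
nmcli connection up br0
# verify: should no longer show the global 2001: addresses
ip -6 addr show dev br0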

Thanks! I disabled IPv6 on the host bridge br0. It worked briefly, but DNS resolution started failing again after about 10 minutes. Now I’ve noticed a new warning in the logs:

Jan 12 16:09:50 quine systemd-resolved[1831]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 192.168.42.1.
Jan 12 16:25:08 quine systemd-resolved[1831]: Grace period over, resuming full feature set (UDP+EDNS0) for DNS server 192.168.42.1.

I’m currently reading through this systemd bug report to see if it may help.
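
In the meantime, one diagnostic I can try (just a sketch; 192.168.42.1 is the router from the logs above) is to query the router’s resolver directly with and without EDNS0, since that’s the feature resolved keeps falling back from:

# plain UDP query to the router
dig +noedns mirrors.fedoraproject.org @192.168.42.1
# the same query with EDNS0 enabled
dig +edns=0 +bufsize=1232 mirrors.fedoraproject.org @192.168.42.1

If the EDNS0 form times out or returns FORMERR while the plain one works, that points at the router’s DNS forwarder.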

Here’s another symptom. When I run nslookup with the defaults, the query returns the IPv4 addresses but hangs when it starts looking up IPv6 addresses. Compare the results below when I run nslookup against the default nameserver vs. 8.8.8.8:

czep@quine:~$ nslookup mirrors.fedoraproject.org 
Server:		127.0.0.53
Address:	127.0.0.53#53

Non-authoritative answer:
mirrors.fedoraproject.org	canonical name = wildcard.fedoraproject.org.
Name:	wildcard.fedoraproject.org
Address: 152.19.134.142
Name:	wildcard.fedoraproject.org
Address: 34.221.3.152
Name:	wildcard.fedoraproject.org
Address: 38.145.60.21
Name:	wildcard.fedoraproject.org
Address: 8.43.85.67
Name:	wildcard.fedoraproject.org
Address: 152.19.134.198
Name:	wildcard.fedoraproject.org
Address: 67.219.144.68
Name:	wildcard.fedoraproject.org
Address: 140.211.169.196
Name:	wildcard.fedoraproject.org
Address: 8.43.85.73
Name:	wildcard.fedoraproject.org
Address: 38.145.60.20
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
;; no servers could be reached


czep@quine:~$ nslookup mirrors.fedoraproject.org 8.8.8.8
Server:		8.8.8.8
Address:	8.8.8.8#53

Non-authoritative answer:
mirrors.fedoraproject.org	canonical name = wildcard.fedoraproject.org.
Name:	wildcard.fedoraproject.org
Address: 38.145.60.21
Name:	wildcard.fedoraproject.org
Address: 8.43.85.67
Name:	wildcard.fedoraproject.org
Address: 152.19.134.198
Name:	wildcard.fedoraproject.org
Address: 8.43.85.73
Name:	wildcard.fedoraproject.org
Address: 140.211.169.196
Name:	wildcard.fedoraproject.org
Address: 34.221.3.152
Name:	wildcard.fedoraproject.org
Address: 38.145.60.20
Name:	wildcard.fedoraproject.org
Address: 67.219.144.68
Name:	wildcard.fedoraproject.org
Address: 152.19.134.142
Name:	wildcard.fedoraproject.org
Address: 2600:1f14:fad:5c02:7c8a:72d0:1c58:c189
Name:	wildcard.fedoraproject.org
Address: 2620:52:3:1:dead:beef:cafe:fed6
Name:	wildcard.fedoraproject.org
Address: 2620:52:3:1:dead:beef:cafe:fed7
Name:	wildcard.fedoraproject.org
Address: 2600:2701:4000:5211:dead:beef:fe:fed3
Name:	wildcard.fedoraproject.org
Address: 2605:bc80:3010:600:dead:beef:cafe:fed9
Name:	wildcard.fedoraproject.org
Address: 2604:1580:fe00:0:dead:beef:cafe:fed1

I still cannot isolate what triggers DNS to fail; it will work for a while, then fail for a while. Here’s an example of what happens:

  • nslookup using the internal nameserver fails when it attempts to query IPv6 addresses.
  • nslookup using an external nameserver works.
  • ping to an IP address works.
  • ping to a domain name fails (it just hangs).
czep@quine:~$ nslookup example.net
Server:		127.0.0.53
Address:	127.0.0.53#53

Non-authoritative answer:
Name:	example.net
Address: 93.184.216.34
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
;; no servers could be reached


czep@quine:~$ nslookup example.net 8.8.8.8
Server:		8.8.8.8
Address:	8.8.8.8#53

Non-authoritative answer:
Name:	example.net
Address: 93.184.216.34
Name:	example.net
Address: 2606:2800:220:1:248:1893:25c8:1946

czep@quine:~$ ping -c 3 93.184.216.34
PING 93.184.216.34 (93.184.216.34) 56(84) bytes of data.
64 bytes from 93.184.216.34: icmp_seq=1 ttl=51 time=4.68 ms
64 bytes from 93.184.216.34: icmp_seq=2 ttl=51 time=4.41 ms
64 bytes from 93.184.216.34: icmp_seq=3 ttl=51 time=4.27 ms

--- 93.184.216.34 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 4.269/4.452/4.678/0.169 ms
czep@quine:~$ ping -c 3 example.net

Try using a public DNS provider such as Google DNS, since local resolvers often don’t provide the features necessary for systemd-resolved to work reliably.
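
One way to do that on the host would be to override the DNS servers on the br0 profile while keeping DHCP for addressing; a sketch, with 8.8.8.8/8.8.4.4 as the example servers:

# use public DNS servers and ignore the ones handed out by the router's DHCP
nmcli connection modify br0 ipv4.dns "8.8.8.8 8.8.4.4" ipv4.ignore-auto-dns yes
nmcli connection up br0
# verify: should now list only the public servers
resolvectl dns br0

Alternatively, a DNS= line in /etc/systemd/resolved.conf (followed by restarting systemd-resolved) sets the servers globally for resolved.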

Thanks everyone, I really appreciate all the troubleshooting help. It looks like changing the default nameserver on the host (as well as the F39 guests) is working. I followed the process referenced here and here.

The default nameserver 127.0.0.53 (systemd-resolved’s local stub listener) still shows up in nslookup and dig queries, but I haven’t experienced any further DNS slowdowns, nor any warnings in the systemd-resolved.service logs. In the end I’m still not sure whether the issue is with resolved, my router, or a combination of the two, but for now I’m happy that things appear to be working normally.