Curl takes 2 minutes to complete query

,

My Fedora IoT server instance has recently started taking almost exactly 2 minutes to complete curl requests. This starts after a few minutes of runtime after reboot. Initially, curl is fast. The server is running many podman containers, so it could be as those start up that the issue presents itself.

Currently, I can work around the problem by altering /etc/nsswitch.conf, by changing:

hosts:      files myhostname resolve [!UNAVAIL=return] dns

to

hosts:      files myhostname dns resolve [!UNAVAIL=return] dns

If resolve is before dns, it takes 2 minutes to resolve. If dns is first, it resolves straight away.

Restarting systemd-resolved also fixes the issue, at least temporarily. Testing to see whether the fix lasts, but it’s not a good solution - it has to be done after every reboot at the moment.

config files:

/etc/resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
search .
/etc/nsswitch.conf
# Generated by authselect
# Do not modify this file manually, use authselect instead. Any user changes will be overwritten.
# You can stop authselect from managing your configuration by calling 'authselect opt-out'.
# See authselect(8) for more details.

# In order of likelihood of use to accelerate lookup.
passwd:     files altfiles systemd
shadow:     files
group:      files [SUCCESS=merge] altfiles [SUCCESS=merge] systemd
hosts:      files myhostname resolve [!UNAVAIL=return] dns
services:   files
netgroup:   files
automount:  files

aliases:    files
ethers:     files
gshadow:    files
networks:   files dns
protocols:  files
publickey:  files
rpc:        files
resolvectl status
Global
         Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
  resolv.conf mode: stub

Link 2 (enp3s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.88.1
       DNS Servers: 192.168.88.1

Link 3 (wlp4s0)
    Current Scopes: none
         Protocols: -DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 6 (wg)
    Current Scopes: DNS
         Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.64.0.1
       DNS Servers: 10.64.0.1

=====================================================

Tests:

curl when nsswitch has dns after resolve
# time curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

________________________________________________________
Executed in  120.27 secs      fish           external
   usr time   56.56 millis    0.00 micros   56.56 millis
   sys time  140.85 millis  401.00 micros  140.45 millis
curl when nsswitch has dns before resolve
time curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

________________________________________________________
Executed in  246.70 millis    fish           external
   usr time    4.41 millis  217.00 micros    4.19 millis
   sys time    3.51 millis  152.00 micros    3.35 millis

The problem appears to be in DNS resolution, as providing the IP address goes quickly:

curl with --resolve - IPv4
# time curl google.com --resolve google.com:80:142.250.115.100
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

________________________________________________________
Executed in  453.34 millis    fish           external
   usr time    4.26 millis    0.00 micros    4.26 millis
   sys time    2.71 millis  581.00 micros    2.13 millis

As with IPv6:

curl with --resolve - IPv6
time curl -6 google.com --resolve google.com:80:[2404:6800:4015:801::200e]
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

________________________________________________________
Executed in  171.34 millis    fish           external
   usr time    1.39 millis  411.00 micros    0.98 millis
   sys time    5.05 millis  137.00 micros    4.92 millis

All of the DNS resolution tools I can think of resolve quickly:

resolvectl without cache
# time resolvectl query --cache=no google.com
google.com: 142.250.70.206                     -- link: enp3s0
            2404:6800:4015:801::200e           -- link: enp3s0

-- Information acquired via protocol DNS in 1.9ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network

________________________________________________________
Executed in    7.32 millis    fish           external
   usr time    1.35 millis  340.00 micros    1.01 millis
   sys time    4.18 millis  143.00 micros    4.04 millis
resolvectl with cache
# time resolvectl query google.com
google.com: 142.250.70.206                     -- link: enp3s0
            2404:6800:4015:801::200e           -- link: enp3s0

-- Information acquired via protocol DNS in 602us.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: cache

________________________________________________________
Executed in    5.97 millis    fish           external
   usr time    0.38 millis  384.00 micros    0.00 millis
   sys time    5.13 millis  160.00 micros    4.97 millis
nslookup
# time nslookup google.com
Server:		127.0.0.53
Address:	127.0.0.53#53

Non-authoritative answer:
Name:	google.com
Address: 142.250.70.206
Name:	google.com
Address: 2404:6800:4015:801::200e


________________________________________________________
Executed in   23.35 millis    fish           external
   usr time    4.38 millis    0.00 micros    4.38 millis
   sys time    7.13 millis  570.00 micros    6.56 millis
dig without server specified
# time dig google.com

; <<>> DiG 9.18.30 <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45636
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		33	IN	A	142.250.70.206

;; Query time: 1 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Mon Nov 11 09:20:02 AWST 2024
;; MSG SIZE  rcvd: 55


________________________________________________________
Executed in   21.02 millis    fish           external
   usr time    4.71 millis    0.00 micros    4.71 millis
   sys time    6.40 millis  517.00 micros    5.88 millis
dig with server specified
# time dig @192.168.88.1 google.com

; <<>> DiG 9.18.30 <<>> @192.168.88.1 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18581
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		29	IN	A	142.250.70.206

;; Query time: 2 msec
;; SERVER: 192.168.88.1#53(192.168.88.1) (UDP)
;; WHEN: Mon Nov 11 09:20:06 AWST 2024
;; MSG SIZE  rcvd: 55


________________________________________________________
Executed in   22.35 millis    fish           external
   usr time    3.28 millis    0.00 micros    3.28 millis
   sys time    8.33 millis  671.00 micros    7.66 millis
dig with IPv6
# time dig AAAA google.com

; <<>> DiG 9.18.30 <<>> AAAA google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55977
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.com.			IN	AAAA

;; ANSWER SECTION:
google.com.		142	IN	AAAA	2404:6800:4006:814::200e

;; Query time: 2 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Mon Nov 11 09:34:10 AWST 2024
;; MSG SIZE  rcvd: 67


________________________________________________________
Executed in   23.20 millis    fish           external
   usr time    3.89 millis  399.00 micros    3.50 millis
   sys time    8.00 millis  133.00 micros    7.86 millis

Ping has the same issue as Curl

ping when nsswitch has dns after resolve
# time ping one.one.one.one
PING one.one.one.one (2606:4700:4700::1001) 56 data bytes
64 bytes from one.one.one.one (2606:4700:4700::1001): icmp_seq=1 ttl=61 time=2.04 ms
64 bytes from one.one.one.one (2606:4700:4700::1001): icmp_seq=2 ttl=61 time=2.13 ms
64 bytes from one.one.one.one (2606:4700:4700::1001): icmp_seq=3 ttl=61 time=2.10 ms
64 bytes from one.one.one.one (2606:4700:4700::1001): icmp_seq=4 ttl=61 time=1.67 ms
64 bytes from one.one.one.one (2606:4700:4700::1001): icmp_seq=5 ttl=61 time=1.84 ms
^C
--- one.one.one.one ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 123848ms
rtt min/avg/max/mdev = 1.673/1.956/2.130/0.172 ms

________________________________________________________
Executed in  244.64 secs      fish           external
   usr time    1.12 millis  293.00 micros    0.83 millis
   sys time    5.07 millis   98.00 micros    4.97 millis
ping when nsswitch has dns before resolve
# time ping one.one.one.one
PING one.one.one.one (2606:4700:4700::1111) 56 data bytes
64 bytes from one.one.one.one (2606:4700:4700::1111): icmp_seq=1 ttl=61 time=1.97 ms
64 bytes from one.one.one.one (2606:4700:4700::1111): icmp_seq=2 ttl=61 time=1.91 ms
64 bytes from one.one.one.one (2606:4700:4700::1111): icmp_seq=3 ttl=61 time=1.75 ms
^C
--- one.one.one.one ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 1.750/1.873/1.965/0.090 ms

________________________________________________________
Executed in    2.25 secs      fish           external
   usr time    0.29 millis  288.00 micros    0.00 millis
   sys time    3.78 millis   91.00 micros    3.69 millis

The issue is not dependent on which user runs the command.

If anyone has any ideas to narrow this issue down, I’d appreciate it.

Do you provide your own dns server?
Is it returning both ipv4 and ipv6 results?
If only ipv4 that maybe issue. Something might be waiting on the ipv6 result.

I would use dig to request the type=A and type=AAAA records to see what happens in each case.

Also you could do a packet capture and see what dns requests are being issued when you run curl.

You have 2 links labeled with +DefaultRoute and using different resolvers.
At a minimum, this creates uncertainty that makes more difficult to isolate your issue.
At worst, it can cause a race condition leading to potential problems.

I do run my own DNS server at 192.168.88.1. It’s an Adguard Home instance. It is returning both IPv4 and IPv6 results - I have a dig example in the post requesting AAAA records.

I’ll be looking to see if I can do any useful packet capture next, yeah :+1:

Thanks - yeah, this is something I’ve thought of too. My routing is a little unusual and has a single user account (using ip routing rules) that gets sent through the wireguard interface. Obviously without knowing the IP address, nothing can know whether to request DNS through the wireguard interface.

As it doesn’t actually serve any purpose I care about to have DNS going through wireguard here, I’ve removed that from my wireguard setup:

resolvectl status after removing WG DNS config
# resolvectl status
Global
         Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
  resolv.conf mode: stub

Link 2 (enp3s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.88.1
       DNS Servers: 192.168.88.1

Link 3 (wlp4s0)
    Current Scopes: none
         Protocols: -DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 4 (wg)
    Current Scopes: none
         Protocols: -DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported

So far, the issue isn’t showing up. I’m half-wondering if my wireguard VPN provider (mullvad) throttles DNS requests, and maybe systemd-resolved waits for all responses before continuing? For the moment, I’m going to keep an eye on it. If it doesn’t show up as an issue again in another couple of days, and with a few reboots, I’ll mark it as solved.

In the meantime, I’m going to go trawl through the systemd-resolved behaviour and see if it waits for all responses - though that doesn’t really explain to me why any of my other queries using resolvectl and family went through straight away.