WireGuard taking forever for DNS replies

Thanks, I’m learning something new. Better late than never.

I don’t use dnsmasq for now while I’m running a wireguard client. But how can I tell whether or not systemd-resolved is actually running/active? I’m new to this and would not want to break my freshly fixed wireguard config again by messing with settings that should be valid for openvpn only but could potentially interfere with wireguard.

Is it related to this?

$ resolvectl query example.com | tail -n 1
-- Data from: network
$ resolvectl query example.com | tail -n 1
-- Data from: cache

Or to that?

$ sudo cat /etc/systemd/resolved.conf.d/custom.conf 
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it under the
#  terms of the GNU Lesser General Public License as published by the Free
#  Software Foundation; either version 2.1 of the License, or (at your option)
#  any later version.
#
# Entries in this file show the compile time defaults. Local configuration
# should be created by either modifying this file, or by creating "drop-ins" in
# the resolved.conf.d/ subdirectory. The latter is generally recommended.
# Defaults can be restored by simply deleting this file and all drop-ins.
#
# Use 'systemd-analyze cat-config systemd/resolved.conf' to display the full config.
#
# See resolved.conf(5) for details.

[Resolve]
# Some examples of DNS servers which may be used for DNS= and FallbackDNS=:
# Cloudflare: 1.1.1.1#cloudflare-dns.com 1.0.0.1#cloudflare-dns.com 2606:4700:4700::1111#cloudflare-dns.com 2606:4700:4700::1001#cloudflare-dns.com
# Google:     8.8.8.8#dns.google 8.8.4.4#dns.google 2001:4860:4860::8888#dns.google 2001:4860:4860::8844#dns.google
# Quad9:      9.9.9.9#dns.quad9.net 149.112.112.112#dns.quad9.net 2620:fe::fe#dns.quad9.net 2620:fe::9#dns.quad9.net
DNS=127.0.0.1
#DNS=192.168.167.10
#FallbackDNS=103.86.96.100#nordvpn 103.86.99.100#nordvpn
#Domains=
DNSSEC=allow-downgrade
DNSOverTLS=opportunistic
#DNSOverTLS=yes
#MulticastDNS=no
#LLMNR=resolve
#Cache=yes
#CacheFromLocalhost=no
#DNSStubListener=yes
#DNSStubListenerExtra=
#ReadEtcHosts=yes
#ResolveUnicastSingleLabel=no
#StaleRetentionSec=0

It currently isn’t:

$ cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 192.168.167.10

What do you recommend? I recall it originally was, and I defeated that feature out of lack of understanding.

rm /etc/resolv.conf
ln -s /var/run/systemd/resolve/stub-resolv.conf /etc/resolv.conf

The nameserver in /etc/resolv.conf is 127.0.0.53 now; your own nameserver should be configured via NetworkManager or resolved.conf.
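
The two commands above can be wrapped so that the old file is kept as a backup and no dangling symlink is created. This is only a sketch: link_stub_resolv is a hypothetical helper name, and it defaults to /run/systemd/resolve/stub-resolv.conf, which is the same file as the /var/run path used above because /var/run is a symlink to /run on Fedora.

```shell
#!/bin/sh
# Sketch: point a resolv.conf path at systemd-resolved's stub file, keeping a
# backup of whatever was there before. link_stub_resolv is a hypothetical
# helper name; both paths are parameters so it can be tried on a scratch copy.
link_stub_resolv() {
    target=${1:-/etc/resolv.conf}
    stub=${2:-/run/systemd/resolve/stub-resolv.conf}

    # Refuse to create a dangling symlink if the stub file does not exist.
    [ -e "$stub" ] || {
        echo "missing $stub (is systemd-resolved running?)" >&2
        return 1
    }

    # Keep the old file (or old symlink) so the change can be undone.
    if [ -e "$target" ] || [ -L "$target" ]; then
        mv -f "$target" "$target.bak" || return 1
    fi
    ln -s "$stub" "$target"
}
```

Run as root with no arguments, it acts on /etc/resolv.conf and leaves the previous file in /etc/resolv.conf.bak.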

The trick in Fedora is that systemd-resolved supplies DNS for “gethostbyname”-style library calls through glibc’s name service switch; /etc/resolv.conf is not consulted for those.
If you use dig or nslookup, /etc/resolv.conf is consulted, and to keep things consistent, /etc/resolv.conf points via 127.0.0.53 to systemd-resolved.
Using 127.0.0.53 still leaves 127.0.0.1 free for running unbound or bind.

In order to try to understand what could not be understood: is 103.86.96.100 really the DNS server advertised in NordVPN’s wireguard configs? My VPN is on 10.2.0.2, routed to 10.2.0.1 with DNS 10.2.0.1, so the DNS server is within the wireguard mesh. Any DNS request to an internet-facing nameserver has to be NATted and connection tracked, which is more work for the server. Maybe the problems are server-side and not on your side.

Unfortunately, I do not understand your last screenshot: connections between
10.5.0.2 and 192.168.167.10, which is your own nameserver with unroutable address.
I would not expect the wireguard interface to be involved in connections to your own LAN.

ln -s /var/run/systemd/resolve/stub-resolv.conf /etc/resolv.conf

Will try that first thing tomorrow and retest wireguard. Thanks

is 103.86.96.100 really the DNS server advertised in NordVPNs wireguard configs?

Last time I checked, those were the ones pushed to us. I just hardcoded them in my config.

connections between 10.5.0.2 and 192.168.167.10, which is your own nameserver with unroutable address.

10.5.0.2 is my wireguard client on 192.168.167.x/32, this machine
192.168.167.10 is on the same lan and holds a bind (named) server
in addition to that 192.168.167.10 is itself a wireguard client to nordvpn

Any DNS query my bind server cannot answer is sent to the resolvers configured in the bind config. You guessed it, the 2 nordvpn DNS servers.

So nothing leaks outside of the nordvpn tunnels except LAN traffic and VoIP.

This goes beyond my knowledge. Is your machine really on 192.168.167.x/32, so alone on its subnet? That would be a construction I’ve no experience with.

It’s the default, and if resolvectl status is reporting stuff then it’s running.
You should find that there is a running service:

systemctl status systemd-resolved.service

Edit: One neat feature of systemd-resolved is that it supports split-horizon DNS (see “Split-horizon DNS” on Wikipedia).
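
For the question of whether /etc/resolv.conf is actually pointing at systemd-resolved, a small check can help alongside the service status. This is a minimal sketch; resolv_conf_mode is a hypothetical helper name that only looks at the nameserver line, on the assumption that 127.0.0.53 means the stub listener is in use.

```shell
#!/bin/sh
# resolv_conf_mode is a hypothetical helper: it classifies a resolv.conf file
# by the nameserver it lists. 127.0.0.53 is systemd-resolved's stub listener.
resolv_conf_mode() {
    file=${1:-/etc/resolv.conf}
    if grep -Eq '^nameserver[[:space:]]+127\.0\.0\.53([[:space:]]|$)' "$file"; then
        echo stub      # lookups through this file go to systemd-resolved
    else
        echo direct    # the file names some other server directly
    fi
}

# On the real system:
#   systemctl is-active systemd-resolved.service   # prints "active" when running
#   resolv_conf_mode                                # "stub" or "direct"
```

With the NetworkManager-generated file shown earlier in the thread (nameserver 192.168.167.10) this prints "direct"; after the symlink is in place it should print "stub".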

Hi,

Will test more for a few days but everything seems fixed:

  • Had to remove DNS=127.0.0.1 from /etc/systemd/resolved.conf.d/custom.conf otherwise a lot of dns queries were issued to localhost, never to be answered while using openvpn
  • Had to change the DNS in all my NetworkManager openvpn configs: sed -i 's/127.0.0.1/192.168.167.10/g' $(find . -type f -name 'ca*nordvp*'). This is not something you would normally have to do, as the DNS servers are normally pushed to you.
  • replaced /etc/resolv.conf with a symlink: sudo ln -s /var/run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
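
The bulk edit in the second bullet can be written a little more defensively. The sketch below keeps the same file pattern but escapes the dots so they only match literal dots, adds word boundaries so that an address such as 127.0.0.10 is left alone, and lets find call sed directly so file names with spaces are handled. fix_vpn_dns is a hypothetical helper name, and \b and -i are GNU sed features.

```shell
#!/bin/sh
# Sketch: replace the literal address 127.0.0.1 with 192.168.167.10 in the
# matching connection files. fix_vpn_dns is a hypothetical helper name; the
# file pattern is the one from the post. Requires GNU sed (-i and \b).
fix_vpn_dns() {
    dir=${1:-.}
    # -exec ... {} + passes file names safely and runs nothing when no file
    # matches, unlike $(find ...), which would leave sed without input files.
    find "$dir" -type f -name 'ca*nordvp*' \
        -exec sed -i 's/\b127\.0\.0\.1\b/192.168.167.10/g' {} +
}
```

Called without arguments it works on the current directory, like the original one-liner.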

My wireguard config was already fixed yesterday with DNS = 192.168.167.10 and dnsmasq discarded and replaced with systemd-resolved.

$ resolvectl query example.com | tail -n 1
-- Data from: network
$ resolvectl query example.com | tail -n 1
-- Data from: cache
$ dig @127.0.0.53 google.com

; <<>> DiG 9.18.26 <<>> @127.0.0.53 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25045
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		60	IN	A	192.0.0.88

;; Query time: 7 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Thu Jul 11 12:17:15 EDT 2024
;; MSG SIZE  rcvd: 55

Thanks to everyone who helped.

Congratulations that it works with Fedora’s recommended setup.
I still don’t completely understand what happens. I replicated your setup with another Fedora system and a poor man’s bind server: with “DNSStubListenerExtra=ipaddress” in resolved.conf, systemd-resolved now listens on both 127.0.0.53 and the LAN interface.

I can reproduce the strange 10.xxx to 192.xxx communication: it’s systemd-resolved which is responsible for it, and it binds to what it thinks is the correct interface: wireguard.
Which is correct in the original case with NordVPN DNS, but this is not correct if you use local DNS.
After some seconds, it concludes that it does not work, transfers the socket to graveyard (from debug log), and takes the working bind server and stays on that route.
DNS via TCP is also involved in connection attempts.

I still do not understand how your DNS server can communicate with 10.5.0.2; here it fails as expected.

My conclusion is for your case: do NOT specify DNS. (and do not accept it pushed by openvpn)
resolvectl does not set the defaultroute flag on wireguard, and keeps your bind as DNS.
So the VPN software is preparing to do split DNS, but this is just not what you want using own bind server.
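
One way to try the "do not specify DNS" suggestion without editing the working config by hand is to generate a copy with the DNS line removed. This is a sketch only; strip_wg_dns is a hypothetical helper name and the output file name in the usage note is a placeholder.

```shell
#!/bin/sh
# Sketch: print a wg-quick config with its DNS lines removed, so that bringing
# the tunnel up does not touch the resolver configuration.
# strip_wg_dns is a hypothetical helper name.
strip_wg_dns() {
    # Delete "DNS = ..." lines in any letter case, with optional blanks
    # before the key and around the equals sign.
    sed -E '/^[[:space:]]*[Dd][Nn][Ss][[:space:]]*=/d' "$1"
}
```

For example, strip_wg_dns /etc/wireguard/wg0.conf > wg0-nodns.conf writes the stripped copy; once the tunnel is up, resolvectl status wg0 shows which DNS servers, if any, are attached to the link.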

But I still do not understand the original problem, because in that case the wireguard interface is the correct one to reach NordVPN. It could be a dnsmasq+systemd-resolved interaction, but I do not observe it here.

I have to ask a question: I assume your wireguard config is different on both systems, with different public and private keys. Since wireguard is not connection oriented, an identical config on both systems would work, but not at the same time, and maybe they would have to struggle a bit to make a connection. This could explain the problem the thread began with.

Congratulations for getting most of it to work! People generally use the app for the wireguard part.

I’m not familiar enough with resolved.conf settings to understand what this would do

The original problem was DNS=127.0.0.1 was specified in resolved.conf IIRC. Removing that line fixed most of it. And symlinking /etc/resolv.conf to its appropriate file.

I still have a problem though. At reboot, the service does not set the wg0 DNS to 192.168.167.10 but to 127.0.0.53. I have to restart the interface with wg-quick down wg0 and wg-quick up wg0, otherwise the browsers can’t connect to anything. dnf update still works, kodi works with the proper path, but not the browsers. Weird. I’ll try to diagnose that at reboot tomorrow now that I’m aware of this issue. sudo systemctl restart wg-quick@wg0 could do as well, or wg syncconf wg0 <(wg-quick strip wg0). To be tested. Could take a few days though.

Same keys, but different servers. I originally used the same server on both machines, and that sometimes caused problems when I disconnected and reconnected the one on the machine hosting the bind server. This was also probably, as you noted, part of the problem.

/etc/bind# cat named.conf.options
options {
        directory "/var/cache/bind";

        allow-query-cache {192.168.167/24;127/8;};
        forwarders {
                103.86.96.100;
                103.86.99.100;
        };

        dnssec-validation auto;

        auth-nxdomain no;    # conform to RFC1035
        listen-on-v6 { none; };
};
$ sudo /usr/local/sbin/dnstest.sh
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 1500 bytes
07:25:08.527781 lo    In  IP 127.0.0.1.47565 > 127.0.0.1.53: 16118+ AAAA? linux486. (26)
07:25:08.528807 wg0   Out IP 10.5.0.2.57978 > 103.86.96.100.53: 43318+% [1au] AAAA? linux486. (65)
07:25:08.543131 wg0   In  IP 103.86.96.100.53 > 10.5.0.2.57978: 43318 NXDomain$ 0/6/1 (1028)
07:25:08.556276 lo    In  IP 127.0.0.1.53 > 127.0.0.1.47565: 16118 NXDomain 0/1/0 (101)
07:28:02.208038 enp4s0 In  IP 192.168.167.30.41832 > 192.168.167.2.53: 2+ A? us.pool.ntp.org. (33)
07:28:02.209125 wg0   Out IP 10.5.0.2.47997 > 103.86.96.100.53: 59740+% [1au] A? us.pool.ntp.org. (72)
07:28:02.280408 wg0   In  IP 103.86.96.100.53 > 10.5.0.2.47997: 59740 4/0/1 A 104.167.215.195, A 72.14.183.39, A 152.70.159.102, A 199.188.64.12 (108)
07:28:02.300564 wg0   Out IP 10.5.0.2.47460 > 103.86.96.100.53: 53181+% [1au] DS? ntp.org. (64)
07:28:02.390337 wg0   In  IP 103.86.96.100.53 > 10.5.0.2.47460: 53181 0/6/1 (782)
07:28:02.404999 enp4s0 Out IP 192.168.167.2.53 > 192.168.167.30.41832: 2 4/0/0 A 104.167.215.195, A 72.14.183.39, A 199.188.64.12, A 152.70.159.102 (97)
07:30:02.087582 lo    In  IP 127.0.0.1.41362 > 127.0.0.1.53: 57089+ A? github.com. (28)
07:30:02.087628 lo    In  IP 127.0.0.1.41362 > 127.0.0.1.53: 27166+ AAAA? github.com. (28)
07:30:02.088649 wg0   Out IP 10.5.0.2.52408 > 103.86.96.100.53: 33656+% [1au] A? github.com. (67)
07:30:02.088734 wg0   Out IP 10.5.0.2.60132 > 103.86.96.100.53: 51955+% [1au] AAAA? github.com. (67)
07:30:02.113143 wg0   In  IP 103.86.96.100.53 > 10.5.0.2.60132: 51955 0/1/1 (123)
07:30:02.125289 wg0   In  IP 103.86.96.100.53 > 10.5.0.2.52408: 33656 1/0/1 A 140.82.113.3 (55)
07:30:02.126004 wg0   Out IP 10.5.0.2.34262 > 103.86.96.100.53: 381+% [1au] DS? github.com. (67)
07:30:02.151223 wg0   In  IP 103.86.96.100.53 > 10.5.0.2.34262: 381 0/6/1 (568)

I’ve got the problems a bit in reverse order. With the VPN I use, I can contact their DNS on 10.2.0.1 without problems. But I can change this to 1.1.1.1 and contact Cloudflare without hassle via the VPN, so a DNS server on the internet side should work without delays. So the standard Wireguard config without a local DNS server should work without delays. I tried the dnsmasq version also and this works too.

Using a DNS server in my network gets systemd-resolved into trouble, because the “defaultroute” flag is set, and systemd-resolved insists on using wg0 to reach my local server, which fails and contradicts “ip route get”. Eventually it corrects itself and works.

Your last wireguard screenshot shows happy communication between 10.5.0.2 and 192.168.167.10, which is normally not possible. Between 10.5.0.2 and 103.86.96.100 is fine, that should go via wireguard, but 192.168.167.10 should not be reachable via the wireguard interface as it is on the LAN.

Same key, different servers, I would think that this separates both systems but no idea how the servers are glued together at NordVPN.

The /etc/resolv.conf always points to 127.0.0.53, but this should never appear as DNS server in wg0. I never used the wg-quick@ service, so no experience yet.
As the service is just calling wg-quick, does it help just leaving out “dns” in the wg0.conf? Then it does not touch dns config and only your own server is used.

This could be confusing. Each computer has its own 10.5.0.2. 192.168.167.10 is the DNS server. It’s configured to be accessible to every computer in the LAN. Whichever of my computers has 10.5.0.2 can query 192.168.167.10, which in turn replies. Besides, on that last screenshot, 10.5.0.2 and 192.168.167.10 are the same machine, the Debian server with bind (named), which has a secondary IP for eth0: 192.168.167.2.

~$ ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 192.168.167.2/24 brd 192.168.167.255 scope global enp4s0
       valid_lft forever preferred_lft forever
    inet 192.168.167.10/24 brd 192.168.167.255 scope global secondary enp4s0:1
       valid_lft forever preferred_lft forever
    inet6 fe80::xxxx:xxx:xxxx:xxxx/64 scope link
       valid_lft forever preferred_lft forever
4: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
    inet 10.5.0.2/32 scope global wg0
       valid_lft forever preferred_lft forever

As I understand it, it’s essential. Resolvconf seems to have its own idea of the DNS, so reloading the wg0 config fixes things. The order in which services are started probably needs checking. I’ll look into that in the next days once the issue is confirmed.

Again I’m not familiar with resolvconf. I’ve been using dnsmasq, bind and manual network configuration for decades. I was perfectly happy with dnsmasq where cache size and caching period could be configured; resolvconf has no such thing and a tiny cache. My motto is “if it’s not broken, don’t fix it”. I’m not always happy with changes in linux since kernel 2.4. First it was OSS → alsa → jack → pulse → pipewire which is a total hassle to reconfigure. Then udev, systemd whereas I was extremely comfortable with init.d, then btrfs and all the problems associated with it, and now wayland which breaks a lot more things, cups that is now driverless and broke printer access to each and every computer on my LAN… All in all, I’d like to actually use my computers instead of taking months to reconfigure them properly at each new distro version.

As I understand it, keys are only for identification to authorise access. Once granted, new key pairs are generated for encryption and they change periodically.

I’m afraid I can’t help yet. Not until all issues on my side have been fixed.

But I can explain the problems. If systemd-resolved wants to connect to a local server using wireguard interface, it will fail as expected.

I did an experiment: a wireguard server and two clients with same config.
If I ping the server from one system, it works. If I ping the server from the other client, it takes about 20 seconds before it starts, the other one stops.
I’ve to try further, but I cannot exclude that the 10.x to 192.x connection goes via the common wireguard.

It could be that the two servers have different IPs on the same server or are clustered somehow.

That’s interesting. Good thing I stopped using the same config on all computers. I now use different clients, but the same keys. Will have to remember that when it’s time to give mullvad a try. Thanks for the confirmation.

Have you generated unique keys for each client?

Each client should use its own unique private key, otherwise it makes cryptokey routing problematic and can lead to delays, timeouts and packet loss when more than one client is active.

Not sure I understand. Each nordvpn wireguard udp server I’m connecting to uses the same public key, e.g.

echo 'wireguard udp servers:'
curl -s "https://api.nordvpn.com/v1/servers/recommendations?filters\[servers_technologies\]\[identifier\]=wireguard_udp&limit=5"\
|jq -r '.[]|.hostname, .station, (.locations|.[]|.country|.city.name), (.locations|.[]|.country|.name), (.technologies|.[].metadata|.[].value), .load'
echo ' '

I may be wrong, but my understanding is the private key is derived from the public key. If we use a different one for each server (endpoint), how does nordvpn know we are authorised to use their services?

I know openvpn uses a user / pass scheme, always the same, but for wireguard I haven’t thought about it yet. I just tried the default. I know where the public key came from, but for the private one, it’s been years since I made the config, so I don’t remember how I got/made them.

Each public key is generated from a private key.
This is how public-key cryptography works.

For the wireguard in Fedora, so not what providers fiddle into it, each client should generate a unique private key. The public key derived from this private key should be entered into the server configuration, plus the unique /32 address from which the client connects. The provider I use uses 10.2.0.2 for all clients, but there is a NAT in between to some address unique for the wireguard server. On the webpage you can generate configs for each server. I do not know whether it’s a problem if all servers have an equal private and thus equal public key.
Nor do I know how they handle the maximum number of connections; with openvpn you log in, with Wireguard you just connect.
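
For a self-managed WireGuard server (not a provider that issues the keys itself), the direction of the derivation described above can be seen with the standard wg tools. This is a minimal sketch assuming wireguard-tools is installed; make_wg_keypair is a hypothetical helper name and the file prefix is a placeholder.

```shell
#!/bin/sh
# Sketch: create a private key and derive its public key with wireguard-tools.
# make_wg_keypair is a hypothetical helper; it writes PREFIX.key and PREFIX.pub.
make_wg_keypair() {
    prefix=$1
    # Generate the private key with a restrictive umask so that only the
    # owner can read the file.
    ( umask 077; wg genkey > "$prefix.key" ) || return 1
    # The public key is computed from the private key, never the reverse.
    wg pubkey < "$prefix.key" > "$prefix.pub"
}
```

Running wg pubkey again on the same private key prints the same public key, which is why a client only has to keep the private key and hand the public one to the server.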

In a lab setting I can reproduce the delay with two clients with equal config and keys, but cannot access the DNS server on the other client. The DNS request enters client 1 wg0 but does not reach client2 wg0.

So in my opinion there is a bug or a feature in systemd-resolved: if the dns entry in the wireguard config is within the local lan, which is probably highly unusual,
it is sent to the wireguard interface instead of respecting normal routing. But it recovers.

Concerning dnsmasq: Developments like firewalld and NetworkManager implement more dynamic networking controlled via dbus. Wireless, VPN and so on. If you are more comfortable with dnsmasq, use it, just do not use both. And if NetworkManager stops supporting it, set dns to none and run dnsmasq as service. In your situation with own DNS server, split dns is unwanted so a superfluous option of systemd-resolved. In /etc/nsswitch.conf, there is a line hosts which defines access order: host file, myself, mdns .local domain, systemd-resolved, dns. If systemd-resolved is turned off, it is skipped and /etc/resolv.conf consulted as before.
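
To see the lookup order mentioned for /etc/nsswitch.conf on a given machine, the hosts line can be split into its sources. This is a sketch; hosts_sources is a hypothetical helper name, and it simply drops the bracketed actions such as [NOTFOUND=return] that sit between the sources.

```shell
#!/bin/sh
# Sketch: list the lookup sources from the "hosts:" line of an nsswitch.conf,
# one per line and in order, leaving out bracketed actions.
# hosts_sources is a hypothetical helper name.
hosts_sources() {
    file=${1:-/etc/nsswitch.conf}
    # Take the hosts line, drop the key and any [...] groups, then put each
    # remaining word on its own line.
    sed -n '/^hosts:/{ s/^hosts://; s/\[[^]]*\]//g; p; }' "$file" |
        tr -s ' \t' '\n' | sed '/^$/d'
}
```

In the output, "resolve" is the systemd-resolved entry and "dns" is the classic path that reads /etc/resolv.conf.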

Well, don’t know if this helps but yesterday I tried a nordvpn wireguard server from another country and it did not connect anymore.

They did change their public key from the last time so I just changed it in my wireguard client config as well, keeping the same private key and it worked again.

Interesting reading here. From that I conclude you can’t generate keys yourself, which makes perfect sense since you don’t have access to the nordvpn server config.

Again I could be wrong, I’ve never been able to recall how public-key cryptography works more than a few minutes. I have to re-learn it from the beginning each and every time. Considerable headache source… Nevertheless, I’ve been using it for more than 20 years starting in the pgp era, mixmaster, private idaho, etc. I set it up and forget everything about it 5 minutes later. IIRC Netscape had 128 bit encryption and the international version was limited to 40 bits.

I understood from your links NordVPN does not use Wireguard, but Nordlynx based on wireguard. There is only one IP, which is NATted to some other IP to enter the Wireguard. No problem, nat is no crypto. But for native wireguard, you have to login, get a nordlynx adapter and fetch the wireguard parameters from that by script. But is there only one private key per user? How about multiple devices which need different private keys using standard Wireguard? Do you get different keys from different devices? But they should be all kept at NordVPN, how are they managed? As non-subscriber I do not find details about that.