Dnsmasq times out querying upstream dns for local lookup?

@hmmsjan: ‘The nameservers are double defined, both in NetworkManager and dnsmasq.’ So, drop the DNS entries in bridge0?
@vgaetera: ‘Make sure to remove search domains from your connection settings’ And so the same for the search domain entry?
YES YES YES!!! F’me - that’s fixed it… Now we can all get some sleep :laughing:

@hmmsjan: ‘It looks like the search domain “office.lan” in NetworkManager is the culprit !!’
So, I stuck it back in… It didn’t break - it still works
So, I stuck the DNSs back in… It did break… But, there’s more… From outside nslookup and ping timed out and expanded server to server.office.lan. From inside nslookup behaved the same, timed out, but ping didn’t time out but also didn’t expand server… Hmm, I don’t know if that means anything - ping doesn’t seem to expand on the inside whether DNSs and search domain are populated or not…

I’ve taken both DNSs and search domain out of the bridge0 config and NM restarts dnsmasq without hiccup…

Is there any reason that server isn’t expanded to server.office.lan when pinged from inside the server…?

It probably does not need to.
It does after all recognize its own host name. Without the domain part it will latch onto the hostname as its own and look no further.

Great that it works. I found very elaborate information in https://networkmanager.pages.freedesktop.org/NetworkManager/NetworkManager/nm-settings-nmcli.html but it is not yet clear to me how this is supposed to work.
If I want to do something similar to this topic using only NetworkManager:

 nmcli dev show bridge0 | egrep '(DNS|SEARCH)'
IP4.DNS[1]:                             100.0.0.1
IP4.DNS[2]:                             101.0.0.1
IP4.DNS[3]:                             102.0.0.1
IP4.SEARCHES[1]:                        test1.lan
IP4.SEARCHES[2]:                        test2.lan
IP4.SEARCHES[3]:                        subnet.test2.lan

I get:

Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 100.0.0.1#53 for domain test1.lan
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 100.0.0.1#53 for domain test2.lan
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 100.0.0.1#53 for domain subnet.test2.lan
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 100.0.0.1#53 for domain 1.168.192.in-addr.arpa
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 101.0.0.1#53(via bridge0)
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 101.0.0.1#53 for domain test1.lan
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 101.0.0.1#53 for domain test2.lan
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 101.0.0.1#53 for domain subnet.test2.lan
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 101.0.0.1#53 for domain 1.168.192.in-addr.arpa
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 102.0.0.1#53(via bridge0)
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 102.0.0.1#53 for domain test1.lan
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 102.0.0.1#53 for domain test2.lan
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 102.0.0.1#53 for domain subnet.test2.lan
Sep 26 07:02:13 server dnsmasq[2099]: using nameserver 102.0.0.1#53 for domain 1.168.192.in-addr.arpa

So the DNS and SEARCHES arrays are not bound together: each search domain gets all three nameservers.
This needs some study, but for the moment I strongly prefer doing such things in the dnsmasq configuration. I really do not like program functions being switched invisibly by a backend configuration. And what about IPv6, which has a separate configuration in NetworkManager?
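
To make the pairing in the log above explicit, here is a minimal Python sketch. It assumes the effect is equivalent to one dnsmasq `server=/domain/ip` line per combination; the thread suggests NetworkManager actually pushes these over DBus rather than writing config lines, so this only illustrates the result. The helper name `dnsmasq_server_lines` is made up for this example, and the reverse zone (1.168.192.in-addr.arpa) is left out.

```python
from itertools import product


def dnsmasq_server_lines(dns_servers, search_domains):
    """Pair every DNS server with every search domain (hypothetical helper).

    Mirrors the log above: the two lists are not matched index by index,
    every domain is routed to every server.
    """
    return [
        f"server=/{domain}/{server}"
        for server, domain in product(dns_servers, search_domains)
    ]


# Values taken from the nmcli output above.
dns = ["100.0.0.1", "101.0.0.1", "102.0.0.1"]
searches = ["test1.lan", "test2.lan", "subnet.test2.lan"]

for line in dnsmasq_server_lines(dns, searches):
    print(line)
```

Three servers and three search domains give nine domain-specific entries, which matches the count of forward-zone lines in the log.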

systemd-resolved (rocky9) gives:

Link 3 (bridge0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS
                    DNSSEC=no/unsupported
Current DNS Server: 100.0.0.1
       DNS Servers: 100.0.0.1 101.0.0.1 102.0.0.1
        DNS Domain: subnet.test2.lan test1.lan test2.lan

which is in effect the same result.

Okay, but isn’t that what NM is supposed to be doing - writing /etc/resolv.conf?

$ cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 127.0.0.1
options edns0 trust-ad

‘# Generated by NetworkManager’ - and, there is no search domain in there…?

And, if written by NM, anything I put in there is going to get overwritten, isn’t it?

Is that the dnsmasq config in /etc/NetworkManager/dnsmasq.d (cf. /etc/NetworkManager/system-connections) or dnsmasq config in /etc/dnsmasq.d (cf. /etc/NetworkManager/system-connections)?

Surely NM should be picking up what’s in its config of dnsmasq - that is, what’s in /etc/NetworkManager/dnsmasq.d ? And, NM should have the smarts that if it is running dnsmasq from its config, then it shouldn’t be shoving separate configs of its own (from /etc/NetworkManager/system-connections) at dnsmasq via DBus - if I understand that correctly (I may very well not…) then surely that’s a bug in NM?

Shouldn’t NM be picking up

$ cat /etc/NetworkManager/dnsmasq.d/01*|grep local=
local=/office.lan/

And, populating /etc/resolv.conf with

# Generated by NetworkManager
search office.lan

?

Re the ping problem, this is what I get on my other network:

me@myoldmachine:~$ ping -4 myoldmachine
PING  (192.168.0.141) 56(84) bytes of data.
64 bytes from myoldmachine.home.lan (192.168.0.141): icmp_seq=1 ttl=64 time=0.044 ms
64 bytes from myoldmachine.home.lan (192.168.0.141): icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from myoldmachine.home.lan (192.168.0.141): icmp_seq=3 ttl=64 time=0.043 ms
^C
---  ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.043/0.047/0.056/0.005 ms
me@myoldmachine:~$ cat /etc/resolv.conf
# Generated by NetworkManager
search home.lan
nameserver 192.168.0.10
nameserver fd00:0:0:5::10
me@myoldmachine:~$

So, pinging the machine internally, the machine name gets expanded - and /etc/resolv.conf is populated with the search domain, but not so the server:

[admin@server ~]$ ping -4 server
PING server (192.168.1.40) 56(84) bytes of data.
64 bytes from server (192.168.1.40): icmp_seq=1 ttl=64 time=0.085 ms
64 bytes from server (192.168.1.40): icmp_seq=2 ttl=64 time=0.096 ms
64 bytes from server (192.168.1.40): icmp_seq=3 ttl=64 time=0.119 ms
^C
--- server ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2068ms
rtt min/avg/max/mdev = 0.085/0.100/0.119/0.014 ms
[admin@server ~]$ cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 127.0.0.1
options edns0 trust-ad
[admin@server ~]$

…?
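
For comparing the two files mechanically, a small Python sketch can pull out the relevant keywords. It assumes the simple `keyword value...` layout seen in both files above; the function name `parse_resolv_conf` is invented for this example and is not part of any library.

```python
def parse_resolv_conf(text):
    """Collect nameserver, search and options entries from resolv.conf text."""
    result = {"nameserver": [], "search": [], "options": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments such as "# Generated by NetworkManager".
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            result["nameserver"].append(values[0])
        elif keyword == "search":
            result["search"] = values  # a later search line replaces an earlier one
        elif keyword == "options":
            result["options"].extend(values)
    return result


# The two files as pasted above.
old_machine = """# Generated by NetworkManager
search home.lan
nameserver 192.168.0.10
nameserver fd00:0:0:5::10
"""
server = """# Generated by NetworkManager
nameserver 127.0.0.1
options edns0 trust-ad
"""

print(parse_resolv_conf(old_machine)["search"])
print(parse_resolv_conf(server)["search"])
```

The first file yields a search list containing home.lan, the second yields an empty list, which is the difference being asked about.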

I’m still reading posts, but is that supposed to work?

I get this:

espionage724@Spinesnap:~$ nslookup google.com 127.0.0.1
;; communications error to 127.0.0.1#53: connection refused
;; communications error to 127.0.0.1#53: connection refused
;; communications error to 127.0.0.1#53: connection refused
;; no servers could be reached

But a random tip had me change it to 127.0.0.53, which does something different:

espionage724@Spinesnap:~$ nslookup google.com 127.0.0.53
Server:		127.0.0.53
Address:	127.0.0.53#53

Non-authoritative answer:
Name:	google.com
Address: 142.250.31.101
Name:	google.com
Address: 142.250.31.113
Name:	google.com
Address: 142.250.31.100
Name:	google.com
Address: 142.250.31.138
Name:	google.com
Address: 142.250.31.139
Name:	google.com
Address: 142.250.31.102
Name:	google.com
Address: 2607:f8b0:4006:806::200e

If that differs when connecting to a LAN device, I can test that out if it’d be helpful (I don’t know any domain names for my LAN stuff off-hand :p)

I use systemd-resolved (DoT notes) and in NetworkManager I have IPv4 and IPv6 with Automatic DNS disabled, and blank/no DNS IPs specified.

espionage724@Spinesnap:~$ resolvectl status
Global
         Protocols: LLMNR=resolve -mDNS +DNSOverTLS DNSSEC=no/unsupported
  resolv.conf mode: stub
Current DNS Server: 2620:fe::fe:11#dns11.quad9.net
       DNS Servers: 9.9.9.11#dns11.quad9.net 149.112.112.11#dns11.quad9.net
                    2620:fe::11#dns11.quad9.net 2620:fe::fe:11#dns11.quad9.net

Link 2 (enp0s20f0u5u3)
    Current Scopes: none
         Protocols: -DefaultRoute LLMNR=resolve -mDNS +DNSOverTLS
                    DNSSEC=no/unsupported

Link 3 (wlo1)
    Current Scopes: LLMNR/IPv4 LLMNR/IPv6
         Protocols: -DefaultRoute LLMNR=resolve -mDNS +DNSOverTLS
                    DNSSEC=no/unsupported

I doubt that NetworkManager devs can solve this.
The very idea of running dnsmasq under NetworkManager and DBus is a slippery surface with multiple caveats due to possibly conflicting settings.

The entire config directory is passed to dnsmasq without parsing its contents by NetworkManager.

You can add search domains in the global DNS section.

On the other hand, your setup is complex enough to justify using the standalone dnsmasq service, so it should no longer be affected by the above issues.

This is a custom setup:

You are right. I do not know whether NetworkManager always overwrites this file, e.g. when no DNS is specified. I have to admit I used a trick: I made /etc/resolv.conf immutable.

It’s everything in /etc/NetworkManager/dnsmasq.d, concatenated. In this case, you could run DNS/DHCP stand-alone as the dnsmasq service, because the extra DBus input from NetworkManager was messing up the configuration.

Concerning the ping on the server: it is handled by the “files” source in /etc/nsswitch.conf before the request ever reaches DNS, because server is in /etc/hosts. That entry knows only “server” and “192.168.1.40”, so no domain is shown. You can try the very old method: put
192.168.1.40 server.office.lan server
in /etc/hosts, or move the line into /etc/dnsmasq.hosts, but then “myhostname” might handle it before “dns”.
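
A rough Python sketch of why the /etc/hosts line matters: in a hosts file the first name after the address is treated as the canonical name, and that is the name ping prints. The function `lookup_in_hosts` is a simplified stand-in written for this post, not the real resolver (it ignores case-insensitive matching, for instance), and the sample file contents are assumed.

```python
def lookup_in_hosts(hosts_text, name):
    """Return (canonical_name, address) for the first line listing `name`."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *names = line.split()
        if name in names:
            # The first name on the line is the canonical one.
            return names[0], address
    return None


# Assumed current state on the server: short name only.
short_only = "127.0.0.1 localhost\n192.168.1.40 server\n"
# The suggested line, with the fully qualified name first.
with_fqdn = "127.0.0.1 localhost\n192.168.1.40 server.office.lan server\n"

print(lookup_in_hosts(short_only, "server"))
print(lookup_in_hosts(with_fqdn, "server"))
```

With the short-only entry the canonical name stays "server"; with the suggested line it becomes "server.office.lan", without DNS or the search domain being involved at all.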

I’ve marked @hmmsjan’s post as solution - it, with @vgaetera’s post immediately prior, seems to answer the thread title and stop dnsmasq from timing out querying upstream. However, I think we’re probably agreed - hope I’m not speaking out of turn - that there are more questions to be answered… And, possibly unanswerable…

I’ve been resisting giving up on running dnsmasq under NM and switching to a standalone dnsmasq service - as both @vgaetera and @hmmsjan have encouraged - because this seems to be a ‘supported’ use case for dnsmasq and NM, but - it seems broken… As @vgaetera puts it:

I’ll see if I can make a meaningful bug report - even if it’s unsolvable… Bug RHEL-59988 no longer seems helpful.

And, thanks for settling my ping hangup.

It’s been a very interesting journey :wink:

I found https://networkmanager.dev/docs/ for “read the friendly manual”, with interesting things like /etc/resolv.conf handling.
I learned immediately that it is normal for all DNS servers to be contacted together; after that, only the fastest-responding one is used.

I think the whole stuff with dnsmasq/ipv4.dns-search is intended for VPN DNS handling instead of the ugly /etc/resolv.conf manipulations, and not to be chained into a complex dnsmasq configuration. I doubt whether it will be changed/debugged because it is superseded by systemd-resolved.

NetworkManager still uses dnsmasq for its “sharing” option, providing DHCP and DNS facilities, and this works perfectly, complete with IPv6 prefix delegation.

Indeed very interesting, and I learned a lot about dnsmasq and NetworkManager.