I run Docker Swarm clusters with a front-end entry-point/dispatch going through HAProxy on all Leader Nodes. I am using Swarm’s internal DNS that dynamically names services for service discovery by HAProxy.
When a service does not yet exist in Swarm, Swarm’s internal DNS server does not know the name, so it passes HAProxy’s service name queries to the upstream nameservers, which of course have no clue and must always return NXDOMAIN. Swarm also seems to query all of the nameservers available to it at the same time. For non-production Swarms that do not run all of the services, my network’s nameservers are hammered with queries that they can never resolve. Those NXDOMAIN responses quickly add up to thousands per second. As a systems admin, this level of inefficiency rubs me the wrong way, and it drowns out the legitimate queries in the query log on my nameservers. For developers who run their own development swarms, I fear that they might be hammering their ISP’s nameservers with queries for Swarm service names.
Unfortunately, as far as I can find, Docker has not allowed us to have any control knobs on its internal DNS server. I wish I could simply tell Swarm to never ask upstream nameservers for unqualified names, or names that look like Swarm service names, but I can’t. It is discouraging as I look through Docker’s boards for fixing issues with the internal DNS. It could take years for Docker to even consider allowing more control over its internal Swarm DNS server, so for now, I must hack around it.
I can’t put a caching nameserver inside my HAProxy image, because that is too high in the stack. HAProxy needs a real-time response from Swarm about the current availability of services, and it can’t be barred from querying unqualified names, because that’s how most Swarm service names look.
I can’t put a caching nameserver as a service in the Swarm network, because Swarm does not let you use static IP addresses for anything on its overlay networks. Without a fixed address, the nameserver would have to be found by its service name, which creates a chicken-and-egg dependency: Swarm needs to resolve a service name, but the service it is looking for is the upstream naming service itself. I also don’t want to create yet another independent service that lives outside of Swarm, adding yet another configuration dependency and another point of failure.
The best solution that I can find is to run dnsmasq natively on my CoreOS nodes, telling it to always return NXDOMAIN for unqualified names without even asking upstream nameservers. A little bit of caching of external names is a nice bonus. Since this is such a basic network service, doing this cleanly means it needs to live outside of Docker, because Docker depends upon it.
Fedora CoreOS uses NetworkManager, which has some pretty slick ways to integrate with dnsmasq. Simply setting dns=dnsmasq in NetworkManager’s configuration automatically shims in a caching DNS layer, gracefully supplanting the existing nameservers in /etc/resolv.conf and using the original nameservers for its upstream query forwarding. Adding domain-needed to the dnsmasq configuration then filters out unqualified names before they are ever forwarded.
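A minimal sketch of that configuration, assuming NetworkManager’s standard drop-in directories (conf.d for its own settings, dnsmasq.d for options passed to the dnsmasq instance it starts). The two file names are arbitrary placeholders, not something from my actual nodes:

```ini
# /etc/NetworkManager/conf.d/10-dns-dnsmasq.conf  (file name is a placeholder)
[main]
# Have NetworkManager start its own dnsmasq instance and use it
# as the local resolver in place of the upstream nameservers.
dns=dnsmasq

# /etc/NetworkManager/dnsmasq.d/10-domain-needed.conf  (file name is a placeholder)
# Options in this directory are read by the dnsmasq instance
# that NetworkManager starts.
# Never forward plain names (names without dots) upstream.
domain-needed
```

After NetworkManager is restarted, /etc/resolv.conf should point at the local dnsmasq instance, and lookups for dotless names that are not known locally should come back as not found without reaching the upstream servers.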
If I remember correctly, CoreOS originally had dnsmasq as part of its distribution because of issues just like this. But in the interest of reducing image size, dnsmasq was removed because it “seems like it could go.” That breaks the slick integration that NetworkManager has with it. To me, it “seems like it should stay.” It should be considered a low-level network service that is part of NetworkManager.
I statically compiled dnsmasq, and the 64-bit binary it produced is a whopping 200KB in size (I only turned on D-Bus for NetworkManager; every other bell and whistle is turned off). I believe that size is probably not an issue with dnsmasq. I added a configuration to Ignition to download my static binary to /usr/local/sbin/dnsmasq, and NetworkManager magically found it and runs it as a plugin. This solved the problem, but I’m not happy with how hacky it is.
I had to do the same thing with snmpd, because running snmpd inside a container to get host metrics is an abstraction-layer nightmare. There are lots of hacks out there that try to run snmpd in a container with high privileges, but none of them work well enough for me. The snmpd daemon itself is light. My statically compiled snmpd is only 2.7MB in size. We do not have to distribute MIBs and other SNMP bloat. It is also ubiquitous. Practically any embedded device out there provides an SNMP service. It is a mature, cross-platform core of network monitoring. Making it available in the raw OS on CoreOS would be advantageous for those of us who collect metrics from a vast number of heterogeneous devices.
Statically compiling stuff to get basic service binaries into a distribution that does not have a package manager has a nasty aftertaste for me. A peeve of mine is when package management has such a steep learning curve that most users resort to circumventing the package manager to install the things they need. I believe that an extra 3MB of rootfs size is well worth having snmpd available, especially now that CoreOS’s rootfs does not have to be downloaded from PXE.
I would rather not have to statically compile dnsmasq and distribute it to my nodes via Ignition to get it into CoreOS. I respectfully request that we please put dnsmasq back, so that it is available for environments that need it. I also request that we please consider including a light snmpd binary. Both of these are such basic services that they work best when they are not in containers, much like the other basic daemons that CoreOS runs.