Cannot docker login to DockerHub although can pull public images

I am having issues executing docker login on a newly deployed FCOS VM running locally on VMware Fusion, with statically defined networking and no corporate proxies or firewalls. (Note that the same issue is also observed when deploying to vSphere, although that deployment takes a corporate proxy into account.)

Docker login to private or public repos fails with the same behavior:

docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username: fifofonix
Password:
Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Slightly more detailed error from journal:

Feb 13 14:45:52 kamino dockerd[2694]: time="2020-02-13T14:45:52.617424407Z" level=error msg="Handler for POST /v1.39/auth returned error: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"

Note that docker login with incorrect credentials fails as expected with an appropriate error.

Docker pull on public images works (although slow), e.g:

Curl to https sites including docker repos works without cert exceptions returning unauthorized (no login creds provided) as expected:

Openssl TLS handshakes look good (to be expected given curl results).

Have not previously had to declare all repos explicitly with my docker implementations. And would expect a better error message if this was

I am really confused and at my wits' end. Any words of wisdom appreciated. I am not very confident in my NetworkManager setup, because I have to reboot before DNS is recognized properly.

ifconfig:
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:b5:7c:69:33  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker_gwbridge: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.1  netmask 255.255.0.0  broadcast 172.18.255.255
        inet6 fe80::42:67ff:fe16:8732  prefixlen 64  scopeid 0x20<link>
        ether 02:42:67:16:87:32  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.135.101  netmask 255.255.255.0  broadcast 172.16.135.255
        inet6 fe80::243:1ec9:84d8:504c  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:f9:f7:a1  txqueuelen 1000  (Ethernet)
        RX packets 16339  bytes 15334773 (14.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3049  bytes 356015 (347.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
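
Given the doubts about DNS mentioned above, one thing that may be worth ruling out is whether the delay sits in name resolution or in the TCP connect for a particular address family. The following is a minimal Python sketch, not a definitive diagnostic: the helper names `timed` and `probe` are hypothetical, and it assumes a Python 3 interpreter is available (for example inside a container or toolbox run with host networking).

```python
import socket
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (seconds_elapsed, result_or_exception)."""
    start = time.monotonic()
    try:
        result = fn(*args, **kwargs)
    except OSError as exc:  # covers socket.gaierror and socket timeouts
        result = exc
    return time.monotonic() - start, result


def probe(host, port=443, timeout=10.0):
    """Time the DNS lookup, then a TCP connect to each resolved address."""
    elapsed, infos = timed(socket.getaddrinfo, host, port,
                           type=socket.SOCK_STREAM)
    print(f"DNS lookup for {host}: {elapsed:.2f}s")
    if isinstance(infos, Exception):
        print(f"  lookup failed: {infos}")
        return
    for family, socktype, proto, _canon, sockaddr in infos:
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"

        def connect():
            # Open and immediately close a TCP connection to this address.
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)

        elapsed, outcome = timed(connect)
        status = "ok" if outcome is None else f"failed: {outcome}"
        print(f"  {label} {sockaddr[0]}: {elapsed:.2f}s {status}")
```

Calling `probe("registry-1.docker.io")`, `probe("auth.docker.io")` and `probe("quay.io")` on both the FCOS VM and a CL VM would show whether one step, or one address family, accounts for the difference.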

Expanding on the issue above:

  • Repository logins succeed via podman (but slowly). I can log in from podman to docker.io, quay.io and internal repositories, albeit with a long delay (~1 min) after entering credentials.
  • Container image download speed: docker.io and quay.io are slow. Pulling the 5 MB alpine image takes approximately 1 minute via either podman or docker on the FCOS VM. Pulling it on a CL VM takes a second or two.
  • Container image download speed: the Fedora registry is fine.
    Pulling via podman or docker on FCOS is similar to CL for the following image: registry.fedoraproject.org/f29/ruby:latest
  • No observable network issues. Concerns about networking have been reduced by successfully running curl benchmark downloads of large files on both a CL VM and the FCOS VM, with approximate performance parity. In addition, performing the same steps as a docker pull via curl, as described here, is fast, which suggests that docker.io itself responds quickly.
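
For anyone wanting to repeat the last check without curl, here is a rough Python sketch that times the first two HTTP steps of an anonymous pull from Docker Hub: fetching a pull token and then requesting the manifest. The function names are hypothetical, and the repository and tag defaults are only examples. It uses only the standard library and is not what docker or podman do internally, just the same endpoints.

```python
import json
import time
import urllib.request

AUTH_URL = "https://auth.docker.io/token"
REGISTRY_URL = "https://registry-1.docker.io"

# Manifest media types accepted, so single- and multi-arch images both resolve.
ACCEPT = ", ".join([
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
])


def token_url(repository):
    """Build the anonymous pull-token URL for a Docker Hub repository."""
    return (f"{AUTH_URL}?service=registry.docker.io"
            f"&scope=repository:{repository}:pull")


def manifest_url(repository, tag):
    """Build the manifest URL for repository:tag on Docker Hub."""
    return f"{REGISTRY_URL}/v2/{repository}/manifests/{tag}"


def timed_get(url, headers=None, timeout=30.0):
    """GET url and return (seconds_elapsed, status_code, body_bytes)."""
    request = urllib.request.Request(url, headers=headers or {})
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
        status = response.status
    return time.monotonic() - start, status, body


def time_pull_steps(repository="library/alpine", tag="latest"):
    """Print how long the token and manifest requests take."""
    elapsed, status, body = timed_get(token_url(repository))
    print(f"token request: {status} in {elapsed:.2f}s")
    token = json.loads(body)["token"]
    headers = {"Authorization": f"Bearer {token}", "Accept": ACCEPT}
    elapsed, status, _ = timed_get(manifest_url(repository, tag), headers)
    print(f"manifest request: {status} in {elapsed:.2f}s")
```

Running `time_pull_steps()` on the FCOS VM and on a CL VM gives a per-step comparison. Note that `urlopen` raises `urllib.error.HTTPError` on a 401 or 404 rather than returning the status.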

Theories:

  • Are docker.io and quay.io applying some rate limiting specific to FCOS that impacts both logins and downloads? If so, how is CL exempt from it?

hey @fifofonix thanks for posting. I’m not having the troubles you are. I am able to sudo docker login:

[core@localhost ~]$ sudo docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username: dustymabe
Password: 
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

and pulling the alpine images using podman and docker both take max a few seconds:

[core@localhost ~]$ date; podman pull docker.io/alpine:latest; date
Mon Feb 17 15:16:48 UTC 2020
Trying to pull docker.io/alpine:latest...
Getting image source signatures
Copying blob c9b1b535fdd9 done
Copying config e7d92cdc71 done
Writing manifest to image destination
Storing signatures
e7d92cdc71feacf90708cb59182d0df1b911f8ae022d29e8e95d75ca6a99776a
Mon Feb 17 15:16:50 UTC 2020

[core@localhost ~]$ date; sudo docker pull docker.io/alpine:latest; date
Mon Feb 17 15:18:08 UTC 2020
latest: Pulling from library/alpine
c9b1b535fdd9: Pull complete 
Digest: sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d
Status: Downloaded newer image for alpine:latest
Mon Feb 17 15:18:09 UTC 2020

Maybe this is either an issue in the version of FCOS you’re using (though I haven’t heard any reports like this) or it’s environmental. The version of FCOS I’m using is the latest testing release at 31.20200210.2.0.

@dustymabe thanks for your comments. My issues are with the latest stable release. Unfortunately I can’t even import the latest testing OVA in order to use it in vSphere. I am going to have to wait for greater stability on testing rather than trying to play with fixing the OVA.

In the meantime I have found that by copying authentication credentials (docker’s config.json) to an FCOS server I can then docker pull successfully from repos requiring that authentication. I have also found that I don’t need to do this at all if I do a stack deploy using the --with-registry-auth switch. The latter was always my preference, but for some reason I still found myself needing to docker login on swarm nodes. With the bundled docker version this no longer seems to be a requirement, thankfully.
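
For reference, when no credential helper is configured, the entry that docker login writes to config.json is a base64 encoding of `username:password` under an `auths` key, which is why copying the file works. A minimal Python sketch of that structure follows. The function names and the placeholder credentials are hypothetical, and a setup using a credentials store will have a `credsStore` key in the file instead of an inline `auth` value.

```python
import base64
import json


def auth_entry(username, password):
    """Return the inline auth entry: base64 of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"auth": base64.b64encode(raw).decode("ascii")}


def docker_config(registry, username, password):
    """Return a config.json-shaped dict with one registry entry."""
    return {"auths": {registry: auth_entry(username, password)}}


# Docker Hub credentials are keyed by this legacy index URL.
DOCKER_HUB_KEY = "https://index.docker.io/v1/"

# Placeholder credentials for illustration only.
example = docker_config(DOCKER_HUB_KEY, "exampleuser", "examplepassword")
print(json.dumps(example, indent=2))
```

Since the value is only encoded and not encrypted, the copied file should be treated as a secret on the target host.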

The issue though still persists, and I agree with you it certainly sounds environmental. I wondered whether it was to do with directory/file permissions but I have relaxed these without it helping.

@fifofonix so you have a workaround for now? Is there an actual bug anywhere or we don’t know yet?

I assume the OVA issues you referred to are from the bug you opened: https://github.com/coreos/fedora-coreos-tracker/issues/391

@dustymabe I do have a workaround, although I am still unable to do a sudo docker login on either Docker Hub or quay. The OVA issues are certainly bundled in with the bug I opened, but those reported were on the stable OVA. The testing OVA seems to have additional problems, as I cannot even import/export it from vSphere to create an OVA that I can use with VMware Fusion to work offline. I am off on vacation tomorrow, but on my return in a week I’ll see if I can do what @dans did and export an OVF from the testing branch OVA.