Cannot docker login to DockerHub although can pull public images

I'm having issues executing docker login on a newly deployed FCOS VM running locally on VMWareFusion with statically defined networking and no corporate proxies/firewalls. (Note: the same issue is also observed when deployed to vSphere, although that implementation does go through a corporate proxy.)

Docker login to private or public repos fails with the same behavior:

docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username: fifofonix
Password:
Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Slightly more detailed error from journal:

Feb 13 14:45:52 kamino dockerd[2694]: time="2020-02-13T14:45:52.617424407Z" level=error msg="Handler for POST /v1.39/auth returned error: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"

Note that docker login with incorrect credentials fails as expected with an appropriate error.

Docker pull of public images works (although slowly), e.g.:

  • docker pull alpine
  • docker pull Quay
  • docker pull my-internal-repo/my-public-image

Curl to HTTPS sites, including docker registries, works without certificate exceptions, returning "unauthorized" (no login creds provided) as expected.

Openssl TLS handshakes look good (to be expected given curl results).
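
For example, the kind of check I mean (a sketch; an anonymous request should come back promptly with a 401 rather than hanging until a client timeout):

# Hit the registry API endpoint directly; expect a fast HTTP 401, not a timeout.
curl -sS -o /dev/null -w '%{http_code}\n' https://registry-1.docker.io/v2/
# TLS handshake check along the same lines:
openssl s_client -connect registry-1.docker.io:443 -servername registry-1.docker.io </dev/null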

I have not previously had to declare all repos explicitly with my docker implementations, and I would expect a better error message if that were the problem.

Really very confused and at my wits’ end. Any words of wisdom appreciated. I'm not super confident in my NetworkManager setup because I am having to reboot before DNS is recognized properly.
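
For reference, these are the kinds of checks I'm running before and after the reboot to see whether DNS has actually been applied (nothing exotic, just a sketch):

# Does NetworkManager believe the statically configured DNS server is active on eth0?
nmcli device show eth0 | grep -i dns
# What is the resolver actually using right now?
cat /etc/resolv.conf
# And does name resolution work at this moment?
getent hosts registry-1.docker.io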

ifconfig:
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:b5:7c:69:33  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker_gwbridge: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.1  netmask 255.255.0.0  broadcast 172.18.255.255
        inet6 fe80::42:67ff:fe16:8732  prefixlen 64  scopeid 0x20<link>
        ether 02:42:67:16:87:32  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.135.101  netmask 255.255.255.0  broadcast 172.16.135.255
        inet6 fe80::243:1ec9:84d8:504c  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:f9:f7:a1  txqueuelen 1000  (Ethernet)
        RX packets 16339  bytes 15334773 (14.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3049  bytes 356015 (347.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Expanding on the issue above:

  • Repository Logins Successful via Podman (but slow). I can log in from podman to docker.io, quay.io and internal repositories, albeit with a long delay (~1 min) after entering credentials.
  • Container Image Downloading Speed - docker.io, quay.io slow. Pulling the 5 MB alpine image takes approximately 1 minute whether via podman or docker on the FCOS VM. Pulling from a CL VM takes a second or two.
  • Container Image Downloading Speed - fedora registry ok.
    Pulling via podman or docker on FCOS is similar to CL for the following: image: registry.fedoraproject.org/f29/ruby:latest
  • No Observable Network Issues. Concerns about networking have been reduced by running curl benchmark downloads of large files on both a CL VM and the FCOS VM, with approximate performance parity. Also, exercising the same steps as a docker pull via curl (as described here) is fast, which suggests docker.io itself is responsive; see the sketch after this list.
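
A rough sketch of the "same steps as a docker pull via curl" mentioned above, for library/alpine (jq used only for brevity):

# 1. Request an anonymous pull token from Docker Hub's auth service.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/alpine:pull" | jq -r .token)
# 2. Fetch the image manifest from the registry using that token.
curl -s -H "Authorization: Bearer $TOKEN" \
     -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
     https://registry-1.docker.io/v2/library/alpine/manifests/latest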

Theories:

  • Is there some rate limiting being done by docker.io and quay.io, specific to FCOS somehow, that impacts both logins and downloads? If so, how did CL manage to be exempt from it?

hey @fifofonix thanks for posting. I’m not having the troubles you are. I am able to sudo docker login:

[core@localhost ~]$ sudo docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username: dustymabe
Password: 
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

and pulling the alpine image using podman and docker each takes at most a few seconds:

[core@localhost ~]$ date; podman pull docker.io/alpine:latest; date
Mon Feb 17 15:16:48 UTC 2020
Trying to pull docker.io/alpine:latest...
Getting image source signatures
Copying blob c9b1b535fdd9 done
Copying config e7d92cdc71 done
Writing manifest to image destination
Storing signatures
e7d92cdc71feacf90708cb59182d0df1b911f8ae022d29e8e95d75ca6a99776a
Mon Feb 17 15:16:50 UTC 2020

[core@localhost ~]$ date; sudo docker pull docker.io/alpine:latest; date
Mon Feb 17 15:18:08 UTC 2020
latest: Pulling from library/alpine
c9b1b535fdd9: Pull complete 
Digest: sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d
Status: Downloaded newer image for alpine:latest
Mon Feb 17 15:18:09 UTC 2020

Maybe this is either an issue in the version of FCOS you’re using (though I haven’t heard any reports like this) or it’s environmental. The version of FCOS I’m using is the latest testing release at 31.20200210.2.0.

@dustymabe thanks for your comments. My issues are with the latest stable release. Unfortunately I can’t even import the latest testing OVA in order to use it in vSphere. I am going to have to wait for greater stability on testing rather than trying to play with fixing the OVA.

In the meantime I have found that by copying authentication credentials (docker’s config.json) to an FCOS server, I can then docker pull successfully from repos requiring that authentication. I have also found that I don’t need to do this at all if I do a stack deploy using the --with-registry-auth switch. The latter was always my preference, but for some reason I previously still found myself needing to docker login on swarm nodes. With the bundled docker version this no longer seems to be a requirement, thankfully.
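
Roughly, the workaround looks like this (a sketch; host, repo and stack names are just illustrative):

# Option 1: reuse credentials generated on a machine where docker login works.
ssh core@fcos-node 'mkdir -p ~/.docker'
scp ~/.docker/config.json core@fcos-node:.docker/config.json
# Option 2 (preferred): log in only on the swarm manager and let it forward
# registry credentials to the agents at deploy time.
docker login my-internal-repo
docker stack deploy --with-registry-auth -c my-stack.yml my-stack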

The issue though still persists, and I agree with you it certainly sounds environmental. I wondered whether it was to do with directory/file permissions but I have relaxed these without it helping.

@fifofonix so you have a workaround for now? Is there an actual bug anywhere, or do we not know yet?

I assume the OVA issues you referred to are from the bug you opened: OVA format issues prevent deployment to VMWareFusion and VMWareWorkstation · Issue #391 · coreos/fedora-coreos-tracker · GitHub

@dustymabe I do have a workaround, although I am still unable to do a sudo docker login to either DockerHub or quay. The OVA issues are certainly bundled in with the bug I opened, but those reported were on the stable OVA. The testing OVA seems to have additional items that are not working, as I cannot even import/export from vSphere to create myself an OVA that I can use with VMWareFusion to work offline. Off on vacation tomorrow, but on my return in a week I’ll see if I can do what @dans did and export an OVF from the testing branch OVA.

I’m still facing this issue (which I accept is most likely environmental) even with the latest stable release fedora-coreos-31.20200210.3.0-vmware.x86_64.ova.

If you try in a different environment (for example, launching an instance in AWS), can you log in?

bump - @fifofonix :arrow_up:

Hi,
I had a similar issue; try modifying your MTU. I had to do this on a Debian/Kubernetes/Docker setup …
I set it to 1440 (it was 1500) with: sudo ip link set dev [your-network-interface] mtu 1440
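
(In case it helps: the ip link change does not survive a reboot; with NetworkManager it can be persisted along these lines, [your-connection] being whichever profile manages the interface - a sketch only:)

sudo nmcli connection modify [your-connection] 802-3-ethernet.mtu 1440
sudo nmcli connection up [your-connection]   # re-activate to apply the new MTU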

@nemric thanks for the suggestion but that hasn’t helped for me. I did change eth0 to 1440 and rebooted but same issue.

Note that I did check the MTU on my working legacy CL instances and they were at 1500 with no issues, which also appears to be the default for FCOS.

@dustymabe I am caught up in some other things right now but will definitely be coming back to this and create a minimal ignition file that exhibits the problem in my environment and then transport to a cloud machine. Going to take a couple of weeks.

I have reproduced this issue with a minimal ignition file although I have not yet determined a fix/workaround. In short the issue is related to how I am setting docker daemon proxy settings.

If I remove the proxy then docker login works fine but with the proxy definition it does not. The fairly minimal ignition file that follows is structured to use a single proxy definition file across docker/zincati.

This has allowed me to validate that the proxy works for zincati, yet the same config file does not work for docker when enabled (it causes docker login to fail for some reason). The proxy definition is at least interpreted correctly, judging by the output of docker info.
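
Beyond docker info, this is how I am confirming that the drop-in and environment file actually reach the daemon (a sketch):

# Show the unit together with any drop-ins systemd has merged in.
sudo systemctl cat docker.service
# Show the environment settings systemd passes to dockerd.
sudo systemctl show docker --property=Environment,EnvironmentFiles
# And what dockerd itself reports.
sudo docker info 2>/dev/null | grep -i proxy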

Docker login error when proxy enabled:

  • Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

‘Minimal’ Ignition file:

# This file is a Fedora CoreOS ignition template file in yml format.
#
# It is a template file because we have used {python_variable} syntax to
# define variable substitution of various dynamic cluster parameters that
# need to be determined at boot time and which cannot be hard-coded.
#
# Ignition files do support templating natively for certain common cloud
# deployment environments (e.g. AWS EC2) but do not support VMWare.  We
# could implement a custom provider that expands some boot parameters such
# as IP address sourcing it from within the booted host.  However, there are
# some pre-boot variables like the etcd boot cluster information that still
# needs to be generated for each host.
#
# Our create_cluster.py script expands this template into separate yml files
# for each cluster node.  These are then consumed by the Fedora CoreOS transpiler
# 'fcct' which validates the yml and generates a JSON ignition file that can then
# be passed into a booting FCOS image as user data.
variant: fcos
version: 1.0.0
passwd:
  users:
    - name: core
      # core is essentially root for FCOS.  Interactive login is not enabled
      # by default, but this password is used when executing, for example:
      # rpm-ostree install open-vm-tools
      password_hash: "****"
      ssh_authorized_keys:
      # Public ssh keys are fine to be stored in this file.  It is the private
      # key that must be kept super-secret as it provides access...
        - ssh-rsa ****
      groups: [ sudo, docker ]
storage:
  files:
    - path: /etc/hostname
      mode: 0420
      overwrite: true
      contents:
        inline: "{cluster_node}"
    - path: /etc/NetworkManager/system-connections/eth0.nmconnection
      mode: 0600
      overwrite: true
      contents:
        inline: |
          [connection]
          id=Uplink
          type=802-3-ethernet
          autoconnect=true
          interface-name=eth0
          match-device=interface-name:eth0

          [ethernet]
          mac-address={cluster_node_mac}

          # The DNS servers will not necessarily be set correctly on first boot.
          # Simply doing a sudo reboot fixes this problem.  Strange.
          [ipv4]
          method=manual
          addresses={cluster_node_ipv4}/24
          gateway={cluster_node_gateway}
          dns=8.8.8.8
          dns-search={ad_domain}

          [route]
          gateway=0.0.0.0
          destination={cluster_node_gateway}/32
    # This defines env variables for shells...
    - path: /etc/profile.d/my-corp.env.sh
      mode: 0644
      contents:
        inline: |
          export http_proxy={cluster_http_proxy}
          export HTTP_PROXY={cluster_http_proxy}
    # The env file defining proxies used by our systemd drop-ins
    - path: /etc/my-corp-proxy.env
      mode: 0644
      contents:
        inline: |
          all_proxy="{cluster_http_proxy}"
          http_proxy="{cluster_http_proxy}"
          HTTP_PROXY="{cluster_http_proxy}"
          no_proxy="{cluster_name_base}*,{cluster_ipv4_block_string},*.my-corp.edu,127.0.0.1,0.0.0.0,localhost"
    - path: /etc/sysconfig/docker
      mode: 0644
      overwrite: true
      contents:
        # We customise this because we want to suppress --live-restore which
        # is incompatible with docker swarm.
        inline: |
          OPTIONS="--selinux-enabled
          --log-driver=journald
          --default-ulimit nofile=1024:1024
          --init-path /usr/libexec/docker/docker-init
          --userland-proxy-path /usr/libexec/docker/docker-proxy
          "
    # Chronyd is complicated and allows for multiple sources from multiple strata
    # which is degrees of separation from an atomic clock etc.  We use it in
    # a simple mode and this is confirmed as working if proxies are set correctly.
    # Does take a few mins post boot to synch.  Sources and status with:
    # `chronyc status` `chronyc tracking`.  Tried to get this to work as a systemd
    # drop-in override but failed I think because /etc/sysconfig/chronyd exists in
    # base FCOS image and may override drop-in?
    - path: /etc/sysconfig/chronyd
      mode: 0644
      overwrite: true
      contents:
        inline: |
          OPTIONS="'server {ntp_server}'
          "
    - path: /etc/fedora-coreos-pinger/config.d/99-disable-reporting.toml
      mode: 0644
      overwrite: true
      contents:
        inline: |
          [reporting]
          enabled = false
    # Zincati is the new locksmithd and allows configuration of update clusters
    # and speed of applying patches.  No cluster locking though...that is separate.
    - path: /etc/zincati/config.d/51-rollout-wariness.toml
      mode: 0644
      contents:
        inline: |
          [identity]
          rollout_wariness = 0.001
          [updates]
          strategy= "immediate"
systemd:
  units:
    # Several of the FCOS services are packaged and delivered via rpm-ostree and furthermore
    # these are run via the rpm-ostreed agent...so it needs to be proxy configured.
    - name: "rpm-ostreed.service"
      dropins:
        - name: "99-proxy.conf"
          contents: |
            [Service]
            EnvironmentFile=/etc/my-corp-proxy.env
    - name: docker.service
      enabled: true
      dropins:
        - name: 99-http-proxy.conf
          contents: |
            [Service]
            ExecStartPre=/usr/bin/env
            #LINE THAT FOLLOW CONTROLS WHETHER DOCKER LOGIN
            #WORKS OR NOT>>>
            #EnvironmentFile=/etc/my-corp-proxy.env
    - name: zincati.service
      enabled: true
      dropins:
        # Zincati 0.0.8 uses system wide proxy definitions but 0.0.6 which is the
        # bundled version at the time of writing does not.  We use a dropin to over-
        # ride just these settings for the service.
        - name: 99-http-proxy.conf
          contents: |
            [Service]
            # Next line can be used to echo first and last line of env variables
            # to journalctl
            #ExecStartPre=/usr/bin/env
            ExecStartPre=echo Proxies are... $http_proxy $https_proxy
            # Next line can be used to increase verbosity of logging -vv
            Environment=ZINCATI_VERBOSITY="-vv"
            # Next line is not necessary if we use /etc/systemd/system.conf to apply
            # settings to all systemd services as a default.
            EnvironmentFile=/etc/my-corp-proxy.env

Output of salient part of docker info when proxy is enabled (and docker login then fails):

...
HTTP Proxy: http://196.52.80.111:80
No Proxy: minimal*,172.16.54.125,172.16.54.126,172.16.54.127,172.16.54.128,172.16.54.129,*.my-corp.edu,127.0.0.1,0.0.0.0,localhost
...

Since the error from docker login that you posted originally shows it trying to contact Docker Hub over HTTPS:

Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

should you also be defining HTTPS_PROXY=? Configure Docker to use a proxy server
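
i.e. something along the lines of the documented drop-in (a sketch; substitute your proxy URL and NO_PROXY list):

sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:80"
Environment="HTTPS_PROXY=http://proxy.example.com:80"
Environment="NO_PROXY=localhost,127.0.0.1"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker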

I had actually tried that @dustymabe in one of my earlier iterations, and I just re-ran that test with both HTTPS_PROXY and https_proxy set. The issue persists however.

Note that in these latest tests I'm running directly connected to the internet but specifying a freely available internet-based proxy. In this environment I would expect the absence of a proxy setting to result in successful access to internet-based resources.

hey @fifofonix, unfortunately I don’t have an easy way to reproduce the issue. Here are my ideas for now:

  • file a bug against moby-engine to see if the maintainer can help: Log in to Red Hat Bugzilla
  • see if anyone else with a similar proxy setup has the same problem
  • try with docker package from other distros or from upstream to see if the same problem happens

hey @fifofonix - any chance your proxy is using socks5h? After talking with a few people in #fedora-coreos it seems like there are some issues with older golang and a proxy using socks5h.

cc @mheon

Can you provide the output of a podman login command with the --log-level=debug option added?

This sounds like a networking issue of some sort, but it’s very difficult to tell what kind.

@mheon apologies but an earlier post I made was incorrect (deleted it just now). I have in fact no issues logging in via podman. Here is what you asked for anyway in case it provides insight.

[core@d-pdm-lbuild-1 rmx]$ podman login --log-level=debug
DEBU[0000] Using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /var/home/core/.local/share/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /var/home/core/.local/share/containers/storage
DEBU[0000] Using run root /run/user/1000/containers
DEBU[0000] Using static dir /var/home/core/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp
DEBU[0000] Using volume path /var/home/core/.local/share/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] No store required. Not opening container store.
DEBU[0000] Initializing event backend journald
DEBU[0000] using runtime "/usr/bin/runc"
DEBU[0000] using runtime "/usr/bin/crun"
INFO[0000] running as rootless
DEBU[0000] registry not specified, default to the first registry "docker.io" from registries.conf
DEBU[0000] Credentials not found
Username: fifofonix
Password:
DEBU[0047] Looking for TLS certificates and private keys in /etc/docker/certs.d/docker.io
DEBU[0047] GET https://registry-1.docker.io/v2/
DEBU[0058] Ping https://registry-1.docker.io/v2/ status 401
DEBU[0058] GET https://auth.docker.io/token?account=fifofonix&service=registry.docker.io
DEBU[0068] GET https://registry-1.docker.io/v2/
Login Succeeded!

@dustymabe, good news: it turned out that this was a local environment issue (as expected), with an internal DNS server returning a stale IP for registry-1.docker.io. Thanks for your patience and ideas on how to triage this. I’m not sure how much value this will add to the community, so if you judge the thread warrants deleting then I will not be offended!
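
For anyone hitting something similar, a comparison along these lines will show the mismatch (a sketch; dig may need installing, getent works out of the box):

getent hosts registry-1.docker.io          # what the system resolver returns
dig +short registry-1.docker.io @8.8.8.8   # what a public resolver returns, for comparison
# A mismatch, or an address that curl cannot reach, points at stale internal DNS.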

Thanks for reporting back @fifofonix - glad we finally found out what the pesky problem was!