No network (IPv4?) connectivity via Ethernet at first boot for LUKS/Clevis setup

What I did

Use a YAML to wipe and encrypt the root partition / and expand it e.g. and setup the encryption via clevis with a tang server:

variant: fcos
version: 1.3.0
# ...
luks:
    # encrypted root
    - name: root
      label: luks-root
      device: /dev/disk/by-partlabel/root
      clevis:
        tpm2: false
        threshold: 1
        tang:
          - url: # ...
            thumbprint: # ..
      wipe_volume: true

Just installed it directly to the boot disk:

$ sudo coreos-installer install /dev/sdd -i ignition/[…].ign --image-file downloads/fedora-coreos-33.20210328.3.0-metal.x86_64.raw.xz --delete-karg 'console=ttyS0,115200n8'

And booted.

What happened

Now, however, when booting it all works fine until Clevis tries to contact the Tang server for the encryption key for LUKS.
Strangely I get an network error.

Ignition reports STDERR output for clevis luks bind -f -k /somekeyfile/root -d “…/by-partlabel/root” “sss” “some-json”:

Unable to fetch advertisement: [url]
Invalid input!

It also reports one more warning, which is likely not related though and clearly a bug or so.

Troubleshooting

Actually, in the emergency shell I discovered that I cannot curl to any URL or so, so it clearly has no networking.

The device is connected via an Ethernet cable, so that should™ work. Additionally, I should say when using the “live system”, i.e. just booting from an USB with the default ignition file, it actually also does work. There, I can also use nmcli etc., but this does not work in the emergency shell apparently…

So I just used ip and it indeed shows (only) IPv6 adresses hmm… :thinking:
Note my Tang server URL is an IPv4 address.

(Sorry for the image I tried to mount an attached USB drive, but could not find the correct /dev/… device, at least there was no /dev/sd*.)

Also note the LEDs at the network adapter look well (yellow top left is constantly on; green top right is slowly blinking at regular intervals).


Edit: Also reported as a bug in the Ignition bug tracker now: No network (IPv4?) connectivity via Ethernet at first boot for LUKS/Clevis setup · Issue #1201 · coreos/ignition · GitHub

From @bgilbert I got a tip at GitHub:

As for why you don’t have networking in the initramfs… that’s another question. I’d suggest starting by checking for NetworkManager journal entries.

So:

  • I first noticed I had included the /adv for the tang server in the domain in the CoreOS configuration file, which is of course not what we want, as it duplicated that key. However, that’s just another little problem that I would have seen later.
  • Another maybe related problem I see is that I cannot mount any USB drive, respectively the initramfs system does not detect it. I can run lsblk and although the USB stick is connected I always get the same result with only my internal disk. Maybe it has a problem with all peripherals? Though, input/output devices work, so my USB keyboard is recognized.
  • I managed to now mount /boot properly, so I can now give you logs as text files, at least… :tada:

And this dracut guide gave also some good hints.
BTW I’m not using fedora-coreos-33.20210426.2.0-metal.x86_64.raw (from current testing) as that was said to wait for a Tang server.

$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
   inet 127.0.0.1/8 scope host lo
      valid_lft forever preferred_lft forever
   inet6 ::1/128 scope host 
      valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
   link/ether 00:*****:11 brd ff:ff:ff:ff:ff:ff
   inet6 fd1c:f1a1:f84b::bee/128 scope global noprefixroute 
      valid_lft forever preferred_lft forever
   inet6 fd1c:f1a1:f84b:0:2e0:4cff:fee9:2411/64 scope global noprefixroute 
      valid_lft forever preferred_lft forever
   inet6 fe80::2e0:4cff:fee9:2411/64 scope link noprefixroute 
      valid_lft forever preferred_lft forever

Thankfully I also have the 300kB full copy of rdsosreport.txt, which dracut or so generated, which, however, also just contains this information. But feel free to ask for more.

NetworkManager CLI logs:

$ journalctl -t NetworkManager
-- Logs begin at Sun 2021-05-02 18:23:06 UTC, end at Sun 2021-05-02 19:13:00 UTC. --
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.4938] NetworkManager (version 1.26.8-1.fc33) is starting... (for the first time)
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.4938] Read config: /etc/NetworkManager/NetworkManager.conf
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.4949] auth[0x5557cdd39f00]: create auth-manager: D-Bus connection not available. Polkit is disabled and only root will be authorized.
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.4955] manager[0x5557cdd84010]: monitoring kernel firmware directory '/lib/firmware'.
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.4956] hostname: hostname: hostnamed not used as proxy creation failed with: Could not connect: No such file or directory
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.4957] dns-mgr[0x5557cdd7d2c0]: init: dns=default,systemd-resolved rc-manager=symlink
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.4998] manager[0x5557cdd84010]: rfkill: Wi-Fi hardware radio set enabled
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.4998] manager[0x5557cdd84010]: rfkill: WWAN hardware radio set enabled
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.5006] Loaded device plugin: NMTeamFactory (/usr/lib64/NetworkManager/1.26.8-1.fc33/libnm-device-plugin-team.so)
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.5006] manager: rfkill: Wi-Fi enabled by radio killswitch; enabled by state file
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.5006] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.5006] manager: Networking is enabled by state file
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.5006] dhcp-init: Using DHCP client 'internal'
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.5007] settings: Loaded settings plugin: keyfile (internal)
May 02 18:23:08 localhost NetworkManager[531]: <warn>  [1619979788.5007] settings: could not load plugin 'ifcfg-rh' from file '/usr/lib64/NetworkManager/1.26.8-1.fc33/libnm-settings-plugin-ifcfg-rh.so': No such file or directory
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.5016] device (lo): carrier: link connected
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.5016] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/1)
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.5018] manager: (enp3s0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
May 02 18:23:08 localhost NetworkManager[531]: <info>  [1619979788.5020] device (enp3s0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
May 02 18:23:08 localhost NetworkManager[531]: <warn>  [1619979788.7301] sleep-monitor-sd: failed to acquire D-Bus proxy: Could not connect: No such file or directory
May 02 18:23:11 localhost NetworkManager[531]: <info>  [1619979791.5203] device (enp3s0): carrier: link connected
May 02 18:23:11 localhost NetworkManager[531]: <info>  [1619979791.5204] device (enp3s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
May 02 18:23:11 localhost NetworkManager[531]: <info>  [1619979791.5207] policy: auto-activating connection 'Wired Connection' (2d90b610-a502-4954-a9a8-64a67ae0a4a3)
May 02 18:23:11 localhost NetworkManager[531]: <info>  [1619979791.5209] device (enp3s0): Activation: starting connection 'Wired Connection' (2d90b610-a502-4954-a9a8-64a67ae0a4a3)
May 02 18:23:11 localhost NetworkManager[531]: <info>  [1619979791.5209] device (enp3s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
May 02 18:23:11 localhost NetworkManager[531]: <info>  [1619979791.5209] manager: NetworkManager state is now CONNECTING
May 02 18:23:11 localhost NetworkManager[531]: <info>  [1619979791.5210] device (enp3s0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
May 02 18:23:11 localhost NetworkManager[531]: <info>  [1619979791.5211] device (enp3s0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
May 02 18:23:11 localhost NetworkManager[531]: <info>  [1619979791.5212] dhcp4 (enp3s0): activation: beginning transaction (timeout in 45 seconds)
May 02 18:23:13 localhost NetworkManager[531]: <info>  [1619979793.4855] dhcp6 (enp3s0): activation: beginning transaction (timeout in 45 seconds)
May 02 18:23:13 localhost NetworkManager[531]: <info>  [1619979793.4869] dhcp6 (enp3s0): option dhcp6_name_servers   => 'fd1c:f1a1:f84b::1'
May 02 18:23:13 localhost NetworkManager[531]: <info>  [1619979793.4869] dhcp6 (enp3s0): option ip6_address          => 'fd1c:f1a1:f84b::bee'
May 02 18:23:13 localhost NetworkManager[531]: <info>  [1619979793.4869] dhcp6 (enp3s0): state changed unknown -> bound
May 02 18:23:15 localhost NetworkManager[531]: <info>  [1619979795.2218] device (enp3s0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2219] device (enp3s0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2219] device (enp3s0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2220] manager: NetworkManager state is now CONNECTED_LOCAL
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2222] policy: set-hostname: set hostname to 'localhost.localdomain' (no default device)
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2223] device (enp3s0): Activation: successful, device activated.
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2223] manager: startup complete
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2224] quitting now that startup is complete
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2355] dhcp4 (enp3s0): canceled DHCP transaction
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2355] dhcp4 (enp3s0): state changed unknown -> done
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2356] dhcp6 (enp3s0): canceled DHCP transaction
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2356] dhcp6 (enp3s0): state changed bound -> done
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2356] device (enp3s0): DHCPv6: trying to acquire a new lease within 90 seconds
May 02 18:23:15 localhost.localdomain NetworkManager[531]: <info>  [1619979795.2356] exiting (success)

And here is the Clevis error this results in:

May 02 18:23:37 localhost.localdomain ignition[580]: Ignition failed: failed to create luks: binding clevis device: exit status 1: Cmd: "clevis" "luks" "bind" "-f" "-k" "/run/ignition/luks-keyfiles/root" "-d" "/run/ignition/dev_aliases/dev/disk/by-partlabel/root" "sss" "{\"pins\":{\"tang\":[{\"url\":\"http://**my-IPv4-to-tang-server**:**my-tang-server-port**\",\"thp\":\"J-***********\"}]},\"t\":1}" Stdout: "" Stderr: "Warning: Value 512 is outside of the allowed entropy range, adjusting it.\nUnable to fetch advertisement: 'http://**my-IPv4-to-tang-server**:**my-tang-server-port**/adv/J-*********!\nInvalid input!\nUsage: jose jwe fmt -i JWE [-I CT] [-o JWE] [-O CT] [-c]\n\nConverts a JWE between serialization formats\n\n  -i JSON --input=JSON     Parse JWE from JSON\n  -i FILE --input=FILE     Read JWE from FILE\n  -i -    --input=-        Read JWE from standard input\n\n  -I FILE --detached=FILE  Read decoded ciphertext from FILE\n  -I -    --detached=-     Read decoded ciphertext from standard input\n\n  -o JSON --output=JSON    Parse JWE from JSON\n  -o FILE --output=FILE    Read JWE from FILE\n  -o -    --output=-       Read JWE from standard input\n                           Default: \"-\"\n\n  -O JSON --detach=JSON    Parse JWE from JSON\n  -O FILE --detach=FILE    Read JWE from FILE\n  -O -    --detach=-       Read JWE from standard input\n\n  -c      --compact        Output JWE using compact serialization\n\nFailed to import token from file.\nError saving metadata to LUKS2 header in device /run/ignition/dev_aliases/dev/disk/by-partlabel/root\nUnable to update metadata; operation cancelled\nError adding new binding to /run/ignition/dev_aliases/dev/disk/by-partlabel/root\n"

What is basically just says it what it always said:

Unable to fetch advertisement

So given the NetworkManager extract only talks about IPv6, could it really be that ignores IPv4 as I intitially thought?

I searched for Ipv4 in the whole system log:

May 02 18:23:07 localhost systemd[1]: Starting Apply Kernel Variables...
May 02 18:23:07 localhost systemd-journald[254]: Missed 1 kernel messages
May 02 18:23:07 localhost kernel: usb 1-4: New USB device found, idVendor=03f0, idProduct=034a, bcdDevice= 1.21
May 02 18:23:07 localhost kernel: usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
May 02 18:23:07 localhost kernel: usb 1-4: Product: HP Elite USB Keyboard 
# NOTE: It detected my keyboard here.
May 02 18:23:07 localhost kernel: usb 1-4: Manufacturer: Chicony
May 02 18:23:07 localhost systemd-sysctl[294]: Not setting net/ipv4/conf/all/rp_filter (explicit setting exists).
May 02 18:23:07 localhost systemd-sysctl[294]: Not setting net/ipv4/conf/default/rp_filter (explicit setting exists).
May 02 18:23:07 localhost systemd-sysctl[294]: Not setting net/ipv4/conf/all/accept_source_route (explicit setting exists).
May 02 18:23:07 localhost systemd-sysctl[294]: Not setting net/ipv4/conf/default/accept_source_route (explicit setting exists).
May 02 18:23:07 localhost systemd-sysctl[294]: Not setting net/ipv4/conf/all/promote_secondaries (explicit setting exists).
May 02 18:23:07 localhost systemd-sysctl[294]: Not setting net/ipv4/conf/default/promote_secondaries (explicit setting exists).
May 02 18:23:07 localhost systemd[1]: Finished Apply Kernel Variables.
# […]
May 02 19:10:08 localhost.localdomain kernel: audit: type=1130 audit(1619982608.244:109): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 02 19:10:08 localhost.localdomain systemd-sysctl[2616]: Not setting net/ipv4/conf/all/rp_filter (explicit setting exists).
May 02 19:10:08 localhost.localdomain systemd-sysctl[2616]: Not setting net/ipv4/conf/default/rp_filter (explicit setting exists).
May 02 19:10:08 localhost.localdomain systemd-sysctl[2616]: Not setting net/ipv4/conf/all/accept_source_route (explicit setting exists).
May 02 19:10:08 localhost.localdomain systemd-sysctl[2616]: Not setting net/ipv4/conf/default/accept_source_route (explicit setting exists).
May 02 19:10:08 localhost.localdomain systemd-sysctl[2616]: Not setting net/ipv4/conf/all/promote_secondaries (explicit setting exists).
May 02 19:10:08 localhost.localdomain systemd-sysctl[2616]: Not setting net/ipv4/conf/default/promote_secondaries (explicit setting exists).
May 02 19:10:08 localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-sysctl comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 02 19:10:08 localhost.localdomain systemd[1]: Finished Apply Kernel Variables.
# […]

May 02 19:10:13 localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 02 19:10:13 localhost.localdomain systemd-sysctl[2900]: Not setting net/ipv4/conf/all/rp_filter (explicit setting exists).
May 02 19:10:13 localhost.localdomain systemd-sysctl[2900]: Not setting net/ipv4/conf/default/rp_filter (explicit setting exists).
May 02 19:10:13 localhost.localdomain systemd-sysctl[2900]: Not setting net/ipv4/conf/all/accept_source_route (explicit setting exists).
May 02 19:10:13 localhost.localdomain systemd-sysctl[2900]: Not setting net/ipv4/conf/default/accept_source_route (explicit setting exists).
May 02 19:10:13 localhost.localdomain systemd-sysctl[2900]: Not setting net/ipv4/conf/all/promote_secondaries (explicit setting exists).
May 02 19:10:13 localhost.localdomain systemd-sysctl[2900]: Not setting net/ipv4/conf/default/promote_secondaries (explicit setting exists).
May 02 19:10:13 localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-sysctl comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 02 19:10:13 localhost.localdomain systemd[1]: Finished Apply Kernel Variables.
# […]
May 02 19:10:15 localhost.localdomain systemd[1]: Finished Create Static Device Nodes in /dev.
May 02 19:10:15 localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 02 19:10:15 localhost.localdomain systemd-sysctl[3196]: Not setting net/ipv4/conf/all/rp_filter (explicit setting exists).
May 02 19:10:15 localhost.localdomain systemd-sysctl[3196]: Not setting net/ipv4/conf/default/rp_filter (explicit setting exists).
May 02 19:10:15 localhost.localdomain systemd-sysctl[3196]: Not setting net/ipv4/conf/all/accept_source_route (explicit setting exists).
May 02 19:10:15 localhost.localdomain systemd-sysctl[3196]: Not setting net/ipv4/conf/default/accept_source_route (explicit setting exists).
May 02 19:10:15 localhost.localdomain systemd-sysctl[3196]: Not setting net/ipv4/conf/all/promote_secondaries (explicit setting exists).
May 02 19:10:15 localhost.localdomain systemd-sysctl[3196]: Not setting net/ipv4/conf/default/promote_secondaries (explicit setting exists).
May 02 19:10:15 localhost.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-sysctl comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 02 19:10:15 localhost.localdomain systemd[1]: Finished Apply Kernel Variables.

Does anyone know how to further debug this? Is there any way I can access nmcli from dracut e.g.? It’s not in /sbin or so, I already checked that.

Cross-posted a TL;DR to Ask Fedora now.

You can’t use nmcli in the initramfs unfortunately. NetworkManager runs once, brings up the connection(s) and then exits. I’m wondering if in your environment NM is bringing up the network and getting an address (an IPv6 address) and then exiting before it tries or gets an IPv4 address.

Let’s take IPv6 out of the picture just to test the theory. I think something like this should work:

sudo coreos-installer install /dev/sdd -i ignition/[…].ign --image-file downloads/fedora-coreos-33.20210328.3.0-metal.x86_64.raw.xz --delete-karg 'console=ttyS0,115200n8' --append-karg 'ipv6.disable=1'
1 Like

Well… if we disable IPv6 I doubt IPv4 would magically work then. :wink:
I’ll try, but it’s no “proof” my IPv6-theory is wrong. Though it may certainly be totally wrong, of course. :wink:

Okay, tried it with the current next stream, as this:

sudo coreos-installer install /dev/sdd -i ignition/[…].ign --image-file downloads/fedora-coreos-33.20210426.2.0-metal.x86_64.raw.xz --delete-karg 'console=ttyS0,115200n8' --append-karg 'ipv6.disable=1’

And indeed, something (different) happens! NetworkManager seems to get into a kind of infinite loop while trying to acquire a lease?

It seems to repeat this log part in a loop:

May 04 19:21:34 localhost NetworkManager[7348]: <info>  [1620156094.6528] dhcp4 (enp3s0): canceled DHCP transaction
May 04 19:21:34 localhost NetworkManager[7348]: <info>  [1620156094.6528] dhcp4 (enp3s0): state changed bound -> done
May 04 19:21:34 localhost NetworkManager[7348]: <info>  [1620156094.6528] device (enp3s0): DHCPv4: trying to acquire a new lease within 90 seconds
May 04 19:21:34 localhost NetworkManager[7348]: <info>  [1620156094.6528] manager: NetworkManager state is now CONNECTED_SITE
May 04 19:21:34 localhost NetworkManager[7348]: <info>  [1620156094.6529] exiting (success)
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2417] NetworkManager (version 1.26.8-1.fc33) is starting... (after a restart)
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2417] Read config: /etc/NetworkManager/NetworkManager.conf
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2429] auth[0x55ea8a7c3de0]: create auth-manager: D-Bus connection not available. Polkit is disabled and only root will be authorized.
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2434] manager[0x55ea8a7d6020]: monitoring kernel firmware directory '/lib/firmware'.
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2435] hostname: hostname: hostnamed not used as proxy creation failed with: Could not connect: No such file or directory
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2435] dns-mgr[0x55ea8a7ce2c0]: init: dns=default,systemd-resolved rc-manager=symlink
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2453] manager[0x55ea8a7d6020]: rfkill: Wi-Fi hardware radio set enabled
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2453] manager[0x55ea8a7d6020]: rfkill: WWAN hardware radio set enabled
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2460] Loaded device plugin: NMTeamFactory (/usr/lib64/NetworkManager/1.26.8-1.fc33/libnm-device-plugin-team.so)
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2460] manager: rfkill: Wi-Fi enabled by radio killswitch; enabled by state file
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2460] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2460] manager: Networking is enabled by state file
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2460] dhcp-init: Using DHCP client 'internal'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2461] settings: Loaded settings plugin: keyfile (internal)
May 04 19:21:35 localhost NetworkManager[7372]: <warn>  [1620156095.2461] settings: could not load plugin 'ifcfg-rh' from file '/usr/lib64/NetworkManager/1.26.8-1.fc33/libnm-settings-plugin-ifcfg-rh.so': No such file or directory
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2469] device (lo): carrier: link connected
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2469] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/1)
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2471] device (enp3s0): carrier: link connected
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2472] manager: (enp3s0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2473] manager: (enp3s0): assume: will attempt to assume matching connection 'Wired Connection' (554c3826-5966-444c-8493-74eff4815cf9) (indicated)
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2474] device (enp3s0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'assume')
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2480] device (enp3s0): state change: unavailable -> disconnected (reason 'connection-assumed', sys-iface-state: 'assume')
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2483] device (enp3s0): Activation: starting connection 'Wired Connection' (554c3826-5966-444c-8493-74eff4815cf9)
May 04 19:21:35 localhost NetworkManager[7372]: <warn>  [1620156095.2484] sleep-monitor-sd: failed to acquire D-Bus proxy: Could not connect: No such file or directory
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2485] device (enp3s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'assume')
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2485] device (enp3s0): state change: prepare -> config (reason 'none', sys-iface-state: 'assume')
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2485] device (enp3s0): state change: config -> ip-config (reason 'none', sys-iface-state: 'assume')
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2486] dhcp4 (enp3s0): activation: beginning transaction (timeout in 45 seconds)
May 04 19:21:35 localhost NetworkManager[7372]: <error> [1620156095.2498] device (enp3s0): addrconf6: failed to start neighbor discovery: failure creating libndp socket: Address family not supported by protocol (97)
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2581] dhcp4 (enp3s0): option dhcp_lease_time      => '43200'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2581] dhcp4 (enp3s0): option domain_name          => 'lan'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2581] dhcp4 (enp3s0): option domain_name_servers  => '192.168.1.1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2581] dhcp4 (enp3s0): option expiry               => '1620199295'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2581] dhcp4 (enp3s0): option ip_address           => '192.168.1.248'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2581] dhcp4 (enp3s0): option next_server          => '192.168.1.1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2581] dhcp4 (enp3s0): option requested_broadcast_address => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2581] dhcp4 (enp3s0): option requested_domain_name => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2581] dhcp4 (enp3s0): option requested_domain_name_servers => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2581] dhcp4 (enp3s0): option requested_domain_search => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2581] dhcp4 (enp3s0): option requested_host_name  => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option requested_interface_mtu => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option requested_ms_classless_static_routes => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option requested_nis_domain => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option requested_nis_servers => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option requested_ntp_servers => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option requested_rfc3442_classless_static_routes => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option requested_root_path  => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option requested_routers    => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option requested_static_routes => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option requested_subnet_mask => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option requested_time_offset => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option requested_wpad       => '1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option routers              => '192.168.1.1'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): option subnet_mask          => '255.255.255.0'
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2582] dhcp4 (enp3s0): state changed unknown -> bound
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2587] device (enp3s0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'assume')
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2588] device (enp3s0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'assume')
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2588] device (enp3s0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'assume')
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2589] manager: NetworkManager state is now CONNECTED_LOCAL
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2591] manager: NetworkManager state is now CONNECTED_SITE
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2591] policy: set 'Wired Connection' (enp3s0) as default for IPv4 routing and DNS
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2594] device (enp3s0): Activation: successful, device activated.
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2594] manager: NetworkManager state is now CONNECTED_GLOBAL
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2595] manager: startup complete
May 04 19:21:35 localhost NetworkManager[7372]: <info>  [1620156095.2595] quitting now that startup is complete

After waaay tooo long trying this endless loop :smiley: , it gave up and showed me an emergency shell.

Then I could extract all the logs again.

But now I need to split this post, because Discourse limits the amount of characters… :see_no_evil:

So to make them more readable, here they go:

ip addr show
$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:******:11 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.248/24 brd 192.168.1.255 scope global dynamic noprefixroute enp3s0
       valid_lft 43004sec preferred_lft 43004sec
And some more starting lines of rdsosreport.txt with some information
+ cat /lib/dracut/dracut-050-64.git20200529.fc33
dracut-050-64.git20200529.fc33
+ cat /proc/cmdline
+ sed -e 's/\(ftp:\/\/.*\):.*@/\1:*******@/g;s/\(cifs:\/\/.*\):.*@/\1:*******@/g;s/cifspass=[^ ]*/cifspass=*******/g;s/iscsi:.*@/iscsi:******@/g;s/rd.iscsi.password=[^ ]*/rd.iscsi.password=******/g;s/rd.iscsi.in.password=[^ ]*/rd.iscsi.in.password=******/g'
BOOT_IMAGE=(hd0,gpt3)/ostree/fedora-coreos-4f9918995641f54d7d0b42584fc97958ab5a23601a9314a1234b442ab099f7a9/vmlinuz-5.11.15-200.fc33.x86_64 mitigations=auto,nosmt console=tty0 ignition.platform.id=metal ostree=/ostree/boot.0/fedora-coreos/4f9918995641f54d7d0b42584fc97958ab5a23601a9314a1234b442ab099f7a9/0 ipv6.disable=1 rd.luks.name=6f14f5ac-d2f0-4f08-af23-9f8c5ff8ebe1=root rd.neednet=1 rd.luks.options=_netdev root=UUID=6a8d1969-fbab-4cf0-9d80-b277343fb048 rw
+ '[' -f /etc/cmdline ']'
+ for _i in /etc/cmdline.d/*.conf
+ '[' -f /etc/cmdline.d/00-btrfs.conf ']'
+ echo /etc/cmdline.d/00-btrfs.conf
/etc/cmdline.d/00-btrfs.conf
+ cat /etc/cmdline.d/00-btrfs.conf
+ sed -e 's/\(ftp:\/\/.*\):.*@/\1:*******@/g;s/\(cifs:\/\/.*\):.*@/\1:*******@/g;s/cifspass=[^ ]*/cifspass=*******/g;s/iscsi:.*@/iscsi:******@/g;s/rd.iscsi.password=[^ ]*/rd.iscsi.password=******/g;s/rd.iscsi.in.password=[^ ]*/rd.iscsi.in.password=******/g'
rd.driver.pre=btrfs
+ cat /proc/self/mountinfo
1 1 0:2 / / rw shared:1 - rootfs none rw
23 1 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
24 1 0:23 / /proc rw,nosuid,nodev,noexec,relatime shared:7 - proc proc rw
25 1 0:5 / /dev rw,nosuid,noexec shared:8 - devtmpfs devtmpfs rw,size=32824792k,nr_inodes=8206198,mode=755,inode64
26 23 0:6 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:3 - securityfs securityfs rw
27 25 0:24 / /dev/shm rw,nosuid,nodev shared:9 - tmpfs tmpfs rw,inode64
28 25 0:25 / /dev/pts rw,nosuid,noexec,relatime shared:10 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
29 1 0:26 / /run rw,nosuid,nodev shared:11 - tmpfs tmpfs rw,size=13146140k,nr_inodes=819200,mode=755,inode64
30 23 0:27 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:4 - cgroup2 cgroup2 rw,nsdelegate
31 23 0:28 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:5 - pstore pstore rw
32 23 0:29 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:6 - bpf none rw,mode=700
+ cat /proc/mounts
none / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,noexec,size=32824792k,nr_inodes=8206198,mode=755,inode64 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,size=13146140k,nr_inodes=819200,mode=755,inode64 0 0
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
none /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0

The rest did not fit into this Discourse posts, because it limits the amount of chars I can put in here, so please find them in this GitHub gist:
Overview over logs related to network CoreOS issue on Purism Librem Mini · GitHub

BTW the device is the Purism Librem Mini With SeaBIOS as the above log also shows.

Again if someone wants to get the full log feel free to ask. Or I may also open a bug report at RedHat’s bugzilla if that would help – I could limit the report’s visibility to a smaller group of Fedora contributors there.

BTW, maybe a good idea to mention the network setup: It’s just an OpenWRT router behind another router (no bridge, they have different subnets), and for what matters, it’s “usual” DHCP… Any other device on that network (WiFi or LAN) could so far always acquire a IP address…

And as for the Ignition/CoreOS config, the only thing related to network I define/configure there is the hostname, just exactly as the docs tell me:

storage:
  files:
    # hostname
    - path: /etc/hostname
      mode: 0644
      contents:
        inline: # some [a-zA-Z]* chars 

So nothing special. And no reason to fail… :thinking:

Okay, so as @sampsonf and @vgaetera suggested over at AskFedora](No network (IPv4?) connectivity via Ethernet at first boot for LUKS/Clevis setup - #2 by sampsonf - on installing or upgrading Fedora - Ask Fedora):

…I did try it with the next stream now:

$ sudo coreos-installer install /dev/sdd -i ignition/fcc-minipure.ign --image-file downloads/fedora-coreos-34.20210427.1.0-metal.x86_64.raw.xz --delete-karg 'console=ttyS0,115200n8' 
Copying image from downloads/fedora-coreos-34.20210427.1.0-metal.x86_64.raw.xz
Reading signature from downloads/fedora-coreos-34.20210427.1.0-metal.x86_64.raw.xz.sig
gpg: Signatur vom Di 27 Apr 2021 22:09:27 CEST
gpg:                mittels RSA-SchlĂĽssel 8C5BA6990BDB26E19F2A1A801161AE6945719A39
gpg: Korrekte Signatur von "Fedora (34) <fedora-34-primary@fedoraproject.org>" [ultimativ]
> Read disk 566.5 MiB/566.5 MiB (100%)   
Writing Ignition config
Modifying kernel arguments
Install complete.

Booting…
And… voila… it does indeed boot without any problem!

However, is this really a solution? I mean next is unstable and… no one pointed me to an intentional fix for that, that would tackle it. (Maybe it only works, because of strange coindcidence?)

The only thing, which has been confirmed on GitHub is that Ignition will not wait for a network before trying the Clevis command/contact the tang server in the stable and testing stream.
However, that should not fully solve my problem, because well… there is and was no network even after waiting. :thinking: (Or was it some kind of race condition?)

I’d still like to get an explanation for that…
But thanks anyone for helping so far, already, in any case.

1 Like

Okay, bad message again.

After one reboot, teh decryption already fails again (with the same error?).

It now continues to show:

… clevis-luks-askpass[…]: Error communicating with the server!

Okay, I admit I did not only reboot, but did one thing, but that was just layering… really… rpm-ostree install nano. That’s it…

Although the previous (initial) boot now worked and was just seconds ago. Also I rechecked the tang server is still available and receives no request this time.
Also i rebooted over SSH so it is clearly has network connection… physically/layer-1’ie at least… :roll_eyes:

And CoreOS just loops there and desperately tries to do connect to the tang server.
BTW, as CoreOS’ aim is to be failsafe especially for reboots and such stuff, does it automatically stop this loop and reboot or so at some time? (Could be some hardware failure… “fixed” by a reboot maybe?? – In my case it is not, but well…)


Edit: Now I force-powered it off manually (physically), and suddenly the next boot works again???

This is really strange behaviour…

And it IMHO calls for a “reboot when network/LUKS decryption fails after 5min” or so as a feature, does not it?

1 Like

A lot to unpack here :slight_smile:

At least we were able to find some success, but I agree that not knowing what the root cause is unsettling. The difference between stable and next right now is quite large because there is a F34 rebase in there. So you’ve got pretty much newer versions of the entire software stack (NetworkManager, dracut, etc…).

If it makes you feel any better, the exact content in next is getting promoted to testing tomorrow and will be in stable in a few weeks time.

Apologies for seeing this this late. Tang-pinning the rootfs doesn’t work in Fedora 33 because it’s missing patches in dracut and systemd. As you’ve noticed this is fixed in Fedora 34.

See: Tang-pinned rootfs doesn't work in FCOS · Issue #692 · coreos/fedora-coreos-tracker · GitHub

1 Like

Anyway thanks a lot!

I reported the issue about the reboot that would have helped me (it was in Fedora 34, see the thread further up) here:

I’m referring to this:

Sign…

Oh yeah, it seems even the current Ignition version in fedora-coreos-34.20210503.1.0 (currently next) is still v2.9.0 and only v2.10.1 includes a fix for that problem: