Fedora does not recognize Static NICs at boot (before LUKS) please help fix

Hello lovely Fedora Hivemind <3

tl;dr:
After adding a NIC with a static setup, nm-online-wait times out and the NIC never comes online at boot, which breaks Clevis/Tang. (The NIC works after boot, though.)

i have a setup where my systems have two NICs.

NIC1 is a “normal” network with DHCP etc.
NIC2 is a “blank” network where every system has a static setup.

Here is a bit of story for context.

I have come to love Fedora-based systems, and after Atomic I discovered CoreOS and am currently trying to set up my server VMs with it.

Since automatic LUKS unlocking works mainly through Clevis/Tang (the LUKS volume gets opened by reaching a pin server), I set things up with a Tang server.

This setup all works well.

But in my target setup the VMs only connect to NIC1 for management purposes.
All production traffic is on NIC2.
This is also the case for the final Tang server, so it is important that NIC2 is working by the time the boot process reaches LUKS (or at least some time after it gets there).
Sadly, that never happens.

Thanks to the lovely folks of the CoreOS Matrix chat, I managed to narrow down the problem a bit.

As soon as a second NIC is added whose network has no DHCP, nm-online-wait times out while booting (which takes 90 seconds).

This also happens with a live-boot ISO, which is fine since NIC2 has no configuration yet.
But even after configuring NIC2 on a fresh install, the problem persists until the boot is finished. Therefore my LUKS unlock never works, since it can't reach its Tang server.
If I enter my backup passphrase manually, NIC2 works well:
I can connect to other systems, mount NFS shares, and reach/set up the Tang server.

It's only at boot, before LUKS, that Fedora is not able to use NIC2 :frowning:

After a lot of trial and error, I am currently using these two .nmconnection configs:

[connection]
id=ens18
uuid=502c72e8-1f41-402f-ad0d-f2cc4dc236e0
type=ethernet
autoconnect-priority=-99
interface-name=ens18

[ethernet]

[ipv4]
method=auto

[ipv6]
addr-gen-mode=default
method=disabled

[proxy]

&

[connection]
id=ens19
uuid=887e084e-1054-4037-aa49-46e654af457d
type=ethernet
autoconnect-priority=-99
interface-name=ens19
timestamp=1775119188

[ethernet]

[ipv4]
address1=10.0.0.17/24
gateway=10.0.0.1
method=manual

[ipv6]
addr-gen-mode=default
method=disabled

After boot, the nmcli output looks like this:

$ nmcli
ens19: connected to ens19
        "Red Hat Virtio"
        ethernet (virtio_net), BC:24:11:23:7C:1C, hw, mtu 1500
        ip4 default
        inet4 10.0.0.17/24
        route4 default via 10.0.0.1 metric 100
        route4 10.0.0.1/32 metric 0
        route4 10.0.0.0/24 metric 100

ens18: connected to Wired Connection
        "Red Hat Virtio"
        ethernet (virtio_net), BC:24:11:37:17:EE, hw, mtu 1500
        inet4 10.10.50.39/24
        route4 10.10.50.0/24 metric 101
        route4 default via 10.10.50.1 metric 101
        route4 10.10.50.1/32 metric 0
[...]

I also tested it on Fedora Server and Fedora Workstation in the same VM,
all with the same result, though the symptoms differ a bit.
Fedora Workstation and Fedora Server directly show the manual LUKS passphrase prompt, and the nm-online-wait timeout is not shown. But if the passphrase is entered, the timeout can still be seen while the boot progresses.
CoreOS waits until the timeout before it shows the LUKS prompt.

If anyone has ideas how this can be fixed, I would be very grateful!

with kind regards

In my limited experience, if the NIC does not come up so that clevis can use it, the cause has been missing clevis modules for dracut.

I don’t use NM on my systems that use clevis; they are all configured with systemd-networkd. Below is the list of clevis RPMs I have. Maybe check for the NetworkManager versions and that you have them all installed. Remember to rebuild the initramfs after any changes.

What I have for systemd-networkd clevis:

clevis-pin-tpm2-0.5.3-10.fc43.x86_64
clevis-21-12.fc43.x86_64
clevis-luks-21-12.fc43.x86_64
clevis-systemd-21-12.fc43.x86_64
clevis-dracut-21-12.fc43.x86_64

thank you for your help <3

$ rpm -qa | grep clevis
clevis-21-12.fc43.x86_64
clevis-pin-tpm2-0.5.3-10.fc43.x86_64
clevis-luks-21-12.fc43.x86_64
clevis-systemd-21-12.fc43.x86_64
clevis-dracut-21-12.fc43.x86_64

The setup works as intended as long as I place a Tang server on the NIC1 network, but sadly this is not a permanent solution :frowning:

Do you see logs showing that both NICs come up?

When debugging clevis in the initramfs, I have unpacked the initramfs and read the scripts for clues about what is going on.

Specifically for the systemd-networkd case I had to add a dracut conf (not your issue, as you are using NetworkManager), but I’m documenting it here in case others read this topic in the future.

$ cat /etc/dracut.conf.d/unlock-disk-over-network.conf
add_dracutmodules+=" systemd-networkd "

You gave me an idea.
Since I lack the knowledge of how to check, I only deduced that the second NIC does not initialize :see_no_evil_monkey:

On Debian, a GRUB command line edit is necessary with multiple NICs and clevis.
For my example it would look something like this:

ip=10.0.0.17::10.0.0.1:255.255.255.0::ens19: reference

there even was a clevis issue for this
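
The 255.255.255.0 field in the ip= argument above is the netmask form of the /24 prefix from the address1=10.0.0.17/24 line in the .nmconnection file. For anyone using a different prefix, here is a minimal shell sketch of that conversion. The function name prefix_to_netmask is made up for illustration and is not part of any tool mentioned in this thread.

```shell
# Convert a CIDR prefix length (0-32) into a dotted-quad netmask.
# prefix_to_netmask is a hypothetical helper name, used only for this example.
prefix_to_netmask() {
    prefix=$1
    mask=""
    for i in 1 2 3 4; do
        if [ "$prefix" -ge 8 ]; then
            # All eight bits of this octet belong to the network part.
            octet=255
            prefix=$((prefix - 8))
        else
            # Only the top $prefix bits of this octet are set.
            octet=$((256 - (1 << (8 - prefix))))
            prefix=0
        fi
        mask="${mask:+$mask.}$octet"
    done
    echo "$mask"
}

prefix_to_netmask 24   # prints 255.255.255.0
```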


ok i tried:
$ rpm-ostree kargs editor --append=IP=10.0.0.17::10.0.0.1:255.255.255.0::ens19:
But even though the argument is now shown in GRUB, either it is not working correctly or my initial thought was right that the second NIC is not working (hence the nm-wait timeout).

I know it's not really comparable, but on Debian NIC2 worked, and with the GRUB command line edit clevis found the correct Tang server.

//EDIT: Thanks to a very nice person on the Bazzite Discord, who told me that kargs and the dracut parsing are case sensitive, I've found the error.

If you read this, be aware of it and use lowercase ip= as in the reference.
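
To spot this kind of mistake quickly, the kernel command line can be scanned for an ip= argument written in the wrong case. This is only a rough sketch for a POSIX shell; the function name check_ip_karg is made up for illustration, and it relies on the case sensitivity reported in this thread.

```shell
# Report every ip= style argument on a kernel command line and flag
# spellings that are not all lowercase (e.g. IP=, as in the attempt above).
# check_ip_karg is a hypothetical helper name, used only for this example.
check_ip_karg() {
    # $1: a kernel command line string, e.g. the contents of /proc/cmdline
    for arg in $1; do
        case $arg in
            ip=*)       echo "ok: $arg" ;;
            [Ii][Pp]=*) echo "wrong case: $arg" ;;
        esac
    done
}

# Example with the argument from the failed attempt:
check_ip_karg "rw IP=10.0.0.17::10.0.0.1:255.255.255.0::ens19:"

# On a running system, check the real command line instead:
#   check_ip_karg "$(cat /proc/cmdline)"
```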

Configure NIC2 for DHCP and plug it into your DHCP lan segment to test it?

It’s not clear to me what the problem is since we’re mentioning both tang and nm-wait-online. It would be helpful if we can reduce the number of variables here:

  1. Is the problem that nm-wait-online.service has to time out before the boot continues?
  2. Is the problem that your tang server can’t be reached?

If 1. is the problem can you remove tang from the equation and also remove one of the NICs from the equation so we can zero in on the actual problem?

Note also: providing a Butane configuration file here would help in understanding.


I always use the router to assign static IPs for all my PCs and VMs.


Thank you all so much for your help <3
I really appreciate it!

That is also the case for NIC1, and it is working.
But the production traffic is separated and has no DHCP by design.

I agree with the two possibilities. I am also not entirely sure anymore which case it is, or whether on CoreOS it may be both.

What I have tested is putting the Tang server on NIC1 (DHCP), which on CoreOS does not help with the nm-wait timeout.
On Fedora Server the LUKS prompt already appears while nm-wait times out (as if it runs in parallel rather than serially, unlike on CoreOS).
In both scenarios, as soon as the LUKS prompt is reached the Tang server is available and LUKS gets unlocked automatically
(which never happens on the static NIC2).

What I will try in the next few days is to set up a Fedora Server and see if I can successfully edit the GRUB command line like I did on Debian.

I am aware that Fedora Server and CoreOS are not really comparable,
but either way I'll report back the results.

Since I read the Clevis and Ubuntu issues about multiple NICs, I have a theory.
Maybe, even though the NICs are correctly configured, the configuration is not automatically synced to the pre-unlock state.

So when the system boots, in the time before the LUKS unlock both NICs get treated with default configurations.
That would explain NIC1 working, since I did not change much from the defaults except disabling IPv6.

with kind regards

//EDIT:
If I should run some commands that could help, I will gladly do so.
I can also try some other VMs if that helps.

What I assume is supposed to happen is that the NetworkManager config for the NIC should be put into the initramfs and used to bring up the NIC.

You can try forcing the initramfs to be rebuilt to be sure that it contains the latest config.
$ sudo dracut --force

Then reboot and see if it works. Make sure you boot without quiet and rhgb command line options so you see all the logs.

You were on the right track (:
I consulted the Bazzite Discord to ask for the correct way to regenerate the initramfs.
That is done automatically when you use rpm-ostree kargs editor, and it is also shown in the console (if one would read).

But thanks to a very kind and helpful person on the Bazzite server, I learned that kargs and the dracut parsing are case sensitive!

So the clevis GitHub issue and the hint from the Ubuntu issue regarding it were ultimately correct (:

The problem can be worked around / fixed by using the ip= option and syntax.

So for others who might read this later on, here is an example:
$ rpm-ostree kargs editor --append-if-missing=ip=10.0.0.17::10.0.0.1:255.255.255.0::ens19:
(which of course needs to be edited for your use case)
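
For adapting the example, it may help to see how the colon-separated fields line up. The sketch below assembles the argument from its parts, assuming the field order documented in dracut.cmdline(7): client IP, peer, gateway, netmask, hostname, interface, autoconf. The function name build_ip_karg is made up for illustration; the peer and hostname fields are left empty, as in the example above.

```shell
# Assemble a static dracut-style ip= kernel argument.
# Assumed field order (dracut.cmdline(7)):
#   ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<hostname>:<interface>:<autoconf>
# build_ip_karg is a hypothetical helper name, used only for this example.
build_ip_karg() {
    client_ip=$1
    gateway=$2
    netmask=$3
    iface=$4
    autoconf=${5-}   # left empty in the thread's example
    # The peer and hostname fields stay empty, hence the doubled colons.
    printf 'ip=%s::%s:%s::%s:%s\n' "$client_ip" "$gateway" "$netmask" "$iface" "$autoconf"
}

karg=$(build_ip_karg 10.0.0.17 10.0.0.1 255.255.255.0 ens19)
echo "$karg"   # prints ip=10.0.0.17::10.0.0.1:255.255.255.0::ens19:
```

The resulting string is what gets passed to --append-if-missing= in the command above.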
