Can't combine manual partitioning + LUKS

Hi,

I am trying to combine manual partitioning and LUKS, all through Ignition. Each feature works on its own, but the combination does not, and I fail to understand why.

variant: fcos
version: 1.6.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - XXX
      password_hash: XXX
storage:
  luks:
    - name: root
      label: luks-root
      device: /dev/disk/by-partlabel/root
      clevis:
        custom:
          needs_network: false
          pin: tpm2
          config: '{"pcr_bank":"sha256","pcr_ids":"7"}'
      wipe_volume: true
  filesystems:
    - device: /dev/mapper/root
      format: xfs
      wipe_filesystem: true
      label: root

This works.

variant: fcos
version: 1.6.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - XXX
      password_hash: XXX
storage:
  disks:
    - device: /dev/disk/by-id/coreos-boot-disk
      wipe_table: true
      partitions:
        - label: root
          number: 4
          size_mib: 8192
          resize: true
        - label: var
          size_mib: 0
          resize: true
  filesystems:
    - device: /dev/disk/by-partlabel/root
      label: root
      format: xfs
      wipe_filesystem: true
    - device: /dev/disk/by-partlabel/var
      label: var
      path: /var
      format: btrfs
      with_mount_unit: true
      wipe_filesystem: true

This works.

variant: fcos
version: 1.6.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - XXX
      password_hash: XXX
storage:
  disks:
    - device: /dev/disk/by-id/coreos-boot-disk
      wipe_table: true
      partitions:
        - label: root
          number: 4
          size_mib: 8192
          resize: true
        - label: var
          size_mib: 0
          resize: true
  luks:
    - name: root
      label: luks-root
      device: /dev/disk/by-partlabel/root
      clevis:
        custom:
          needs_network: false
          pin: tpm2
          config: '{"pcr_bank":"sha256","pcr_ids":"7"}'
      wipe_volume: true
    - name: var
      label: luks-var
      device: /dev/disk/by-partlabel/var
      clevis:
        custom:
          needs_network: false
          pin: tpm2
          config: '{"pcr_bank":"sha256","pcr_ids":"7"}'
      wipe_volume: true
  filesystems:
    - device: /dev/mapper/root
      label: root
      format: xfs
      wipe_filesystem: true
    - device: /dev/mapper/var
      label: var
      path: /var
      format: btrfs
      with_mount_unit: true
      wipe_filesystem: true

This fails with the following rdsosreport.txt: https://paste.hostux.net/?d0cf25bd7238e8bc#38r8ie9PjosDZEF2ToWG89uBxmLpEMFHgivftXUr48Sc

The error appears to be: "Error: System has 0 devices with a filesystem labeled 'boot': []". And indeed, lsblk shows that no boot partition exists: no EFI-SYSTEM, no BIOS-BOOT, only the partitions I created manually.

The question that I can’t answer is:

Apparently, FCOS creates its default partitions in both working cases. If I do manual partitioning, OR if I encrypt the root with LUKS, it does create the required partitions. But when doing both at the same time, it does not. Why?

Would you have an idea?

Thanks in advance for any answer.

I tested the following Ignition config on a bare metal machine:

variant: fcos

version: 1.6.0

passwd:
  users:
    - name: core
      password_hash: $y$...
      ssh_authorized_keys: [ssh-ed25519 AAA...]

storage:

  disks:
  # The link to the block device the OS was booted from.
  - device: /dev/disk/by-id/coreos-boot-disk
    # We do not want to wipe the partition table
    # since this is the primary device.
    wipe_table: false

    partitions:

    - number: 4
      label: root
      # Allocate at least 10 GiB to the rootfs.
      size_mib: 10240
      resize: true

    - label: var
      size_mib: 0

  # Encrypting filesystems with a TPM2 Clevis pin bound to PCR 7
  luks:
  
    - name: root
      label: root-luks
      device: /dev/disk/by-partlabel/root
      clevis:
        custom:
          needs_network: false
          pin: tpm2
          config: '{"pcr_bank":"sha256","pcr_ids":"7"}'
      wipe_volume: true
      
    - name: var
      label: var-luks
      device: /dev/disk/by-partlabel/var
      clevis:
        custom:
          needs_network: false
          pin: tpm2
          config: '{"pcr_bank":"sha256","pcr_ids":"7"}'
      wipe_volume: true

  # Configuring filesystems
  filesystems:

    - device: /dev/mapper/root
      wipe_filesystem: true
      format: btrfs
      label: root

    - path: /var
      device: /dev/mapper/var
      format: btrfs
      with_mount_unit: true
      label: var

The resulting partitioning:

core@localhost:~$ lsblk -pfa /dev/sda
NAME                 FSTYPE      FSVER LABEL      UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
/dev/sda                                                                                              
├─/dev/sda1                                                                                           
├─/dev/sda2          vfat        FAT16 EFI-SYSTEM 7B77-95E7                                           
├─/dev/sda3          ext4        1.0   boot       9278b3bd-a07f-4907-913a-cf9bbf780d85  180.3M    42% /boot
├─/dev/sda4          crypto_LUKS 2     root-luks  ed86c63f-533a-4287-a4dc-a7ab019a07f1                
│ └─/dev/mapper/root btrfs             root       82ec3121-93ad-4c9e-b28b-4160b805accf    7.9G    17% /sysroot/ostree/deploy/fedora-coreos/var
│                                                                                                     /sysroot
│                                                                                                     /etc
└─/dev/sda5          crypto_LUKS 2     var-luks   67583381-fb91-4ca3-9ee6-215cbdefde2d                
  └─/dev/mapper/var  btrfs             var        31e2f021-5572-4e00-9d2b-3f296c9b6e1c  220.3G     0% /var

Thanks for your answer.

My initial problem is that I wanted to keep wipe_table: true so that I could run the Ignition file regardless of the state the disk was in (for example, used by a previous OS). Over the last few hours, I managed to install with the following:

variant: fcos
version: 1.6.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - XXX
      password_hash: XXX
storage:
  disks:
    - device: /dev/disk/by-id/coreos-boot-disk
      wipe_table: true
      partitions:
        - label: efi
          number: 2
          size_mib: 512
          resize: true
          type_guid: "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"
        - label: boot
          number: 3
          size_mib: 1024
          resize: true
        - label: root
          number: 4
          size_mib: 8192
          resize: true
        - label: swap
          number: 5
          size_mib: 2048
          resize: true
        - label: var
          number: 6
          size_mib: 0
          resize: true
  luks:
    - name: root
      label: luks-root
      device: /dev/disk/by-partlabel/root
      clevis:
        custom:
          needs_network: false
          pin: tpm2
          config: '{"pcr_bank":"sha256","pcr_ids":"7"}'
      wipe_volume: true
    - name: var
      label: luks-var
      device: /dev/disk/by-partlabel/var
      clevis:
        custom:
          needs_network: false
          pin: tpm2
          config: '{"pcr_bank":"sha256","pcr_ids":"7"}'
      wipe_volume: true
  filesystems:
    - device: /dev/disk/by-partlabel/efi
      format: vfat
      wipe_filesystem: true
    - device: /dev/disk/by-partlabel/boot
      label: boot
      path: /boot
      format: ext4
      wipe_filesystem: true
      with_mount_unit: true
    - device: /dev/mapper/root
      label: root
      format: xfs
      wipe_filesystem: true
    - device: /dev/mapper/var
      label: var
      path: /var
      format: btrfs
      with_mount_unit: true
      wipe_filesystem: true
    - device: /dev/disk/by-partlabel/swap
      label: swap
      format: swap
      wipe_filesystem: true
      with_mount_unit: true

I still fail to understand why, but when both features are combined, manually defining the default partitions is mandatory. I think an explanation of this is missing from the documentation. This installs and boots, but there are still some problems. For example, the GRUB configuration is incorrect.

So, I will give up on wipe_table: true and just run sudo sgdisk --zap-all /dev/nvme0n1 manually before running coreos-installer install. Then I will start iterating on your example.

As I already mentioned, you don’t need to wipe the partition table on the primary device. Ignition always runs on first boot when you (re)provision the node.

You probably don’t need to run sudo sgdisk --zap-all /dev/nvme0n1 either. Try without it to see if it makes any difference.
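
To make the first point concrete, here is a minimal sketch of the disks section from the failing config with the partition table left in place. It assumes the disk has just been written by coreos-installer, so the EFI-SYSTEM and boot partitions (sda2 and sda3 in the lsblk output above) already exist and only partition 4 needs resizing. I have not tested this exact fragment; the luks and filesystems sections would stay as in my tested config.

```yaml
# Sketch only: disks section with the existing partition table kept.
# Assumes a freshly installed disk; not tested as written.
variant: fcos
version: 1.6.0
storage:
  disks:
    - device: /dev/disk/by-id/coreos-boot-disk
      # Keep the table from the installer image, including the
      # existing EFI-SYSTEM and boot partitions.
      wipe_table: false
      partitions:
        # Resize the existing root partition (number 4).
        - label: root
          number: 4
          size_mib: 8192
          resize: true
        # New partition taking the remaining space.
        - label: var
          size_mib: 0
```

With wipe_table: false, Ignition only touches the partitions listed here, which is why the boot partition remains available on first boot.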