CoreOS plans for measured boot

Greetings
I booted a CoreOS VM with a butane config containing

boot_device:
  luks:
    tpm2: true

and exposing the TPM2 device of the host (so, the physical TPM of my laptop) to the VM. It works great! But… it should not: nothing resets the TPM2 registers between VM reboots, so the VM starts with different values every time and there is no stable measurement to bind to.

Looking at the createLuks operation logged by ignition I read

ignition[858]: disks: createLuks: op(5): executing: "clevis" "luks" "bind" "-f" "-k" "/tmp/ignition-luks-775476344" "-d" "/run/ignition/dev_aliases/dev/disk/by-partlabel/root" "sss" "{\"pins\":{\"tpm2\":{}},\"t\":1}"

As no pcr_ids is provided, no measured boot register is used.
On one side, this is great because it means I can activate the luks feature without fearing that an update of firmware or shim breaks the measurement (and it works on my laptop). On the other hand, it would be great if measured boot could be used so that it gets more difficult to decrypt the data.

The case I am trying to address is someone having physical access to a server, rooting it and decrypting the data. Is there any plan to make better use of the TPM?

Another use case I had in mind (though it is more related to OKD/OpenShift) is the generation of the CSR, which needs to be manually approved today. Ideally the CSR could be generated using the TPM so that we can automate the approval process.

Cheers

If you want to bind your secret to specific PCR values, you will have to manually specify them in your Butane config. See:

One question per thread please! This is likely an RFE for OKD and not specifically Fedora CoreOS.

Thanks! I see now that it should be possible to use PCRs, but not by using the boot_device shortcut. I don’t want to appear excessively lazy, but… do you know of a full example that uses boot_device together with the filesystems and luks objects? I tried (briefly…) to adapt the existing examples but I have not managed so far (I will post again if I succeed though).

My question was a bit broad, because even with the ability to bind to PCRs it’s not necessarily possible to do this in practice. For example the cmdline parameters are hashed into PCR8. But I cannot bind to PCR8 as the cmdline contains the ostree version and the kernel version, so after a reboot the luks filesystem would never automatically unlock. And not binding to PCR8 means it’s trivial to add a “systemd.debug-shell=1” to the command line.
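To illustrate why a changing command line breaks a PCR8 binding, here is a minimal Python sketch of the TPM extend operation (new value = hash of old value concatenated with the digest of the measured data). The command lines are hypothetical, and the exact bytes GRUB measures (prefixes, encoding) are simplified here, so treat this as an illustration rather than a way to reproduce real PCR values.

```python
import hashlib

def pcr_extend(pcr: bytes, measured_data: bytes, algo: str = "sha256") -> bytes:
    """Return the PCR value after extending it with one measurement."""
    digest = hashlib.new(algo, measured_data).digest()
    # The TPM never overwrites a PCR: it hashes the old value with the new digest.
    return hashlib.new(algo, pcr + digest).digest()

def replay(measurements, algo: str = "sha256") -> bytes:
    """Replay a list of measurements starting from an all-zero PCR."""
    pcr = bytes(hashlib.new(algo).digest_size)
    for data in measurements:
        pcr = pcr_extend(pcr, data, algo)
    return pcr

# Hypothetical command lines (assumptions, not taken from a real system).
before_update = b"ostree=/ostree/boot.1/fedora-coreos/aaaa/0 rw"
after_update = b"ostree=/ostree/boot.0/fedora-coreos/bbbb/0 rw"
with_debug_shell = before_update + b" systemd.debug-shell=1"

for label, cmdline in [
    ("before update", before_update),
    ("after update", after_update),
    ("debug shell added", with_debug_shell),
]:
    print(f"{label:18s} {replay([cmdline]).hex()}")
```

All three values differ, which is the dilemma described above: a key sealed against the first value is released neither after a legitimate update nor after a tampered command line.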

In addition to that there is a risk that the system does not boot after an update and so you still would want a passphrase just in case.

I could get started by looking at a generated ignition file. This could possibly work

variant: fcos
version: 1.4.0
storage:
  filesystems:
    - device: /dev/mapper/root
      format: xfs
      label: root
      wipe_filesystem: true
  luks:
    - clevis:
        custom:
          needs_network: false
          pin: tpm2
          config: '{"pcr_bank":"sha1","pcr_ids":"0,7"}'
      device: /dev/disk/by-partlabel/root
      label: luks-root
      name: root
      wipe_volume: true

I am getting an error
/usr/bin/clevis-encrypt-tpm2: line 186: tpm2_createpolicy: command not found

EDIT: reported as issue #1255 in coreos/fedora-coreos-tracker on GitHub ("tpm2_createpolicy: command not found when using custom clevis config").

Hi Francois,

As you’ve realized, binding to PCR8 will not work because the kernel arguments regularly change during updates. Supporting this would require a non-trivial amount of work on the Fedora CoreOS side.

Note however that one can enable GRUB password protection. This will allow you to require a password to change kernel arguments from the GRUB menu (or to boot any non-default deployment).

However, it is not as secure as PCR8 binding because it’d still be possible for someone to e.g. boot on the same host from a USB key into the deployment using custom kernel arguments. Is this a legitimate concern for your use case of Fedora CoreOS? Note that Tang binding might be a better fit for your threat model.

Hello. I’ve been thinking about having some other PCR that would be resilient to updates (just an idea! See "Re: Suggestion: Use a unified kernel image by default in the future." on the Fedora devel mailing list). Assuming someone implements this, you would still need to prevent boots on any existing grub versions. And it’s some work of course.

I consider it trivial for someone to plug in a USB key, or to unplug the disks, modify the unencrypted data, and plug them back in, so there is no real added value in the GRUB password protection. It is less trivial for someone to steal the whole server with the disks, which is the case where Tang would help, I think? Tang relies on a network connection from a server to a Tang server to authorize the decryption; in the datacenter, the decryption will always be authorized for known servers.

And technically I will be running a few servers with Fedora CoreOS and more servers with RHCOS. We are buying a couple of TPMs and I was thinking of making good use of them, so this is driven more by a personal hobby project than by a need to meet any compliance or regulatory requirement (otherwise I would not be on this forum).

I’ve been checking the feasibility of this idea: measuring the grub commands and kernel command line into PCR10 during boot, and adding an “extend” command to grub so that we can bring the PCR to a known value during grub execution, before executing the actual grub commands. This also makes it possible, before a server reboot, to compute the PCR’s future value and bind a decryption key against it. However, I realize I am lacking a mechanism: is it possible to change the binding after initial setup?

In luks1 it was possible to dump the master key, but this is not possible anymore with luks2. With such a mechanism (just updating the binding) it could also be possible to:

  • bind the decryption to tang and tpm2 (with pcr 0+7+8), with a threshold of 1. We would start the tang server only during updates. When no update is being applied to the server, there is no reason for the PCR to change so the decryption can use tpm2. During an update window, we start tang, tang is used for decryption, and then we can recompute the binding after servers booted.
  • another possibility is to bind against no pcr just before upgrade, then bind against the new pcr values after an upgrade is applied.

This would reduce the “time window” during which the servers are vulnerable.
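As a rough sketch of the first option, the snippet below builds the JSON that the clevis sss pin would take for a "tpm2 or tang" policy with a threshold of 1. It follows the same shape as the config in the createLuks log line earlier in the thread. The Tang URL is a hypothetical placeholder, the sha256 bank is an assumption, and a real Tang binding may need additional fields (such as a trusted advertisement) that are left out here.

```python
import json

def sss_config(threshold: int, pcr_ids: str, tang_url: str,
               pcr_bank: str = "sha256") -> str:
    """Build a clevis sss pin config combining a tpm2 pin and a tang pin."""
    config = {
        # Number of pins that must succeed to recover the key.
        "t": threshold,
        "pins": {
            "tpm2": {"pcr_bank": pcr_bank, "pcr_ids": pcr_ids},
            "tang": {"url": tang_url},
        },
    }
    # Compact form, convenient to pass as a single command-line argument.
    return json.dumps(config, separators=(",", ":"))

# Hypothetical Tang server that would only be started during update windows.
policy = sss_config(1, "0,7,8", "http://tang.example.internal")
print(policy)
```

With a threshold of 1, either pin alone unlocks the volume: the TPM while the PCRs are unchanged, or the Tang server while it is reachable. The string would be used as the sss configuration in the same position as in the logged `clevis luks bind` command.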

One very notable thing here is that fedora-iot/clevis-pin-tpm2 on GitHub (a rewritten Clevis TPM2 pin) is shipped by Fedora IoT.

It’d be good to share code and strategy here. (Unfortunately right now IoT doesn’t use Ignition, which hampers this)