Full disk encryption performance can be significantly improved on NVMe and SSD devices

I was having performance issues downloading at 1 Gbps to my fully encrypted 3500 MB/s NVMe drive: the system became unresponsive for a second or two, especially with Steam, which performs CPU- and I/O-intensive tasks while downloading games, such as decompression and shader processing.

I found this article from Cloudflare claiming that dm-crypt queueing adds unnecessary overhead with fast storage: https://blog.cloudflare.com/speeding-up-linux-disk-encryption/

So, following the guide at dm-crypt/Specialties - ArchWiki, I enabled the flags

discard,no-read-workqueue,no-write-workqueue

in my /etc/crypttab and regenerated the initramfs with

$ sudo dracut --regenerate-all --force
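
For reference, the resulting crypttab entry might look something like this (the mapping name and UUID here are placeholders, not my actual values):

```
# /etc/crypttab
# <name>     <device>                                    <password>  <options>
luks-root    UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  none        discard,no-read-workqueue,no-write-workqueue
```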

This really improved throughput for me, confirming Cloudflare's conclusions and solving the freezing issues I had.

I’m thinking this setting could be the default in Fedora. Can other people test this? What do you think?

Thanks for posting this; I just enabled the workqueue-related flags, but I did it in the LUKS2 header (persistent mode) instead of crypttab.
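
In case it helps others, this can be done with cryptsetup's --perf-* options, which require LUKS2 and a reasonably recent cryptsetup (2.3+); the mapping name and device path below are placeholders:

```
# Apply the flags to the active mapping and store them persistently in the LUKS2 header
sudo cryptsetup refresh --allow-discards --perf-no_read_workqueue --perf-no_write_workqueue --persistent luks-root

# Verify the flags stored in the header
sudo cryptsetup luksDump /dev/nvme0n1p3 | grep -i flags
```

Since the flags live in the LUKS2 header, no crypttab or initramfs change is needed.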

Do you have a CPU with AES-NI?

If not, the bottleneck may be the use of AES for disk encryption (it is the default since most CPU models support it in hardware). If the kernel cannot use the hardware implementation (AES-NI), it has to fall back to a software implementation, which is a significant bottleneck (the Bugzilla report below contains a comparison as an example).

You can check with lscpu (there has to be an “aes” flag, e.g., lscpu | grep aes) or by looking up your CPU model on the vendor’s website.

Some more elaboration about that issue:
https://bugzilla.redhat.com/show_bug.cgi?id=2077532

Adiantum is the alternative for CPUs without AES-NI.

If you have questions about it, let me know.

Thanks for your suggestion, but I do have it; it’s a Ryzen 5 3600. I ran the benchmark just to confirm AES performance, and it’s fine.

➜  ~ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1      1804777 iterations per second for 256-bit key
PBKDF2-sha256    3449263 iterations per second for 256-bit key
PBKDF2-sha512    1593580 iterations per second for 256-bit key
PBKDF2-ripemd160  800439 iterations per second for 256-bit key
PBKDF2-whirlpool  678250 iterations per second for 256-bit key
argon2i      10 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id     10 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b      1258.5 MiB/s      4114.0 MiB/s
    serpent-cbc        128b       118.9 MiB/s       740.8 MiB/s
    twofish-cbc        128b       243.6 MiB/s       429.7 MiB/s
        aes-cbc        256b       957.2 MiB/s      3395.9 MiB/s
    serpent-cbc        256b       119.1 MiB/s       741.9 MiB/s
    twofish-cbc        256b       243.4 MiB/s       429.1 MiB/s
        aes-xts        256b      3334.1 MiB/s      3380.0 MiB/s
    serpent-xts        256b       654.6 MiB/s       641.9 MiB/s
    twofish-xts        256b       397.7 MiB/s       396.8 MiB/s
        aes-xts        512b      2829.4 MiB/s      2838.7 MiB/s
    serpent-xts        512b       654.3 MiB/s       643.0 MiB/s
    twofish-xts        512b       397.4 MiB/s       396.6 MiB/s

What scheduler did your system select?

cat /sys/block/nvme0n1/queue/scheduler

none, the default for NVMe. The first thing I tried was changing it to bfq, which helped with responsiveness but decreased throughput; changing the dm-crypt flags was the best solution overall.
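
For anyone who wants to experiment with this themselves, the scheduler can be switched at runtime through sysfs (nvme0n1 is the device from this thread; substitute your own, and note the change does not survive a reboot):

```
# List available schedulers; the active one is shown in brackets
cat /sys/block/nvme0n1/queue/scheduler

# Switch to bfq for testing
echo bfq | sudo tee /sys/block/nvme0n1/queue/scheduler
```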

I agree. While this is just anecdata, overall performance seems to be better with the flags set, and this is especially noticeable when dnf update is installing a large package like kernel-headers.