I was having performance issues when downloading at 1 Gbps to my fully encrypted 3500 MB/s NVMe drive: the system would become unresponsive for a second or two, especially with Steam, which does CPU- and I/O-intensive work while downloading games (decompression and shader processing).
I found this article from Cloudflare claiming that dm-crypt queuing is an unnecessary overhead with fast storage: https://blog.cloudflare.com/speeding-up-linux-disk-encryption/
So, following the ArchWiki guide dm-crypt/Specialties, I enabled the flags in /etc/crypttab and regenerated the initramfs with:
$ sudo dracut --regenerate-all --force
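For anyone who wants to try this, here is a minimal sketch. It assumes the flags in question are the `no-read-workqueue` and `no-write-workqueue` crypttab options covered by that ArchWiki section. The mapping name and UUID in the comment are placeholders, and `check_workqueue_flags` is just a helper name made up for this post. After a reboot, the active dm-crypt table should list the kernel-side names `no_read_workqueue` and `no_write_workqueue`.

```shell
# Hypothetical /etc/crypttab entry (name and UUID are placeholders):
#   luks-root  UUID=<your-luks-uuid>  none  no-read-workqueue,no-write-workqueue

# Read `dmsetup table` output on stdin and report whether both flags appear in it.
check_workqueue_flags() {
    table=$(cat)
    for flag in no_read_workqueue no_write_workqueue; do
        case "$table" in
            *"$flag"*) ;;
            *) echo "missing: $flag"; return 1 ;;
        esac
    done
    echo "both workqueue flags active"
}

# Usage (needs root):
#   sudo dmsetup table --target crypt | check_workqueue_flags
```

Note that the helper looks at the whole output, so with several encrypted volumes it is probably better to pass one mapping name to `dmsetup table` at a time.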
This really improved the throughput for me, confirming Cloudflare's conclusions and solving the freezing issues I had.
I think this setting could be the default in Fedora. Can other people test this? What do you think?
Thanks for posting this; I just enabled the workqueue-related flags, but I did it in the LUKS2 header (persistent mode) instead of crypttab.
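In case it helps others, the persistent route looks roughly like the sketch below. It assumes cryptsetup 2.3.4 or newer and a kernel that supports these dm-crypt flags (5.9 or later, as far as I know). The function names, the mapping name, and the device path are placeholders of mine, not anything from the posts above.

```shell
# Store both flags in the LUKS2 header of an already-open mapping.
# $1: mapping name as shown by `lsblk` (for example a luks-<uuid> name).
persist_workqueue_flags() {
    sudo cryptsetup refresh "$1" \
        --perf-no_read_workqueue --perf-no_write_workqueue --persistent
}

# Print the flags recorded in the header, given `cryptsetup luksDump` output on stdin.
luks2_flags() {
    awk '/^Flags:/ { sub(/^Flags:[ \t]*/, ""); print }'
}

# Usage (mapping name and device path are placeholders):
#   persist_workqueue_flags luks-root
#   sudo cryptsetup luksDump /dev/nvme0n1p3 | luks2_flags
```

The refresh prompts for the passphrase. Once the flags are in the header they apply on every unlock, so no crypttab change should be needed.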
Do you have a CPU with AES-NI?
If not, the bottleneck may be AES itself, which is the default cipher for disk encryption because most CPUs support it in hardware. If the kernel cannot use a hardware implementation (AES-NI), it has to fall back to a software implementation, which is a severe bottleneck (the Bugzilla report below contains a comparison as an example).
You can check with lscpu | grep aes (there has to be a flag “aes”) or by checking your CPU model on the vendor website.
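A slightly stricter version of that check, as a sketch: it matches `aes` only as a whole word on the `flags` line, so it cannot be fooled by a longer flag name that merely contains those letters. It assumes an x86 system, where `/proc/cpuinfo` has a `flags` line, and `has_aes` is just a name chosen for this example.

```shell
# Succeed if the first "flags" line of a cpuinfo-style file lists the aes flag.
# $1: file to inspect (defaults to /proc/cpuinfo).
has_aes() {
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" | grep -qw aes
}

if has_aes; then
    echo "aes flag present: the kernel can use hardware AES"
else
    echo "no aes flag found"
fi
```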
Some more elaboration about that issue:
Adiantum is the alternative cipher for CPUs without AES acceleration.
If you have questions about it, let me know.
Thanks for the suggestion, but I do have it: it’s a Ryzen 5 3600. I ran the benchmark just to confirm AES performance, and it’s fine.
➜ ~ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 1804777 iterations per second for 256-bit key
PBKDF2-sha256 3449263 iterations per second for 256-bit key
PBKDF2-sha512 1593580 iterations per second for 256-bit key
PBKDF2-ripemd160 800439 iterations per second for 256-bit key
PBKDF2-whirlpool 678250 iterations per second for 256-bit key
argon2i 10 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 10 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#  Algorithm |  Key |   Encryption |   Decryption
     aes-cbc   128b   1258.5 MiB/s   4114.0 MiB/s
 serpent-cbc   128b    118.9 MiB/s    740.8 MiB/s
 twofish-cbc   128b    243.6 MiB/s    429.7 MiB/s
     aes-cbc   256b    957.2 MiB/s   3395.9 MiB/s
 serpent-cbc   256b    119.1 MiB/s    741.9 MiB/s
 twofish-cbc   256b    243.4 MiB/s    429.1 MiB/s
     aes-xts   256b   3334.1 MiB/s   3380.0 MiB/s
 serpent-xts   256b    654.6 MiB/s    641.9 MiB/s
 twofish-xts   256b    397.7 MiB/s    396.8 MiB/s
     aes-xts   512b   2829.4 MiB/s   2838.7 MiB/s
 serpent-xts   512b    654.3 MiB/s    643.0 MiB/s
 twofish-xts   512b    397.4 MiB/s    396.6 MiB/s
Which scheduler did your system select?
none, the default for NVMe. The first thing I tried was changing it to bfq, which helped with responsiveness but decreased throughput. Changing the dm-crypt flags was the best solution overall.
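For reference, checking and switching the scheduler can be sketched like this. The device name `nvme0n1` is an assumption (adjust it to your drive), and `active_scheduler` is a helper name invented for this example. The kernel marks the active scheduler with square brackets in sysfs.

```shell
# Extract the active scheduler (the bracketed entry) from a sysfs scheduler line on stdin.
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Usage (nvme0n1 is an assumed device name):
#   active_scheduler < /sys/block/nvme0n1/queue/scheduler
#   echo bfq | sudo tee /sys/block/nvme0n1/queue/scheduler   # temporary, lost on reboot
```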
I agree. While this is just anecdata, overall performance seems to be better with the flags set, and this is especially noticeable when dnf update is installing a large package like kernel-headers.