Periodic and frequent freezes when copying large files on NVME

Hello.

My laptop with an NVMe drive happily runs Fedora Workstation. However, when I’m copying a large quantity of data from an external hard drive to the NVMe, I experience freezes alternating with a normally responsive system. After I launch a copy of a big quantity (some GBs), this occurs in order:

  1. copy starts at full speed for ~10s
  2. copy freezes for around 10 to 15s.
  3. meanwhile, the system freezes, I cannot open the terminal, nor type commands, nor interact with windows or applications
  4. after some seconds, the freeze goes away and the system responds as usual. The copy keeps going on at full speed
  5. go to 1, until everything has been successfully copied.

This might not be a problem if I leave my computer alone while copying huge files, but when I’m using the computer it prevents me from doing basic things like switching applications and opening a terminal.

What can I do in order to prevent this? Is this solvable, or is it my nvme’s fault?

It seems likely that either something is using a lot of cpu time or the system is using a lot of memory and has to free up RAM so the buffering can resume. My guess would be either a smaller amount of RAM than required or something using a lot of RAM. The buffer seems to need to fill then empty before it can fill again.

Almost every process that transfers large quantities of data uses RAM buffering to smooth out the intermittent reads and writes on devices. The transfer always starts fast until the buffer fills, then settles to the actual write speed. The fact that it slows other actions on the system may be down to swap, inadequate RAM, or processes filling the RAM that is needed for normal operation.

This very seldom has anything to do with the NVMe itself. If both ends of the copy were NVMe drives, the link between them could be what limits the transfer, but even that should not interfere with other system operations on the desktop the way you describe.

How much ram do you actually have installed?
Would you please provide the output of inxi -Fzxx so we can see the hardware config involved?

Also, please post the output of free taken during a large transfer while this symptom is occurring.
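
Alongside free, it can help to watch how much data is waiting to be written back while the copy runs. This is only a minimal sketch, assuming a Linux system where /proc/meminfo exposes the Dirty and Writeback fields. The function names meminfo_kb and watch_dirty are made up for illustration.

```shell
#!/bin/sh
# meminfo_kb FIELD [FILE]: print the value (in kB) of one field from a
# meminfo-style file. FILE defaults to /proc/meminfo.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# watch_dirty [INTERVAL]: print Dirty and Writeback once per INTERVAL seconds
# (default 1) until interrupted with Ctrl+C.
watch_dirty() {
    while :; do
        printf '%s Dirty=%s kB Writeback=%s kB\n' \
            "$(date +%T)" "$(meminfo_kb Dirty)" "$(meminfo_kb Writeback)"
        sleep "${1:-1}"
    done
}
```

Run watch_dirty in a second terminal during the copy. If Dirty climbs to several GB and then drops sharply around the moments the desktop freezes, that would point towards writeback stalls rather than a lack of RAM.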



What filesystem? Use smartmontools to check that the NVMe isn’t failing or overheating. There may be relevant messages in journalctl or dmesg. If these don’t provide hints about the source of the problem, try running a “top”-style utility (bpytop is a good example).

Here is my hardware info

System:
  Kernel: 6.1.7-200.fc37.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.38-25.fc37 Desktop: GNOME v: 43.2 tk: GTK v: 3.24.36 wm: gnome-shell
    dm: GDM Distro: Fedora release 37 (Thirty Seven)
Machine:
  Type: Laptop System: HP product: HP Pavilion Laptop 14-dv0xxx
    v: Type1ProductConfigId serial: <superuser required> Chassis: type: 10
    serial: <superuser required>
  Mobo: HP model: 87C9 v: 34.27 serial: <superuser required> UEFI: Insyde
    v: F.14 date: 05/06/2021
Battery:
  ID-1: BAT0 charge: 36.1 Wh (100.0%) condition: 36.1/36.1 Wh (100.0%)
    volts: 12.7 min: 11.6 model: HP Primary serial: <filter> status: full
CPU:
  Info: quad core model: 11th Gen Intel Core i7-1165G7 bits: 64 type: MT MCP
    arch: Tiger Lake rev: 1 cache: L1: 320 KiB L2: 5 MiB L3: 12 MiB
  Speed (MHz): avg: 1997 high: 2800 min/max: 400/4700 cores: 1: 1210 2: 2800
    3: 1300 4: 2800 5: 1181 6: 2800 7: 1086 8: 2800 bogomips: 44851
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel TigerLake-LP GT2 [Iris Xe Graphics] vendor: Hewlett-Packard
    driver: i915 v: kernel arch: Gen-12.1 ports: active: eDP-1 empty: DP-1,
    DP-2, DP-3, DP-4, HDMI-A-1 bus-ID: 0000:00:02.0 chip-ID: 8086:9a49
  Device-2: Luxvisions Innotech HP Wide Vision HD Camera type: USB
    driver: uvcvideo bus-ID: 3-3:3 chip-ID: 30c9:000e
  Display: wayland server: X.org v: 1.20.14 with: Xwayland v: 22.1.7
    compositor: gnome-shell driver: gpu: i915 display-ID: 0
  Monitor-1: eDP-1 model: ChiMei InnoLux 0x14ff res: 1920x1080 dpi: 158
    diag: 354mm (13.9")
  API: OpenGL v: 4.6 Mesa 22.3.3 renderer: Mesa Intel Xe Graphics (TGL GT2)
    direct render: Yes
Audio:
  Device-1: Intel Tiger Lake-LP Smart Sound Audio vendor: Hewlett-Packard
    driver: sof-audio-pci-intel-tgl bus-ID: 0000:00:1f.3 chip-ID: 8086:a0c8
  Sound API: ALSA v: k6.1.7-200.fc37.x86_64 running: yes
  Sound Server-1: PulseAudio v: 16.1 running: no
  Sound Server-2: PipeWire v: 0.3.64 running: yes
Network:
  Device-1: Intel Wi-Fi 6 AX201 driver: iwlwifi v: kernel bus-ID: 0000:00:14.3
    chip-ID: 8086:a0f0
  IF: wlp0s20f3 state: up mac: <filter>
  IF-ID-1: virbr0 state: down mac: <filter>
Bluetooth:
  Device-1: Intel Bluetooth 9460/9560 Jefferson Peak (JfP) type: USB
    driver: btusb v: 0.8 bus-ID: 3-10:5 chip-ID: 8087:0aaa
  Report: rfkill ID: hci0 rfk-id: 0 state: up address: see --recommends
RAID:
  Hardware-1: Intel Volume Management Device NVMe RAID Controller driver: vmd
    v: 0.6 bus-ID: 0000:00:0e.0 chip-ID: 8086:9a0b
Drives:
  Local Storage: total: 2.29 TiB used: 817.25 GiB (34.9%)
  ID-1: /dev/nvme0n1 vendor: Intel model: SSDPEKNW512G8H size: 476.94 GiB
    speed: 31.6 Gb/s lanes: 4 serial: <filter> temp: 41.9 C
  ID-2: /dev/sda type: USB vendor: Seagate model: Expansion+ size: 1.82 TiB
    serial: <filter>
Partition:
  ID-1: / size: 466.8 GiB used: 357.05 GiB (76.5%) fs: ext4 dev: /dev/dm-1
    mapped: fedora_localhost--live-root
  ID-2: /boot size: 973.4 MiB used: 227.1 MiB (23.3%) fs: ext4
    dev: /dev/nvme0n1p2
  ID-3: /boot/efi size: 598.8 MiB used: 17.4 MiB (2.9%) fs: vfat
    dev: /dev/nvme0n1p1
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 832.8 MiB (10.2%) priority: 100
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 44.0 C mobo: N/A
  Fan Speeds (RPM): cpu: 0 fan-2: 0
Info:
  Processes: 321 Uptime: 1h 25m Memory: 15.39 GiB used: 3.7 GiB (24.0%)
  Init: systemd v: 251 target: graphical (5) default: graphical Compilers:
  gcc: 12.2.1 clang: 15.0.7 Packages: pm: rpm pkgs: N/A note: see --rpm
  Shell: Zsh v: 5.9 running-in: tilix inxi: 3.3.24

and this is the output of free during high IO workloads

free
               total        used        free      shared  buff/cache   available
Mem:        16139332     3086736      211736      780544    12840860    11872092
Swap:        8388604      897280     7491324

The copy was from /dev/sda1 to /dev/nvme0n1.

I have Fedora currently installed on an LVM+ext4 with LUKS. I also checked both dmesg and journalctl, but no useful message seems to be present.


I see a few potential factors there.

  1. The use of luks, which does require a bit more processing by the kernel during transfers.
    I do not suspect this as a major part since only 4 of the 8 cpus show more than a very low speed and even they are showing only slightly over 50% of full speed.

  2. The use of raid on the device, using the built-in bios raid.
    I am unsure of this, but would consider it a potentially major factor in disk IO. RAID is known to have an effect on write speeds, and since it is BIOS/Intel/hardware RAID and not Linux RAID, the OS has little control over what it does beyond the driver that enables it. With a single disk and no established raid0 or raid1 parameters it seems totally useless.

RAID:
  Hardware-1: Intel Volume Management Device NVMe RAID Controller driver: vmd
    v: 0.6 bus-ID: 0000:00:0e.0 chip-ID: 8086:9a0b

It may be as simple as entering the BIOS and setting the SATA devices to AHCI instead of RAID, which should disable the RAID management within the BIOS.

  3. The use of zsh.
    Not likely, but it is different than bash. The shell is not a significant factor in data transfer.

  4. Memory. 16GB available and the free command shows little used as does inxi (<25%). As such that mostly rules out the potential memory issue.


What’s the output of … ?

ls -la /sys/block/[vsn]*/queue/scheduler
cat /sys/block/[vsn]*/queue/scheduler

I usually use for Intel ssd’s:

useful commands:

list all devices:    sudo sst show -ssd

Data Integrity:      sudo sst start -ssd 1 -scan      [(DataIntegrity|ReadScan|Logs)]
                           
smart selftest:      sudo sst start -ssd 1 -selftest [short|extended]
smart selftest done: sudo sst show -ssd 1 -selftest 
smart values:        sudo sst show -ssd 1 -smart

health/hours:        sudo sst show -ssd 1 -sensor 
LBA written:         sudo sst show -ssd 1 -performance
Wearout Indicator:   sudo sst show -ssd 1 -smart E9 

Device Identify:     sudo sst show -ssd 1 -identify  
Drive Info/Feature:  sudo sst show -ssd 1 -a 

Note:
I use index 1 in the commands above (… -ssd 1).
The correct index for your drive comes from the first command (list all…)!

I would also check the external drive with smartmontools (maybe via GNOME Disks) and/or e2fsck (if it is ext).

You could run “sudo journalctl -f” in another terminal while copying to see what’s going on.

sudo fstrim -va might help too.

I would also try to find out whether the NVMe’s write speed degrades as the disk fills up.
I always use ~10% over-provisioning on my SSDs (I don’t know if this is still necessary these days) and, from time to time, a “secure erase”.

Samsung and from 2019 though:

I guess you would be better off with btrfs and its compression to conserve the NVMe,
and of course for the upcoming F38’s “RPM Copy on Write”:
https://fedoraproject.org/wiki/Changes/RPMCoW


What do you call a ‘large file’? NVMe is non-volatile memory hardware, so it is fast because data simply stays in place unless it is changed and does not need to be reloaded. Ext4 is a journaling file system, so it has to write journal data in addition to the file data. Try an SSD, which is probably faster for what you are doing, or try a different file system such as XFS or JFS (note that both of these also journal metadata). BTRFS has had reports of ENOSPC problems, so it might not be a good fit for you …

Following advice from the official Telegram group, I switched to the bfq scheduler for the NVMe to ensure fairer use of IO resources during heavy copy workloads. This helped a lot, although it still sometimes freezes when several copy tasks run together. To switch to bfq I created the following file with this content:

/etc/udev/rules.d/99-ioschedulers.rules

ACTION=="add|change", KERNEL=="nvme*|sd*|mmcblk*|vd*", ATTR{queue/scheduler}="bfq"

and rebooted.

-rw-r--r--. 1 root root 4096 27 gen 20.56 /sys/block/nvme0n1/queue/scheduler
mq-deadline kyber [bfq] none

Originally it was [none]; after switching, the active scheduler shows as [bfq].
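
For anyone who wants to check or try a scheduler before making it permanent with a udev rule, here is a small sketch. It assumes the device is nvme0n1 as in this thread, and the helper name active_scheduler is hypothetical. Writing to the sysfs file changes the scheduler only until the next reboot.

```shell
#!/bin/sh
# active_scheduler LINE: given the contents of a queue/scheduler file,
# print the entry enclosed in square brackets (the active scheduler).
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Example (assumed device name; adjust to your system):
#   active_scheduler "$(cat /sys/block/nvme0n1/queue/scheduler)"
#
# Switch temporarily, without a reboot (needs root):
#   echo bfq | sudo tee /sys/block/nvme0n1/queue/scheduler
```

Trying the scheduler this way first makes it easy to go back to none if bfq does not help on a given machine.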

Sadly, I cannot see any such options in my UEFI BIOS.

On mine it was under Advanced → Sata Configurations

I have AMI Bios.

The cause could be this bug:

https://groups.google.com/g/linux.kernel/c/dPnf2Z_PrUM

I’ve fixed it with this:

vm.dirty_bytes = 134200000

vm.dirty_background_bytes = 67100000

also tuning

vm.dirty_expire_centisecs or

vm.dirty_writeback_centisecs

can help.

I have these values now:

cat /etc/sysctl.d/99-sysctl.conf

vm.dirty_background_bytes = 67100000
vm.dirty_background_ratio = 0
vm.dirty_bytes = 134200000
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 0
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200

You can also try setting dirty_writeback_centisecs to a low number, like 1-10; it can help.

Setting dirty_background_bytes and dirty_bytes to 64 MB and 128 MB solves the issue without noticeable performance degradation.
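
The 64 MB and 128 MB figures can be turned into byte values with shell arithmetic. This is a sketch assuming binary megabytes (MiB); the values quoted in this thread, 67100000 and 134200000, are rounded versions of the same sizes. The helper names mib_to_bytes and print_dirty_settings are hypothetical.

```shell
#!/bin/sh
# mib_to_bytes N: convert N mebibytes to bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# print_dirty_settings BACKGROUND_MIB LIMIT_MIB: print sysctl.d-style lines
# for the background writeback threshold and the hard dirty limit.
print_dirty_settings() {
    printf 'vm.dirty_background_bytes = %s\n' "$(mib_to_bytes "$1")"
    printf 'vm.dirty_bytes = %s\n' "$(mib_to_bytes "$2")"
}

print_dirty_settings 64 128
```

The printed lines could be saved to a file under /etc/sysctl.d/ and loaded with sudo sysctl --system, or tried temporarily with sudo sysctl -w before committing to them.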

I have 128 GB of RAM, and before tuning, copying a 100 GB file froze the PC completely for 10 minutes.


Even months later, I really would like to thank you.

I have recently been migrating from LVM to BTRFS, and I ran into the same problem on the new installation. Thanks to you, I have been able to fine-tune my config without changing my scheduler from none to bfq. This led to less freezing and fewer hiccups in my GNOME experience. I guess this is just my laptop’s fault, but I have been able to overcome this issue.

Thank you all

I really wish more distros would set these vm.dirty_bytes and vm.dirty_background_bytes values out of the box.

By default, these limits are percentages of your RAM: dirty_background_ratio is 10% and dirty_ratio is 20%. That is what caused the freezing you originally had. Those defaults were chosen with 32-bit systems and 1 GB of RAM in mind, and according to Linus Torvalds they need to be changed, but 10 years later they are still the same defaults.
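
To put those percentages into numbers, here is a rough sketch. It assumes the ratios apply to total RAM, which overstates things slightly, since the kernel applies them to available memory rather than the total. ratio_to_kb is a made-up helper name, and 16139332 kB is the total from the free output earlier in this thread.

```shell
#!/bin/sh
# ratio_to_kb TOTAL_KB PERCENT: PERCENT of TOTAL_KB, rounded down.
ratio_to_kb() {
    echo $(( $1 * $2 / 100 ))
}

# Assumption: MemTotal from the free output posted above.
total_kb=16139332
echo "background writeback starts near: $(ratio_to_kb "$total_kb" 10) kB"
echo "writers are throttled near:       $(ratio_to_kb "$total_kb" 20) kB"
```

On that 16 GB laptop this works out to roughly 1.6 GB and 3.2 GB of unwritten data, compared with 64 MB and 128 MB when the byte-based limits are set.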

The only distro I know of that defaults to sane values for these is Pop!_OS

The slowdowns get worse the more RAM you have, unless dirty_bytes and dirty_background_bytes are set, which override dirty_ratio and dirty_background_ratio. I have to make this change on every Linux install with 64 GB of RAM to avoid 5+ minutes of complete freezing on large file transfers or downloads.
