Fedora 40 returns Btrfs error with Kernels 6.8.9 and 6.8.10. Boots fine with Kernel 6.8.5

Hi all, I am having problems booting Fedora 40 with the latest kernels. I have seen many posts about this kind of problem, but mine looks a bit different.
In my case I can boot with kernel 6.8.5 but not with 6.8.9 or 6.8.10. I did not test Fedora 40 with kernel 6.8.8 but, if I remember correctly, Fedora 39 ran fine with kernel 6.8.8 before the upgrade and no longer booted after upgrading to Fedora 40. Since I had upgraded via Discover, I attributed the problem to an immature upgrade procedure, so I reinstalled Fedora 40 from scratch. My installation is on an external 4 TB NVMe disk and, after the installation, I reduced the size of the btrfs partition to only half the disk capacity in order to create an NTFS partition for exchanging data with Windows. I checked all my configuration files, and the only setting I found that resembles “resume” is GRUB_DISABLE_RECOVERY=“true” in /etc/default/grub.
At first I installed Fedora without a swap partition; I then created and activated one, but the problem persists.

The output of the inxi command is:

root@LAPTOP-3:/home/andrea# inxi -Fzxx

System:
  Kernel: 6.8.5-301.fc40.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.41-34.fc40
  Desktop: KDE Plasma v: 6.0.5 tk: Qt v: N/A wm: kwin_wayland dm: SDDM
    Distro: Fedora Linux 40 (KDE Plasma)
Machine:
  Type: Convertible System: HP product: HP Spectre x360 2-in-1 Laptop 16-f2xxx
    v: Type1ProductConfigId serial: <filter> Chassis: type: 31 serial: <filter>
  Mobo: HP model: 891F v: 45.25 serial: <filter> part-nu: 7X8T4EA#ABZ
    UEFI: Insyde v: F.14 date: 03/28/2024
Battery:
  ID-1: BAT1 charge: 77.8 Wh (100.0%) condition: 77.8/83.0 Wh (93.7%)
    volts: 12.8 min: 11.6 model: Hewlett-Packard PABAS0241231 serial: <filter>
    status: full
  Device-1: hid-0018:04F3:4035.0001-battery model: ELAN2513:00 04F3:4035
    serial: N/A charge: N/A status: N/A
CPU:
  Info: 12-core (4-mt/8-st) model: 13th Gen Intel Core i7-1360P bits: 64
    type: MST AMCP arch: Raptor Lake rev: 2 cache: L1: 1.1 MiB L2: 9 MiB
    L3: 18 MiB
  Speed (MHz): avg: 552 high: 801 min/max: 400/5000:3700 cores: 1: 727
    2: 400 3: 530 4: 400 5: 400 6: 400 7: 685 8: 400 9: 400 10: 726 11: 400
    12: 754 13: 400 14: 801 15: 670 16: 749 bogomips: 83558
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel Raptor Lake-P [Iris Xe Graphics] vendor: Hewlett-Packard
    driver: i915 v: kernel arch: Gen-13 ports: active: eDP-1 empty: DP-1, DP-2,
    DP-3, DP-4, DP-5 bus-ID: 00:02.0 chip-ID: 8086:a7a0
  Device-2: Intel DG2 [Arc A370M] vendor: Hewlett-Packard driver: i915
    v: kernel arch: Gen-12.7 pcie: speed: 2.5 GT/s lanes: 1 bus-ID: 03:00.0
    chip-ID: 8086:5693
  Display: wayland server: Xwayland v: 23.2.6 compositor: kwin_wayland
    driver: N/A display-ID: 0
  Monitor-1: eDP-1 res: 1920x1200 size: N/A
  API: EGL v: 1.5 platforms: device: 0 drv: iris device: 1 drv: iris
    device: 2 drv: swrast gbm: drv: iris surfaceless: drv: iris wayland:
    drv: iris x11: drv: iris
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 24.0.8 glx-v: 1.4
    direct-render: yes renderer: Mesa Intel Graphics (RPL-P)
    device-ID: 8086:a7a0 display-ID: :0.0
  API: Vulkan v: 1.3.280 surfaces: xcb,xlib,wayland device: 0
    type: integrated-gpu driver: N/A device-ID: 8086:a7a0 device: 1
    type: discrete-gpu driver: N/A device-ID: 8086:5693 device: 2 type: cpu
    driver: N/A device-ID: 10005:0000
Audio:
  Device-1: Intel vendor: Hewlett-Packard driver: N/A bus-ID: 00:05.0
    chip-ID: 8086:a75d
  Device-2: Intel Raptor Lake-P/U/H cAVS vendor: Hewlett-Packard
    driver: sof-audio-pci-intel-tgl bus-ID: 00:1f.3 chip-ID: 8086:51ca
  API: ALSA v: k6.8.5-301.fc40.x86_64 status: kernel-api
  Server-1: PipeWire v: 1.0.7 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    4: pw-jack type: plugin
Network:
  Device-1: Intel Raptor Lake PCH CNVi WiFi driver: iwlwifi v: kernel
    bus-ID: 00:14.3 chip-ID: 8086:51f1
  IF: wlo1 state: up mac: <filter>
Bluetooth:
  Device-1: Intel AX211 Bluetooth driver: btusb v: 0.8 type: USB rev: 2.0
    speed: 12 Mb/s lanes: 1 bus-ID: 3-10:3 chip-ID: 8087:0033
  Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 5.3
    lmp-v: 12
Drives:
  Local Storage: total: 5.5 TiB used: 762.73 GiB (13.5%)
  ID-1: /dev/nvme0n1 vendor: Fanxiang model: S880 4TB size: 3.64 TiB
    speed: 63.2 Gb/s lanes: 4 serial: <filter> temp: 40.9 C
  ID-2: /dev/nvme1n1 vendor: Micron model: MTFDKBA2T0TFH-1BC1AABHA
    size: 1.86 TiB speed: 63.2 Gb/s lanes: 4 serial: <filter> temp: 24.9 C
Partition:
  ID-1: / size: 1.82 TiB used: 42.59 GiB (2.3%) fs: btrfs dev: /dev/nvme0n1p3
  ID-2: /boot size: 973.4 MiB used: 331.5 MiB (34.1%) fs: ext4
    dev: /dev/nvme0n1p2
  ID-3: /boot/efi size: 598.8 MiB used: 19 MiB (3.2%) fs: vfat
    dev: /dev/nvme0n1p1
  ID-4: /home size: 1.82 TiB used: 42.59 GiB (2.3%) fs: btrfs
    dev: /dev/nvme0n1p3
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) priority: 100
    dev: /dev/zram0
  ID-2: swap-2 type: partition size: 31.26 GiB used: 0 KiB (0.0%)
    priority: -2 dev: /dev/nvme0n1p5
Sensors:
  Src: /sys System Temperatures: cpu: 48.0 C mobo: N/A
  Fan Speeds (rpm): N/A
Info:
  Memory: total: 32 GiB available: 31.04 GiB used: 5.16 GiB (16.6%)
    igpu: 60 MiB
  Processes: 435 Power: uptime: 37m wakeups: 0 Init: systemd v: 255
    target: graphical (5) default: graphical
  Packages: pm: flatpak pkgs: 12 Compilers: gcc: 14.1.1 Shell: Bash
    v: 5.2.26 running-in: konsole inxi: 3.3.34

A screenshot of the error with Kernel 6.8.10


Any idea about a solution?

We will need more information.

Please edit your post to make it easier to read by adding two lines, each with three backquotes, one before the inxi output and one after it.

Laptops and external cases may not provide adequate cooling for large SSDs, so it would be a good idea to use a temperature tracking tool. The S.M.A.R.T. data may show whether a drive has overheated. You could try booting 6.8.10 after the machine has been off for several hours so the drives are cool.

What type of connection does the external nvme drive use, and does the case have a cooling fan?

Your 6.8.5 kernel has the root partition on nvme0:

ID-1: / size: 1.82 TiB used: 42.59 GiB (2.3%) fs: btrfs dev: /dev/nvme0n1p3

The btrfs errors are for nvme1n1. Since the system is booting from nvme0n1, you could try booting 6.8.10 without mounting partitions on nvme1n1 so you can run diagnostics on nvme1n1. Please post the contents of your /etc/fstab so we can see the mount options.

You should make sure F40 is fully updated and check for firmware updates for your drives and system “BIOS”. You can try running S.M.A.R.T drive tests on nvme1n1 from Gnome Disks or the command-line after installing smartmontools.
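For the command-line route, a minimal sketch is shown here. It assumes smartmontools is installed and that the installed smartctl and the drive both support NVMe self-tests. The helper name `smart_field` is made up for illustration; it only pulls one value out of the text that `smartctl` prints.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): print the value of one
# "Name: value" line from smartctl output read on stdin.
# Only the first exact match on the name is used. Values that themselves
# contain a colon (for example the "Local Time is" line) are cut short.
smart_field() {
    name="$1"
    awk -F: -v n="$name" '$1 == n { sub(/^[ \t]+/, "", $2); print $2; exit }'
}

# Typical use on real hardware (needs root; the device name is an example):
#   sudo smartctl -t short /dev/nvme1n1     # start a short self-test
#   sudo smartctl -a /dev/nvme1n1 | smart_field "Temperature"
#   sudo smartctl -a /dev/nvme1n1 | smart_field "Error Information Log Entries"
```

Running the two query lines on a cold boot and again after some load gives a rough view of whether the drive temperature or the error counter is climbing.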

Thanks for your answer. Let me clarify how my disks are structured. My laptop has an internal 2 TB nvme ssd with Windows 11 installed. On a usb4 port I attached a 4 TB nvme ssd in a fan-equipped usb4/thunderbolt4 case. I divided that disk into two parts, one dedicated to Fedora 40 and the other to an ntfs partition to be used as shared storage between Windows and Linux.
I noticed that the primary disk, at least by name assignment, should have been the 2 TB one, as it was with past Fedora versions and laptops, but I had no idea how to change this, so I left it alone.
Here is the data you requested.

As for the internal organization of the disks, GParted looks clearer than any command-line utility, so I will use GParted for that purpose. Should you prefer command-line output, I will provide it.

/dev/nvme0n1

/dev/nvme0n1 smartctl command

smartctl --all /dev/nvme0n1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.5-301.fc40.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===
Model Number:                       Fanxiang S880 4TB
Serial Number:                      FXS880232710956
Firmware Version:                   SN12237
PCI Vendor/Subsystem ID:            0x1e4b
IEEE OUI Identifier:                0x000000
Total NVM Capacity:                 4,000,787,030,016 [4.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       2.0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          4,000,787,030,016 [4.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            d0d0d0 d0d0d0d0d0
Local Time is:                      Thu May 30 02:03:05 2024 CEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0a):         Cmd_Eff_Lg Telmtry_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     90 Celsius
Critical Comp. Temp. Threshold:     95 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.50W       -        -    0  0  0  0        0       0
 1 +     5.80W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.7460W       -        -    3  3  3  3     5000   10000
 4 -   0.7260W       -        -    4  4  4  4     8000   41000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        40 Celsius
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    0%
Data Units Read:                    6,106,852 [3.12 TB]
Data Units Written:                 5,916,190 [3.02 TB]
Host Read Commands:                 39,780,459
Host Write Commands:                51,021,646
Controller Busy Time:               46
Power Cycles:                       352
Power On Hours:                     578
Unsafe Shutdowns:                   146
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               40 Celsius
Temperature Sensor 2:               26 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Read Self-test Log failed: Invalid Field in Command (0x2002)

/dev/nvme1n1

/dev/nvme1n1 smartctl command

root@LAPTOP-3:/home/andrea# smartctl --all /dev/nvme1n1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.5-301.fc40.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       MTFDKBA2T0TFH-1BC1AABHA
Serial Number:                      UMDME0176HV136
Firmware Version:                   HPS0043
PCI Vendor/Subsystem ID:            0x1344
IEEE OUI Identifier:                0x00a075
Total NVM Capacity:                 2,048,408,248,320 [2.04 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            00a075 013f7e7901
Local Time is:                      Thu May 30 03:02:34 2024 CEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x1b):         S/H_per_NS Cmd_Eff_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     82 Celsius
Namespace 1 Features (0x08):        No_ID_Reuse

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.25W       -        -    0  0  0  0        0       0
 1 +     4.00W       -        -    1  1  1  1        0       0
 2 +     2.00W       -        -    2  2  2  2        0       0
 3 -   0.1000W       -        -    3  3  3  3     5000    6000
 4 -   0.0050W       -        -    4  4  4  4    12000   35000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        28 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    0%
Data Units Read:                    14,580,243 [7.46 TB]
Data Units Written:                 7,997,307 [4.09 TB]
Host Read Commands:                 127,593,504
Host Write Commands:                84,901,481
Controller Busy Time:               35
Power Cycles:                       1,123
Power On Hours:                     177
Unsafe Shutdowns:                   125
Media and Data Integrity Errors:    0
Error Information Log Entries:      1,117
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               28 Celsius

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged

fstab contents

# /etc/fstab
# Created by anaconda on Sun May 12 15:30:50 2024
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
UUID=375d7679-d657-4c18-8363-c1584ae01d06 /                       btrfs   subvol=root,compress=zstd:1 0 0
UUID=a1095baa-868a-4c72-b59f-0b4b9c2a8483 /boot                   ext4    defaults        1 2
UUID=855C-33EE          /boot/efi               vfat    umask=0077,shortname=winnt 0 2
UUID=375d7679-d657-4c18-8363-c1584ae01d06 /home                   btrfs   subvol=home,compress=zstd:1 0 0
UUID=a1e4cc38-77ef-4378-88b7-95bc6a75d24e none  swap    sw    0 0

I hope that my disk configuration and status are clearer now.

The drive config seems clear.
Now please post the output of cat /proc/cmdline and cat /etc/default/grub

My only other comment at present is that the swap partition is not normally required in fedora unless the user is intending to use hibernation.

I don’t use btrfs on my daily driver so this is just a guess.

How did you resize that partition, and did you properly resize the btrfs file system on the volume before shrinking the partition?
Some kernels are more forgiving than others with btrfs errors, but to me it seems that if there were an error in the procedure used to resize that btrfs file system it may trigger the errors you are seeing.
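To make the ordering concrete, here is a rough sketch of a manual shrink and a size check afterwards. The sizes and device names are examples only, and `fs_fits_partition` is a hypothetical helper written for this post, not a btrfs tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when a filesystem of fs_bytes
# fits inside a partition of part_bytes, fail (exit 1) otherwise.
fs_fits_partition() {
    fs_bytes="$1"
    part_bytes="$2"
    [ "$fs_bytes" -le "$part_bytes" ]
}

# Shrinking by hand (needs root; sizes and device names are examples only):
#   sudo btrfs filesystem resize 1:1800G /     # 1. shrink the filesystem on devid 1
#   (2. shrink the partition with a partitioning tool, keeping it at
#       least as large as the filesystem)
#   sudo btrfs filesystem resize 1:max /       # 3. grow the filesystem to fill the partition
#
# Afterwards, read both sizes in bytes and compare them:
#   sudo blockdev --getsize64 /dev/nvme0n1p3   # partition size
#   sudo btrfs filesystem show --raw /         # look for "devid 1 size <bytes>"
#   fs_fits_partition FS_BYTES PART_BYTES && echo ok || echo "filesystem is larger than partition"
```

If the filesystem turns out to be larger than the partition, the partition was shrunk without the filesystem, which is the situation I am asking about.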

Here is the output of the two commands you asked for:

andrea@LAPTOP-3:~$ cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-6.8.5-301.fc40.x86_64 root=UUID=375d7679-d657-4c18-8363-c1584ae01d06 ro rootflags=subvol=root rhgb quiet


andrea@LAPTOP-3:~$ cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
GRUB_SAVEDEFAULT=TRUE
andrea@LAPTOP-3:~$ 

About the second question: I first removed all the partitions from the external disk using gparted, then I installed Fedora 40 using the automatic partitioning of anaconda. I did this because, with Fedora 39, all my attempts to reduce the size of the btrfs partition from within anaconda resulted in an unbootable disk. So, after completing the installation, I started gparted again from the live installation disk, first reduced the size of the btrfs partition, and then used the freed space to create and format an ntfs partition.
This process did not create a swap partition. I created one later because at least one of the posts I read mentioned that the boot problem was solved by creating a swap partition or a swap file, I don’t remember exactly which.

OK, I found the solution after digging deep on the web.
It is a kernel regression introduced in kernel 6.8.8: Thunderbolt devices, not only external disks, are not initialized correctly.
The workaround is to add the kernel parameter “thunderbolt.host_reset=false”. For a temporary workaround, edit the boot menu entry with the “e” key and append the parameter to the linux line (the one normally ending with “quiet”). For a permanent workaround, add the parameter to GRUB_CMDLINE_LINUX in /etc/default/grub and then regenerate the grub configuration file. As far as I could check, the regression is still present in kernel 6.9.1.
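For the permanent variant, the edit can be sketched as follows. This is only an illustration: `add_kernel_param` is a hypothetical helper name, it assumes GNU sed and a double-quoted GRUB_CMDLINE_LINUX line like the one in my /etc/default/grub, and it is worth keeping a backup of the file before touching it.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to the GRUB_CMDLINE_LINUX
# line of a grub defaults file, unless that line already contains it.
add_kernel_param() {
    file="$1"
    param="$2"
    if grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qF -- "$param"; then
        return 0    # already present, nothing to do
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX=\".*\)\"|\1 ${param}\"|" "$file"
}

# Typical use on Fedora, from a root shell:
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param /etc/default/grub thunderbolt.host_reset=false
#   grub2-mkconfig -o /boot/grub2/grub.cfg
#
# Whether regenerating the config also rewrites existing boot entries can
# depend on the Fedora release. grubby edits the entries directly:
#   sudo grubby --update-kernel=ALL --args="thunderbolt.host_reset=false"
#
# After rebooting, confirm the parameter is active:
#   grep -o 'thunderbolt.host_reset=[a-z]*' /proc/cmdline
```

Calling the helper twice is harmless, since the parameter is only added when it is not already on the line.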


I registered specifically to thank you!
I also have Fedora 40 on an external Thunderbolt4/USB4 NVMe, but could not boot anything from 6.8.11 up to 6.9.6, and was forced to boot from 6.8.5. Every kernel update I had my hopes up, but nothing.
I also searched the web a lot and tried different things, to no avail. Until I found your post!
“thunderbolt.host_reset=false” did the trick! Thanks a lot!