Fedora 42 unstable on my system

Ryzen 5600G, 32 GB RAM, and a Samsung NVMe drive. I do not have any issues when I boot into Windows, so this seems to be a Linux-only problem.

At the time, free -h reported 14 GB of RAM used and 8 KB swapped out. Saving a file in kate while the soft lockup messages shown below were appearing caused a hard lock. This has been happening quite often and has persisted over a few kernel updates.

Linux box 6.15.7-200.fc42.x86_64

Current:

swapon --show
NAME       TYPE      SIZE USED PRIO
/dev/zram0 partition   8G   0B  100

After crash
free -h
               total        used        free      shared  buff/cache   available
Mem:            27Gi       7.9Gi        10Gi       552Mi       9.0Gi        19Gi
Swap:          8.0Gi          0B       8.0Gi
Message from syslogd@box at Jul 30 15:18:07 ...
 kernel:watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kswapd0:121]

Message from syslogd@box at Jul 30 15:18:35 ...
 kernel:watchdog: BUG: soft lockup - CPU#5 stuck for 52s! [kswapd0:121]

Message from syslogd@box at Jul 30 15:19:07 ...
 kernel:watchdog: BUG: soft lockup - CPU#5 stuck for 82s! [kswapd0:121]

System was otherwise responsive. Doubt it was due to memory pressure. Mostly idle, just web browsing, except for that one CPU core stuck in a loop.

cat /proc/sys/kernel/tainted
0
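
For reference, the value in /proc/sys/kernel/tainted is a bitmask covering the current boot only, and the letters that show up in kernel log headers (for example [D]=DIE or [B]=BAD_PAGE) correspond to individual bits. A minimal sketch of decoding it, assuming the bit assignments from the kernel's tainted-kernels documentation. Only a subset of flags is listed, and the names TAINT_FLAGS and decode_taint are made up for illustration:

```python
# Subset of taint bits, as listed in the kernel's tainted-kernels documentation.
# Bits that are not in this table are silently ignored by decode_taint().
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    5: ("B", "bad page referenced or unexpected page flags"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "out-of-tree module was loaded"),
    13: ("E", "unsigned module was loaded"),
    14: ("L", "soft lockup occurred"),
}


def decode_taint(value):
    """Return the flag letters set in a /proc/sys/kernel/tainted value."""
    return [TAINT_FLAGS[bit][0] for bit in sorted(TAINT_FLAGS) if (value >> bit) & 1]


if __name__ == "__main__":
    for value in (0, 32, 128):
        print(value, decode_taint(value))
```

A value of 0 decodes to an empty list, which matches the "not tainted" reading of the output above.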

It seems that something on your system is using a lot of kernel time.

Have a read of the SUSE support article 'What are all these "Bug: soft lockup" messages about?', which explains what the message implies.

That web page says that there should be kernel stack traces that may point to the problem area.

It seems that your issue may be swap related? I’m guessing that based on the process name, kswapd0. Can you find a stack trace near these logs in the system journal?
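
As a rough illustration of what those watchdog lines contain, here is a small sketch that pulls the CPU number, stuck duration, task name and PID out of each "soft lockup" line. The regular expression is an assumption based only on the three messages quoted earlier in this thread, and parse_soft_lockups is a hypothetical helper name:

```python
import re

# Pattern modelled on the watchdog lines quoted in this thread, e.g.
# "BUG: soft lockup - CPU#5 stuck for 26s! [kswapd0:121]"
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(log_text):
    """Return one dict per soft lockup report found in log_text."""
    reports = []
    for m in SOFT_LOCKUP.finditer(log_text):
        reports.append(
            {
                "cpu": int(m["cpu"]),
                "seconds": int(m["secs"]),
                "comm": m["comm"],
                "pid": int(m["pid"]),
            }
        )
    return reports


sample = """\
 kernel:watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kswapd0:121]
 kernel:watchdog: BUG: soft lockup - CPU#5 stuck for 52s! [kswapd0:121]
 kernel:watchdog: BUG: soft lockup - CPU#5 stuck for 82s! [kswapd0:121]
"""

if __name__ == "__main__":
    for report in parse_soft_lockups(sample):
        print(report)
```

All three reports name the same CPU and the same task, with a growing duration, so they look like one continuing lockup rather than three separate ones.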

Well, I am using zram for swap. I guess this is the default setting on Fedora 42?

I’m fairly Linux old-school (from v2 linux) and haven’t kept up with all this new shiny stuff.

I don’t see anything from journalctl.

journalctl -k -b | grep -i kswapd

Looks like nothing got saved. I’ve had hard locks shortly after boot in the recent past. My uptime this time around was a few days.

I’ll try updating again. There is a new kernel update. Oh well. I guess if it continues I’ll turn off zram. I wasn’t really stressing my system though. Just a youtube video at the time this happened.

Jul 30 15:18:07 box kernel: watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kswapd0:121]
Jul 30 15:18:07 box kernel: CPU#5 Utilization every 4s during lockup:
Jul 30 15:18:07 box kernel: #011#1: 100% system,#011  0% softirq,#011  1% hardirq,#011  0% idle
Jul 30 15:18:07 box kernel: #011#2: 100% system,#011  0% softirq,#011  0% hardirq,#011  0% idle
Jul 30 15:18:07 box kernel: #011#3: 100% system,#011  0% softirq,#011  1% hardirq,#011  0% idle
Jul 30 15:18:07 box kernel: #011#4: 100% system,#011  0% softirq,#011  0% hardirq,#011  0% idle
Jul 30 15:18:07 box kernel: #011#5: 100% system,#011  0% softirq,#011  1% hardirq,#011  0% idle
Jul 30 15:18:07 box kernel: Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr bnep sunrpc binfmt_misc vfat fat squashfs iwlmvm amd_atl intel_rapl_msr intel_rapl_common mac80211 snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd snd_hda_scodec_component snd_hda_codec_hdmi kvm_amd snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi libarc4 snd_hda_codec kvm iwlwifi snd_hda_core btusb snd_hwdep btrtl snd_seq btintel snd_seq_device btbcm ee1004 snd_pcm btmtk raid1 irqbypass cfg80211 rapl snd_timer r8169 wmi_bmof bluetooth snd i2c_piix4 pcspkr realtek soundcore k10temp i2c_smbus rfkill joydev gpio_amdpt apple_mfi_fastcharge gpio_generic loop nfnetlink zram lz4hc_compress lz4_compress amdgpu hid_logitech_hidpp amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper
Jul 30 15:18:07 box kernel: drm_panel_backlight_quirks drm_buddy nvme polyval_clmulni polyval_generic ghash_clmulni_intel drm_display_helper nvme_core sha512_ssse3 video sha256_ssse3 cec sha1_ssse3 nvme_keyring nvme_auth sp5100_tco hid_apple wmi hid_logitech_dj fuse i2c_dev
Jul 30 15:18:07 box kernel: CPU: 5 UID: 0 PID: 121 Comm: kswapd0 Tainted: G      D             6.15.7-200.fc42.x86_64 #1 PREEMPT(lazy)
Jul 30 15:18:07 box kernel: Tainted: [D]=DIE
Jul 30 15:18:07 box kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C95/B550M PRO-VDH WIFI (MS-7C95), BIOS 2.L1 05/06/2024
Jul 30 15:18:07 box kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x67/0x310
Jul 30 15:18:07 box kernel: Code: 0f ba 29 08 0f 92 c2 8b 01 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 77 4d 85 c0 74 10 0f b6 01 84 c0 74 09 f3 90 0f b6 01 <84> c0 75 f7 b8 01 00 00 00 66 89 01 e9 43 95 be fe 8b 37 b8 00 02
Jul 30 15:18:07 box kernel: RSP: 0018:ffffd04940517830 EFLAGS: 00000202
Jul 30 15:18:07 box kernel: RAX: 0000000000000001 RBX: fffff2374d8905c0 RCX: ffff8c16b3509630
Jul 30 15:18:07 box kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8c16b3509630
Jul 30 15:18:07 box kernel: RBP: ffff8c16b3509720 R08: 0000000000000000 R09: ffffffffadc61fae
Jul 30 15:18:07 box kernel: R10: ffff8c14ef947168 R11: fffff23744be51c0 R12: 0000000000000001
Jul 30 15:18:07 box kernel: R13: fffff2374d8905c8 R14: ffffd049405178d8 R15: 0000000000000000
Jul 30 15:18:07 box kernel: FS:  0000000000000000(0000) GS:ffff8c1b114f0000(0000) knlGS:0000000000000000
Jul 30 15:18:07 box kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 30 15:18:07 box kernel: CR2: 00002f1002fd5000 CR3: 0000000447a2e000 CR4: 0000000000f50ef0
Jul 30 15:18:07 box kernel: PKRU: 55555554
Jul 30 15:18:07 box kernel: Call Trace:
Jul 30 15:18:07 box kernel: <TASK>
Jul 30 15:18:07 box kernel: _raw_spin_lock+0x29/0x30
Jul 30 15:18:07 box kernel: __remove_mapping+0x55/0x220
Jul 30 15:18:07 box kernel: shrink_folio_list+0xa51/0xd40
Jul 30 15:18:07 box kernel: evict_folios+0x37c/0xb70
Jul 30 15:18:07 box kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jul 30 15:18:07 box kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jul 30 15:18:07 box kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jul 30 15:18:07 box kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jul 30 15:18:07 box kernel: try_to_shrink_lruvec+0x1aa/0x340
Jul 30 15:18:07 box kernel: ? lru_gen_add_folio+0x33f/0x4c0
Jul 30 15:18:07 box kernel: shrink_one+0x108/0x1f0
Jul 30 15:18:07 box kernel: shrink_many+0x152/0x2d0
Jul 30 15:18:07 box kernel: shrink_node+0x45b/0x4f0
Jul 30 15:18:07 box kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jul 30 15:18:07 box kernel: ? lru_gen_age_node+0x64/0x220
Jul 30 15:18:07 box kernel: balance_pgdat+0x30d/0x890
Jul 30 15:18:07 box kernel: kswapd+0xda/0x170
Jul 30 15:18:07 box kernel: ? __pfx_kswapd+0x10/0x10
Jul 30 15:18:07 box kernel: kthread+0xfc/0x240
Jul 30 15:18:07 box kernel: ? __pfx_kthread+0x10/0x10
Jul 30 15:18:07 box kernel: ret_from_fork+0x34/0x50
Jul 30 15:18:07 box kernel: ? __pfx_kthread+0x10/0x10
Jul 30 15:18:07 box kernel: ret_from_fork_asm+0x1a/0x30
Jul 30 15:18:07 box kernel: </TASK>
Jul 30 15:18:08 box abrt-dump-journal-oops[1285]: abrt-dump-journal-oops: Found oopses: 1
Jul 30 15:18:08 box abrt-dump-journal-oops[1285]: abrt-dump-journal-oops: Creating problem directories
Jul 30 15:18:09 box abrt-dump-journal-oops[1285]: Reported 1 kernel oopses to Abrt
Jul 30 15:18:09 box abrt-server[408047]: Can't find a meaningful backtrace for hashing in '.'
Jul 30 15:18:09 box abrt-server[408047]: Deleting non-reportable oops '.' because DropNotReportableOopses is set to 'yes'
Jul 30 15:18:09 box abrt-server[408047]: 'post-create' on '/var/spool/abrt/oops-2025-07-30-15:18:08-1285-0' exited with 1
Jul 30 15:18:09 box abrt-server[408047]: Deleting problem directory '/var/spool/abrt/oops-2025-07-30-15:18:08-1285-0'
Jul 30 15:18:09 box abrt-server[408047]: Lock file '.lock' was locked by process 408057, but it crashed?

After that, saving a file in kate caused what appeared to be a hard lock. Couldn’t switch to a console tty. No response from keyboard or mouse. No screen updates.

In the past, I’ve waited for the system to become responsive again (for hours) and it never recovered, so I just rebooted.

Given that:

  1. I was only using 14gb out of the 27gb of ram available.
  2. free -h reported 8kb of swap use when it was happening
  3. I was only playing a youtube video. No other apps were being used.

So this is either a hardware issue (unlikely, since my system appears fine on Windows and on previous versions of Fedora), or there are buggy drivers, buggy kernel code, or Fedora is enabling unstable Linux facilities.

I have recently run memory tests and hd tests. Nothing shows up there.

It would be nice to get a list of these unstable facilities so that users can easily toggle them off when they become a problem.

iirc I only updated because my Fedora version was out of support. It would be great to have a stable system again.

Having lived through the bad capacitor era in a job where every crash would lose irreplaceable data, I do appreciate stability.

When reporting an issue:

  • it is often helpful to include hardware details (in a format that can be found using web searches) so others with similar hardware can attempt to reproduce the issue. The output from inxi -Fzxx is a good start.
  • make sure Fedora packages and vendor firmware are fully updated so you aren’t chasing a solved problem
  • if the vendor provides diagnostic tools you should use them – they sometimes identify known hardware failure modes specific to your system

Have you tried booting older kernel versions? You could try using journalctl --follow while saving a file. You should also run sudo smartctl -x <nvme_dev_name> to rule out an issue with the NVMe device.

I see two things in the stack trace: folio and kswapd.
Folio is new memory management code.
I wonder if you are triggering a bug in the new code?
Can you report as a linux kernel bug?
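
If it helps when writing up a report, the function names can be pulled out of a pasted trace with a short script. This is only a sketch: it assumes the journal line layout shown in this thread ("... kernel: symbol+0xoff/0xsize"), and call_trace is a hypothetical helper name. Frames prefixed with "?" are ones the kernel's unwinder is not sure about, so they are skipped by default:

```python
import re

# Matches lines such as "... kernel: shrink_folio_list+0xa51/0xd40",
# with an optional "? " marking a frame the unwinder is unsure about.
FRAME = re.compile(
    r"kernel:\s+(?P<guess>\? )?(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def call_trace(log_text, include_uncertain=False):
    """Return the symbols between 'Call Trace:' and '</TASK>' in log_text."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if "</TASK>" in line:
            in_trace = False
            continue
        if not in_trace:
            continue
        m = FRAME.search(line)
        if m and (include_uncertain or not m["guess"]):
            frames.append(m["sym"])
    return frames


sample = """\
Jul 30 15:18:07 box kernel: Call Trace:
Jul 30 15:18:07 box kernel: <TASK>
Jul 30 15:18:07 box kernel: _raw_spin_lock+0x29/0x30
Jul 30 15:18:07 box kernel: __remove_mapping+0x55/0x220
Jul 30 15:18:07 box kernel: shrink_folio_list+0xa51/0xd40
Jul 30 15:18:07 box kernel: evict_folios+0x37c/0xb70
Jul 30 15:18:07 box kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jul 30 15:18:07 box kernel: try_to_shrink_lruvec+0x1aa/0x340
Jul 30 15:18:07 box kernel: </TASK>
"""

if __name__ == "__main__":
    print(" <- ".join(call_trace(sample)))
```

The sample is a shortened excerpt of the trace posted earlier; the innermost frame comes first, so the list reads from the spinlock wait outwards towards kswapd.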

Thanks for the tips!

Yeah, could be triggering a bug. Will see about reporting it proper.

It was probably BTRFS.

System no longer boots as BTRFS has failed.

Fedora does not make it easy to attach these crash logs with stack traces, so I am assuming they don’t care much about them.

Sep 26 02:35:29 box BUG (pooled)  pfn:362629
Sep 26 02:35:29 box kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x362629
Sep 26 02:35:29 box kernel: memcg:1000000000
Sep 26 02:35:29 box kernel: flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
Sep 26 02:35:29 box kernel: raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
Sep 26 02:35:29 box kernel: raw: 0000000000000000 0000000000000000 00000000ffffffff 0000001000000000
Sep 26 02:35:29 box kernel: page dumped because: page still charged to cgroup

Sep 25 20:14:52 box kernel: ------------[ cut here ]------------
Sep 25 20:14:52 box kernel: ------------[ cut here ]------------
Sep 25 20:14:52 box kernel: kernel BUG at mm/rmap.c:1123!
Sep 25 20:14:52 box kernel: kernel BUG at mm/rmap.c:1123!
Sep 25 20:14:52 box kernel: ------------[ cut here ]------------
Sep 25 20:14:52 box kernel: kernel BUG at mm/rmap.c:1123!
Sep 25 20:14:52 box kernel: Oops: invalid opcode: 0000 [#1] SMP NOPTI
Sep 25 20:14:52 box kernel: ------------[ cut here ]------------
Sep 25 20:14:52 box kernel: ------------[ cut here ]------------
Sep 25 20:14:52 box kernel: CPU: 4 UID: 0 PID: 929 Comm: kworker/u48:12 Tainted: G    B               6.16.7-200.fc42.x86_64 #1 PREEMPT(lazy) 
Sep 25 20:14:52 box kernel: kernel BUG at mm/rmap.c:1123!
Sep 25 20:14:52 box kernel: kernel BUG at mm/rmap.c:1123!
Sep 25 20:14:52 box kernel: ------------[ cut here ]------------
Sep 25 20:14:52 box kernel: kernel BUG at mm/rmap.c:1123!
Sep 25 20:14:52 box kernel: Tainted: [B]=BAD_PAGE
Sep 25 20:14:52 box kernel: ------------[ cut here ]------------
Sep 25 20:14:52 box kernel: kernel BUG at mm/rmap.c:1123!
Sep 25 20:14:52 box kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C95/B550M PRO-VDH WIFI (MS-7C95), BIOS 2.L1 05/06/2024
Sep 25 20:14:52 box kernel: ------------[ cut here ]------------
Sep 25 20:14:52 box kernel: kernel BUG at mm/rmap.c:1123!
Sep 25 20:14:52 box kernel: Workqueue: btrfs-delalloc btrfs_work_helper
Sep 25 20:14:52 box kernel: RIP: 0010:folio_mkclean+0xb9/0xc0
Sep 25 20:14:52 box kernel: Code: 50 83 c0 01 85 c0 7e dc 48 89 3c 24 e8 a0 00 
Sep 25 20:11:51 box kernel: BUG: Bad page state in process spotify:gl0  pfn:36a629
Sep 25 20:11:51 box kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x36a629
Sep 25 20:11:51 box kernel: memcg:1000000000
Sep 25 20:11:51 box kernel: flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
Sep 25 20:11:51 box kernel: raw: 0017ffffc0000000 fffffa748da98a48 fffffa748da98a48 0000000000000000
Sep 25 20:11:51 box kernel: raw: 0000000000000000 0000000000000000 00000000ffffffff 0000001000000000
Sep 25 20:11:51 box kernel: page dumped because: page still charged to cgroup

Sep 24 12:28:09 box kernel: non-paged memory
Sep 24 12:28:09 box kernel: list_del corruption. next->prev should be fffff9318d88aa08, but was fffff9218d88aa08. (next=fffff9318d88aa48)
Sep 24 12:28:09 box kernel: ------------[ cut here ]------------
Sep 24 12:28:09 box kernel: kernel BUG at lib/list_debug.c:65!
Sep 24 12:28:09 box kernel: Oops: invalid opcode: 0000 [#1] SMP NOPTI
Sep 24 12:28:09 box kernel: CPU: 3 UID: 1000 PID: 3940 Comm: BgIOThr~Pool #2 Not tainted 6.16.7-200.fc42.x86_64 #1 PREEMPT(lazy) 
Sep 24 12:28:09 box kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C95/B550M PRO-VDH WIFI (MS-7C95), BIOS 2.L1 05/06/2024
Sep 24 12:28:09 box kernel: RIP: 0010:__list_del_entry_valid_or_report+0x10a/0x110
Sep 24 12:28:09 box kernel: Code: 89 d7 48 89 14 24 e8 45 be a8 ff 48 8b 14 24 48 8b 74 24 08 48 c7 c7 f8 83 aa 8a 48 8b 42 08 48 89 d1 48 89 c2 e8 06 f6 5d ff <0f> 0b 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90

Sep 23 01:28:47 box kernel: BUG: Bad page state in process firefox  pfn:36252a
Sep 23 01:28:47 box kernel: page: refcount:16 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x36252a
Sep 23 01:28:47 box kernel: flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
Sep 23 01:28:47 box kernel: raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
Sep 23 01:28:47 box kernel: raw: 0000000000000000 0000000000000000 00000010ffffffff 0000000000000000
Sep 23 01:28:47 box kernel: page dumped because: nonzero _refcount

Sep 23 01:18:19 box kernel: BUG: Bad page state in process WebExtensions  pfn:36a52a
Sep 23 01:18:19 box kernel: page: refcount:16 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x36a52a
Sep 23 01:18:19 box kernel: flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
Sep 23 01:18:19 box kernel: raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
Sep 23 01:18:19 box kernel: raw: 0000000000000000 0000000000000000 00000010ffffffff 0000000000000000
Sep 23 01:18:19 box kernel: page dumped because: nonzero _refcount

Sep 23 01:19:25 box kernel: non-paged memory
Sep 23 01:19:25 box kernel: list_del corruption. next->prev should be fffffcf30d88aa08, but was fffffce30d88aa08. (next=fffffcf30d88aa48)
Sep 23 01:19:25 box kernel: ------------[ cut here ]------------
Sep 23 01:19:25 box kernel: kernel BUG at lib/list_debug.c:65!
Sep 23 01:19:25 box kernel: Oops: invalid opcode: 0000 [#1] SMP NOPTI
Sep 23 01:19:25 box kernel: CPU: 10 UID: 1000 PID: 4352 Comm: Chrome_ChildIOT Not tainted 6.16.7-200.fc42.x86_64 #1 PREEMPT(lazy) 
Sep 23 01:19:25 box kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C95/B550M PRO-VDH WIFI (MS-7C95), BIOS 2.L1 05/06/2024
Sep 23 01:19:25 box kernel: RIP: 0010:__list_del_entry_valid_or_report+0x10a/0x110
Sep 23 01:19:25 box kernel: Code: 89 d7 48 89 14 24 e8 45 be a8 ff 48 8b 14 24 48 8b 74 24 08 48 c7 c7 f8 83 ea 94 48 8b 42 08 48 89 d1 48 89 c2 e8 06 f6 5d ff <0f> 0b 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90

Sep 24 12:23:11 box kernel: non-paged memory
Sep 24 12:23:11 box kernel: list_del corruption. next->prev should be fffff23556e8aa08, but was fffff22556e8aa08. (next=fffff23556e8aa48)
Sep 24 12:23:11 box kernel: ------------[ cut here ]------------
Sep 24 12:23:11 box kernel: kernel BUG at lib/list_debug.c:65!
Sep 24 12:23:11 box kernel: Oops: invalid opcode: 0000 [#1] SMP NOPTI
Sep 24 12:23:11 box kernel: CPU: 2 UID: 0 PID: 6237 Comm: cc1plus Not tainted 6.16.7-200.fc42.x86_64 #1 PREEMPT(lazy) 
Sep 24 12:23:11 box kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C95/B550M PRO-VDH WIFI (MS-7C95), BIOS 2.L1 05/06/2024
Sep 24 12:23:11 box kernel: RIP: 0010:__list_del_entry_valid_or_report+0x10a/0x110
Sep 24 12:23:11 box kernel: Code: 89 d7 48 89 14 24 e8 45 be a8 ff 48 8b 14 24 48 8b 74 24 08 48 c7 c7 f8 83 2a af 48 8b 42 08 48 89 d1 48 89 c2 e8 06 f6 5d ff <0f> 0b 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
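
One way to read the "list_del corruption" lines above is to compare the pointer the kernel expected with the one it found. A minimal sketch that XORs the two values and lists which bits differ; the regular expression is based only on the three messages quoted here, and pointer_diffs is a hypothetical helper name. The script only does the arithmetic and does not by itself say what caused the difference:

```python
import re

# Pattern modelled on the list_del messages quoted in this thread.
LIST_DEL = re.compile(
    r"list_del corruption\. next->prev should be (?P<want>[0-9a-f]+), "
    r"but was (?P<got>[0-9a-f]+)"
)


def pointer_diffs(log_text):
    """Return (expected, actual, xor, differing_bits) per list_del report."""
    results = []
    for m in LIST_DEL.finditer(log_text):
        expected = int(m["want"], 16)
        actual = int(m["got"], 16)
        xor = expected ^ actual
        bits = [i for i in range(64) if (xor >> i) & 1]
        results.append((expected, actual, xor, bits))
    return results


sample = """\
list_del corruption. next->prev should be fffff9318d88aa08, but was fffff9218d88aa08. (next=fffff9318d88aa48)
list_del corruption. next->prev should be fffffcf30d88aa08, but was fffffce30d88aa08. (next=fffffcf30d88aa48)
list_del corruption. next->prev should be fffff23556e8aa08, but was fffff22556e8aa08. (next=fffff23556e8aa48)
"""

if __name__ == "__main__":
    for expected, actual, xor, bits in pointer_diffs(sample):
        print(f"{expected:016x} ^ {actual:016x} = {xor:#x} bits={bits}")
```

For the three reports quoted above, the expected and actual pointers differ by 0x1000000000 each time, that is, in a single bit (bit 36). That detail may be worth including in a bug report.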

Ran BurnInTest by PassMark.

Ran prime95

Ran memtest.

All done on Windows. Nothing that points to a hardware problem.

If I so much as sneezed at Linux, it would crash.

It’s been doing that for months. No kernel updates helped.

Have you considered a vendor firmware problem? Most vendors support firmware updates using Windows, but those interested in supporting Linux may provide updates through Linux Firmware Update Service. Recent kernels often require firmware updates. Some systems need kernel command-line options.

You can try to catch the attention of Linux users with similar hardware who may have a solution by running inxi -Fzxx in a terminal and posting the output as web-searchable pre-formatted text. You can also check whether your system has known problems listed in the LHDB. These entries sometimes have user comments addressing system-dependent issues.

Sure could be, but my hardware is pretty standard.

AMD ryzen 5600g. No dedicated GPU.

Just a basic motherboard.

The drive is NVME, so could be issue there. But my system was stable before upgrading to fedora 42 and the linux 6.16 kernel.

So could still be a firmware/driver issue… but I’m willing to bet it was BTRFS as the root cause all along.

Thanks for the link though. I’ll save that!

I’ll probably try once more with ext4. I really don’t want to stay on Windows, and I really don’t want to be distro hopping.

Have you been doing BTRFS maintenance?

No.

I haven’t messed with the filesystem. I forgot I was even using it.

I thought it was something else - like graphics drivers, which seems fairly common.

Guess we’ll see soon enough if switching to EXT4 helps.

Interesting.

The fedora 42 live iso fails to boot on my usb stick. Just gives me a booting countdown and some graphical gibberish.

Tried it with fedora media writer as well as the balenaetcher utility.

and can make a btrfs filesystem go read-only: https://github.com/maharmstone/btrfs/issues/739.

Have you tried booting the F43 beta installer?