Buffer overflows on kernel 6.5.x

I’m a long-time user of Fedora Workstation on a Dell Latitude 7400. A few weeks ago I updated to the 6.5.x kernel, and since then I cannot boot any 6.5.x version. Booting back into 6.4.15 (from which I’m writing this message) works without problems. I have run the hardware diagnostics and they report no faults. Upgrading to Fedora 39 didn’t solve the problem.

The logs show two error messages: “BUG: Bad rss-counter state mm:00000000a3d60a4f type:MM_ANONPAGES val:1” and “kernel BUG at lib/string_helpers.c:1031!”.

Below is the relevant part of the error log, obtained with journalctl -b -1:

nov. 11 18:42:53 plp08804.local kernel: kernel BUG at lib/string_helpers.c:1031!
nov. 11 18:42:53 plp08804.local kernel: invalid opcode: 0000 [#8] PREEMPT SMP NOPTI
nov. 11 18:42:53 plp08804.local kernel: CPU: 4 PID: 4892 Comm: sh Tainted: G      D    OE      6.5.11-300.fc39.x86_64 #1
nov. 11 18:42:53 plp08804.local kernel: Hardware name: Dell Inc. Latitude 7400/07WDVW, BIOS 1.26.0 07/05/2023
nov. 11 18:42:53 plp08804.local kernel: RIP: 0010:fortify_panic+0x13/0x20
nov. 11 18:42:53 plp08804.local kernel: Code: 41 5d c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 89 fe 48 c7 c7 90 0d 95 9f e8 6d 32 9a ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 9>
nov. 11 18:42:53 plp08804.local kernel: RSP: 0018:ffffb084856c7e00 EFLAGS: 00010246
nov. 11 18:42:53 plp08804.local kernel: RAX: 000000000000002c RBX: ffff91281d97a800 RCX: 0000000000000000
nov. 11 18:42:53 plp08804.local kernel: RDX: 0000000000000000 RSI: ffff912edc521540 RDI: ffff912edc521540
nov. 11 18:42:53 plp08804.local kernel: RBP: ffff91281d97a8a0 R08: 0000000000000000 R09: ffffb084856c7ca8
nov. 11 18:42:53 plp08804.local kernel: R10: 0000000000000003 R11: ffffffffa0345d68 R12: 0000000000000000
nov. 11 18:42:53 plp08804.local kernel: R13: ffff91281d97a8a0 R14: 0000000000000006 R15: ffffffffc2103960
nov. 11 18:42:53 plp08804.local kernel: FS:  00007f271d831740(0000) GS:ffff912edc500000(0000) knlGS:0000000000000000
nov. 11 18:42:53 plp08804.local kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
nov. 11 18:42:53 plp08804.local kernel: CR2: 00007f271da0cb20 CR3: 0000000117648005 CR4: 00000000003706e0
nov. 11 18:42:53 plp08804.local kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
nov. 11 18:42:53 plp08804.local kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
nov. 11 18:42:53 plp08804.local kernel: Call Trace:
nov. 11 18:42:53 plp08804.local kernel:  <TASK>
nov. 11 18:42:53 plp08804.local kernel:  ? die+0x36/0x90
nov. 11 18:42:53 plp08804.local kernel:  ? do_trap+0xda/0x100
nov. 11 18:42:53 plp08804.local kernel:  ? fortify_panic+0x13/0x20
nov. 11 18:42:53 plp08804.local kernel:  ? do_error_trap+0x6a/0x90
nov. 11 18:42:53 plp08804.local kernel:  ? fortify_panic+0x13/0x20
nov. 11 18:42:53 plp08804.local kernel:  ? exc_invalid_op+0x50/0x70
nov. 11 18:42:53 plp08804.local kernel:  ? fortify_panic+0x13/0x20
nov. 11 18:42:53 plp08804.local kernel:  ? asm_exc_invalid_op+0x1a/0x20
nov. 11 18:42:53 plp08804.local kernel:  ? fortify_panic+0x13/0x20
nov. 11 18:42:53 plp08804.local kernel:  mfe_aac_fa_process_load_binary+0xf7/0x110 [mfe_aac_1007141249]
nov. 11 18:42:53 plp08804.local kernel:  bprm_execve+0x284/0x650
nov. 11 18:42:53 plp08804.local kernel:  do_execveat_common.isra.0+0x1ad/0x250
nov. 11 18:42:53 plp08804.local kernel:  __x64_sys_execve+0x36/0x40
nov. 11 18:42:53 plp08804.local kernel:  do_syscall_64+0x5d/0x90
nov. 11 18:42:53 plp08804.local kernel:  ? exc_page_fault+0x7f/0x180
nov. 11 18:42:53 plp08804.local kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
nov. 11 18:42:53 plp08804.local kernel: RIP: 0033:0x7f271d9124eb
nov. 11 18:42:53 plp08804.local kernel: Code: 0f 1e fa 48 8b 05 ed 9a 0f 00 48 8b 10 e9 0d 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 15 99 0f 00 f7 d8 64 8>
nov. 11 18:42:53 plp08804.local kernel: RSP: 002b:00007fffee371148 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
nov. 11 18:42:53 plp08804.local kernel: RAX: ffffffffffffffda RBX: 0000559910f124f0 RCX: 00007f271d9124eb
nov. 11 18:42:53 plp08804.local kernel: RDX: 0000559910f0dec0 RSI: 0000559910f125f0 RDI: 0000559910f124f0
nov. 11 18:42:53 plp08804.local kernel: RBP: 00007fffee371240 R08: 0000000000000001 R09: 0000000000000001
nov. 11 18:42:53 plp08804.local kernel: R10: 0000000000000004 R11: 0000000000000246 R12: 00000000ffffffff
nov. 11 18:42:53 plp08804.local kernel: R13: 0000559910f124f0 R14: 0000559910f125f0 R15: 0000559910f0dec0
nov. 11 18:42:53 plp08804.local kernel:  </TASK>
nov. 11 18:42:53 plp08804.local kernel: Modules linked in: mfe_aac_1007141249(OE) rfcomm xt_multiport xt_mark snd_seq_dummy snd_hrtimer nf_conntrack_netlink xt_CHECKSUM xt_MASQUERADE xt_conntrack xt_addrtype ipt_REJECT br_netfilter >
nov. 11 18:42:53 plp08804.local kernel:  intel_pmc_bxt iTCO_vendor_support coretemp snd_soc_skl dell_laptop snd_soc_hdac_hda snd_hda_ext_core snd_soc_sst_ipc intel_rapl_msr dell_smm_hwmon snd_soc_sst_dsp mac80211 snd_soc_acpi_intel_>
nov. 11 18:42:53 plp08804.local kernel:  processor_thermal_device industrialio_triggered_buffer kfifo_buf mei processor_thermal_rfim industrialio idma64 processor_thermal_mbox rfkill processor_thermal_rapl intel_rapl_common intel_so>
nov. 11 18:42:53 plp08804.local kernel: ---[ end trace 0000000000000000 ]---
nov. 11 18:42:53 plp08804.local kernel: RIP: 0010:fortify_panic+0x13/0x20
nov. 11 18:42:53 plp08804.local kernel: Code: 41 5d c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 89 fe 48 c7 c7 90 0d 95 9f e8 6d 32 9a ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 9>
nov. 11 18:42:53 plp08804.local kernel: RSP: 0018:ffffb08484947e10 EFLAGS: 00010246
nov. 11 18:42:53 plp08804.local kernel: RAX: 000000000000002c RBX: ffff91281d97ca00 RCX: 0000000000000000
nov. 11 18:42:53 plp08804.local kernel: RDX: 0000000000000000 RSI: ffff912edc561540 RDI: ffff912edc561540
nov. 11 18:42:53 plp08804.local kernel: RBP: ffff91281d97caa0 R08: 0000000000000000 R09: ffffb08484947cb8
nov. 11 18:42:53 plp08804.local kernel: R10: 0000000000000003 R11: ffffffffa0345d68 R12: 0000000000000000
nov. 11 18:42:53 plp08804.local kernel: R13: ffff91281d97caa0 R14: 0000000000000006 R15: ffffffffc2103960
nov. 11 18:42:53 plp08804.local kernel: FS:  00007f271d831740(0000) GS:ffff912edc500000(0000) knlGS:0000000000000000
nov. 11 18:42:53 plp08804.local kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
nov. 11 18:42:53 plp08804.local kernel: CR2: 00007f271da0cb20 CR3: 0000000117648005 CR4: 00000000003706e0
nov. 11 18:42:53 plp08804.local kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
nov. 11 18:42:53 plp08804.local kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
nov. 11 18:42:53 plp08804.local kernel: BUG: Bad rss-counter state mm:00000000a3d60a4f type:MM_ANONPAGES val:1
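
A trace like the one above can be narrowed down mechanically. Frames prefixed with "?" are ones the kernel's stack unwinder marks as unreliable, and a name in square brackets after a frame is the loadable module that the function belongs to. The Python sketch below is not an official tool; the regular expression and the function names are assumptions made for illustration. It skips the "?" frames and reports the first remaining frame that lives in a module.

```python
import re

# Matches a call-trace frame such as
#   "kernel:  mfe_aac_fa_process_load_binary+0xf7/0x110 [mfe_aac_1007141249]"
# A leading "?" marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"kernel:\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def reliable_frames(log_lines):
    """Yield (symbol, module) for every frame not marked with '?'.

    module is None when the frame has no [module] tag.
    """
    for line in log_lines:
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            yield m.group("symbol"), m.group("module")


def first_module_frame(log_lines):
    """Return the first reliable frame that belongs to a module, or None."""
    for symbol, module in reliable_frames(log_lines):
        if module:
            return symbol, module
    return None


# Three lines copied from the log in this post.
SAMPLE = [
    "nov. 11 18:42:53 plp08804.local kernel:  ? fortify_panic+0x13/0x20",
    "nov. 11 18:42:53 plp08804.local kernel:  mfe_aac_fa_process_load_binary+0xf7/0x110 [mfe_aac_1007141249]",
    "nov. 11 18:42:53 plp08804.local kernel:  bprm_execve+0x284/0x650",
]

if __name__ == "__main__":
    print(first_module_frame(SAMPLE))
```

The same functions could be fed the saved output of journalctl -b -1 split into lines. Register dumps and RIP lines do not match the pattern and are ignored.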

Please raise a bug with the above information and a description of your hardware.
The bug tracker is here: Red Hat Bugzilla Main Page

Thanks for the link. Done: 2249269 – Buffer overflows on kernel 6.5.x.


The kernel is tainted, so the prime suspect would be the kernel module that tainted it.

nov. 11 18:42:09 kernel: mfe_aac_1007141249: loading out-of-tree module taints kernel.
nov. 11 18:42:09 kernel: mfe_aac_1007141249: module verification failed: signature and/or required key missing - tainting kernel
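
The taint status can also be read from the oops itself: the "Tainted:" field on the CPU line holds one letter per taint reason, and the "Modules linked in:" line repeats the per-module letters in parentheses. The Python sketch below decodes both. The function names are hypothetical, and the table is deliberately partial: it covers only the four letters that appear in this thread, with meanings taken from the kernel's "Tainted kernels" documentation.

```python
import re

# Partial table: only the flags that appear in this thread's log.
TAINT_MEANINGS = {
    "G": "no proprietary module loaded ('P' would appear here otherwise)",
    "D": "the kernel has already hit an oops or BUG since boot",
    "O": "an out-of-tree (externally built) module was loaded",
    "E": "an unsigned module was loaded",
}


def taint_flags(oops_line):
    """Return the taint letters from a 'CPU: ... Tainted: ...' line."""
    m = re.search(r"Tainted: ([A-Z ]+?)\s+\d", oops_line)
    if not m:
        return []
    return [c for c in m.group(1) if c != " "]


def flagged_modules(modules_line):
    """Map module name -> taint letters for modules listed as name(FLAGS)."""
    return dict(re.findall(r"(\w+)\(([A-Z]+)\)", modules_line))


# Lines copied from the log in the first post (module list shortened).
CPU_LINE = ("kernel: CPU: 4 PID: 4892 Comm: sh Tainted: G      D    OE      "
            "6.5.11-300.fc39.x86_64 #1")
MODULES_LINE = "kernel: Modules linked in: mfe_aac_1007141249(OE) rfcomm xt_multiport"

if __name__ == "__main__":
    for flag in taint_flags(CPU_LINE):
        print(flag, "-", TAINT_MEANINGS.get(flag, "not in this partial table"))
    print(flagged_modules(MODULES_LINE))
```

Modules without a parenthesised suffix, such as rfcomm in the sample, are not returned by flagged_modules because they carry no taint letters.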

OK, it seems that the problem comes from the Trellix antivirus kernel module (MFE stands for McAfee). This antivirus is mandatory at our company. I will see if I can get an updated version of the package and will let you know whether it fixes the issue.

Problem temporarily solved by blacklisting the mfe_aac_1007141249 module. Thanks for the insight!

There’s still a big problem with the 6.5.x kernel that I don’t have with 6.4.x. It now boots, but afterwards most of the programs I launch fail with SIGSEGV. I have no idea why this works with one kernel version and not with a newer one. strace doesn’t help (no trace, just an immediate failure).

I just ran a complete memtest to make sure the RAM has no defects; no hardware issue was detected.

If you boot with the previous kernel, do the problems go away?

Yes. Under kernel version 6.4.x, no SIGSEGV when launching programs.

I have updated my antivirus program, and everything is fine now. I’m not sure I understand the link between it and the SIGSEGV errors, but the problem seems to be fixed. Thanks for your help.