Fedora sees only 1 CPU core after updating the kernel from 6.8.x to 6.9.x

Hi,

A few days ago I updated the kernel to 6.9.x, and since then Fedora only sees 1 CPU core. I tried the 6.10.x kernel, but the result is the same, so I went back to kernel 6.8.x, which works fine.

What could be the problem and how do I know when I can safely update to the new/current kernel?


I already tried adding “acpi=off” to the GRUB kernel command line, which didn’t help. This is the output of sudo dmesg | grep -i cpu:

[    0.008441] ACPI: SSDT 0x0000000093B73000 0025E8 (v02 CpuRef CpuSsdt  00003000 INTL 20180209)
[    0.042397] CPU topo: Limiting to 1 possible CPUs
[    0.042402] CPU topo: CPU limit of 1 reached. Ignoring further CPUs
[    0.042433] CPU topo: Max. logical packages:   1
[    0.042433] CPU topo: Max. logical dies:       1
[    0.042434] CPU topo: Max. dies per package:   1
[    0.042436] CPU topo: Max. threads per core:   1
[    0.042437] CPU topo: Num. cores per package:     1
[    0.042437] CPU topo: Num. threads per package:   1
[    0.042437] CPU topo: Allowing 1 present CPUs plus 0 hotplug CPUs
[    0.042438] CPU topo: Rejected CPUs 15
[    0.046632] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:1
[    0.046759] percpu: Embedded 87 pages/cpu s233472 r8192 d114688 u2097152
[    0.046763] pcpu-alloc: s233472 r8192 d114688 u2097152 alloc=1*2097152
[    0.046765] pcpu-alloc: [0] 0 
[    0.098583] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.105583] rcu: 	RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=1.
[    0.105586] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.109003] CPU0: Thermal monitoring enabled (TM1)
[    0.109005] x86/cpu: User Mode Instruction Prevention (UMIP) activated
[    0.109084] MMIO Stale Data: Mitigation: Clear CPU buffers
[    0.109979] smpboot: CPU0: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz (family: 0x6, model: 0xa7, stepping: 0x1)
[    0.109979] smp: Bringing up secondary CPUs ...
[    0.109979] smp: Brought up 1 node, 1 CPU
[    0.112542] cpuidle: using governor menu
[    0.135041] cryptd: max_cpu_qlen set to 1000
[    0.181648] ACPI: SSDT 0xFFFF9C6700B96C00 000394 (v02 PmRef  Cpu0Cst  00003001 INTL 20180209)
[    0.182111] ACPI: SSDT 0xFFFF9C67008C3000 000560 (v02 PmRef  Cpu0Ist  00003000 INTL 20180209)
[    0.182575] ACPI: SSDT 0xFFFF9C6700D28800 0001CB (v02 PmRef  Cpu0Psd  00003000 INTL 20180209)
[    0.183003] ACPI: SSDT 0xFFFF9C6700B97000 0002F4 (v02 PmRef  Cpu0Hwp  00003000 INTL 20180209)
[    0.184228] ACPI: _OSC evaluated successfully for all CPUs

Thanks a lot for your help. :slight_smile:
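For anyone comparing their own boot log against the one above, here is a rough Python sketch that pulls the key numbers out of the "CPU topo:" lines. The function name summarize_cpu_topo and the dictionary keys are made up for illustration; the message formats are copied from the log in this post and may differ on other kernel versions.

```python
import re


def summarize_cpu_topo(dmesg_text: str) -> dict:
    """Extract the CPU limit, present count, and rejected count from dmesg text."""
    # Patterns mirror the "CPU topo:" lines quoted in the post above.
    patterns = {
        "possible_limit": r"CPU topo: Limiting to (\d+) possible CPUs",
        "present": r"CPU topo: Allowing (\d+) present CPUs",
        "rejected": r"CPU topo: Rejected CPUs (\d+)",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, dmesg_text)
        if match:
            summary[key] = int(match.group(1))
    return summary


sample = """\
[    0.042397] CPU topo: Limiting to 1 possible CPUs
[    0.042437] CPU topo: Allowing 1 present CPUs plus 0 hotplug CPUs
[    0.042438] CPU topo: Rejected CPUs 15
"""
print(summarize_cpu_topo(sample))
# {'possible_limit': 1, 'present': 1, 'rejected': 15}
```

One accepted CPU plus 15 rejected ones adds up to 16 logical CPUs, which matches the 8 cores / 16 threads of an i7-11700. That suggests the firmware enumerated every thread and the kernel then refused all but the boot CPU.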

I have never seen that before! I found the code that prints the message and rejects the CPUs in arch/x86/kernel/cpu/topology.c, on the master branch of torvalds/linux on GitHub. From that code I found 4 reasons to reject a CPU and 3 messages you can look for in dmesg.

pr_info_once("Ignoring hot-pluggable APIC ID %x in present package.\n",
             apic_id);
topo_info.nr_rejected_cpus++;

pr_err_once("APIC ID %x exceeds kernel limit of: %x\n", apic_id, MAX_LOCAL_APIC - 1);
topo_info.nr_rejected_cpus++;

pr_warn_once("CPU limit of %d reached. Ignoring further CPUs\n", nr_cpu_ids);
topo_info.nr_rejected_cpus++;

pr_warn("Enumerated BSP APIC %x is not marked in APICBASE MSR\n", apic_id);
pr_warn("Assuming crash kernel. Limiting to one CPU to prevent machine INIT\n");
set_nr_cpu_ids(1);
goto fwbug;

pr_warn("Boot CPU APIC ID not the first enumerated APIC ID: %x != %x\n",
        topo_info.boot_cpu_apic_id, apic_id);
pr_warn("Crash kernel detected. Disabling real BSP to prevent machine INIT\n");
pr_warn(FW_BUG "APIC enumeration order not specification compliant\n");

In the terminal, run sudo dmesg --level=err,warn. Do you see any of the above messages?
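If reading the log by eye is tedious, a small Python sketch along these lines could do the matching. The pattern names and the find_rejection_reasons helper are invented for this example; the message texts are taken from the kernel snippets quoted above, so they are only valid for kernels that still print them in this form.

```python
import re

# One regex per message quoted from topology.c above; APIC IDs are printed as hex.
REJECTION_PATTERNS = {
    "hotplug_apic_id": re.compile(
        r"Ignoring hot-pluggable APIC ID [0-9a-f]+ in present package"),
    "apic_id_over_limit": re.compile(
        r"APIC ID [0-9a-f]+ exceeds kernel limit of"),
    "cpu_limit_reached": re.compile(
        r"CPU limit of \d+ reached\. Ignoring further CPUs"),
    "bsp_not_in_apicbase": re.compile(
        r"Enumerated BSP APIC [0-9a-f]+ is not marked in APICBASE MSR"),
    "boot_cpu_not_first": re.compile(
        r"Boot CPU APIC ID not the first enumerated APIC ID"),
}


def find_rejection_reasons(dmesg_text: str) -> list:
    """Return the names of the known messages that occur in the text, in order of appearance."""
    found = []
    for line in dmesg_text.splitlines():
        for name, pattern in REJECTION_PATTERNS.items():
            if pattern.search(line) and name not in found:
                found.append(name)
    return found


sample = """\
[    0.000000] Malformed early option 'acpi'
[    0.042322] CPU topo: CPU limit of 1 reached. Ignoring further CPUs
"""
print(find_rejection_reasons(sample))  # ['cpu_limit_reached']
```

To use it on a live system, the text could come from saving the output of sudo dmesg --level=err,warn to a file and reading that file in.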

It would be worth checking whether there is a firmware update for the system’s BIOS.
The BIOS is responsible for setting up the CPU, and a mistake in the BIOS could be involved.

Thanks for the reply. I ran the command but can’t really recognize anything that helps…

This is probably important, but I don’t know what to do with that:

CPU topo: CPU limit of 1 reached. Ignoring further CPUs

[    0.000000] Malformed early option 'acpi'
[    0.042322] CPU topo: CPU limit of 1 reached. Ignoring further CPUs
[    0.108881] ACPI: setting ELCR to 0200 (from 0000)
[    0.379285] ACPI: \_SB_.LNKA: BIOS reported IRQ 0, using IRQ 11
[    0.584319] ACPI: \_SB_.LNKB: BIOS reported IRQ 1, using IRQ 10
[    0.586334] hpet_acpi_add: no address or irqs in _CRS
[    0.598796] intel-lpss 0000:00:15.0: can't derive routing for PCI INT A
[    0.598797] intel-lpss 0000:00:15.0: PCI INT A: not connected
[    0.598823] intel-lpss 0000:00:15.0: probe with driver intel-lpss failed with error -2147483648
[    0.611091] intel-lpss 0000:00:15.1: can't derive routing for PCI INT B
[    0.611092] intel-lpss 0000:00:15.1: PCI INT B: not connected
[    0.611114] intel-lpss 0000:00:15.1: probe with driver intel-lpss failed with error -2147483648
[    0.623407] intel-lpss 0000:00:15.3: can't derive routing for PCI INT D
[    0.623408] intel-lpss 0000:00:15.3: PCI INT D: not connected
[    0.623433] intel-lpss 0000:00:15.3: probe with driver intel-lpss failed with error -2147483648
[    0.649531] usb: port power management may be unreliable
[    0.650721] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[    0.654894] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[    2.997347] sd 8:0:0:0: [sdd] No Caching mode page found
[    2.997349] sd 8:0:0:0: [sdd] Assuming drive cache: write through
[    3.010643] GPT:Primary header thinks Alt. header is not at the end of the disk.
[    3.010647] GPT:3715423 != 31494143
[    3.010649] GPT:Alternate GPT header not at the end of the disk.
[    3.010650] GPT:3715423 != 31494143
[    3.010651] GPT: Use GNU Parted to correct GPT errors.
[    3.189571] sd 6:0:0:0: [sdb] No Caching mode page found
[    3.189574] sd 6:0:0:0: [sdb] Assuming drive cache: write through
[    3.200233] GPT:Primary header thinks Alt. header is not at the end of the disk.
[    3.200236] GPT:3691383 != 122138623
[    3.200238] GPT:Alternate GPT header not at the end of the disk.
[    3.200240] GPT:3691383 != 122138623
[    3.200241] GPT: Use GNU Parted to correct GPT errors.
[    4.316246] r8169 0000:02:00.0: can't disable ASPM; OS doesn't have ASPM control
[    4.478237] i801_smbus 0000:00:1f.4: Transaction timeout
[    4.685973] i801_smbus 0000:00:1f.4: Transaction timeout
[    5.647392] block nvme0n1: No UUID available providing old NGUID
[    6.801433] thermal thermal_zone2: failed to read out thermal zone (-61)
[    8.185216] nvidia: loading out-of-tree module taints kernel.
[    8.185222] nvidia: module license 'NVIDIA' taints kernel.
[    8.185222] Disabling lock debugging due to kernel taint
[    8.185225] nvidia: module license taints kernel.
[    8.425393] snd_hda_intel 0000:01:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x000f0000

[    8.509434] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  550.90.07  Fri May 31 09:35:42 UTC 2024
[    8.602195] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[   11.559206] Bluetooth: hci0: Malformed MSFT vendor event: 0x02
[   11.570078] Bluetooth: hci0: HCI LE Coded PHY feature bit is set, but its usage is not supported.
[   57.076396] warning: `ThreadPoolForeg' uses wireless extensions which will stop working for Wi-Fi 7 hardware; use nl80211

There was a BIOS update available but it didn’t solve the problem.

I guess some kernel command-line option is not supported.

What is your kernel command line?

Do you mean this?

GRUB_CMDLINE_LINUX_DEFAULT=“quiet splash noapic nvidia-drm.modeset=1 rd.driver.blacklist=nouveau”

Thanks, I was thinking of cat /proc/cmdline

That acpi error seems odd.
Does it also happen when you boot the working kernel?
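As a rough illustration of what to look for in the command line, the Python sketch below splits it into parameters and picks out the ones that affect ACPI, the APIC, or the CPU count. The SUSPECT_PARAMS list is my own selection of documented kernel parameters, not an official list, and the simple whitespace split ignores quoted values that contain spaces.

```python
# Parameters that can influence ACPI, the APIC, or how many CPUs come up.
# This selection is an assumption made for the example, not an exhaustive list.
SUSPECT_PARAMS = ("noapic", "nolapic", "acpi", "nr_cpus", "maxcpus", "nosmp")


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; flags without '=' map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def suspect_params(cmdline: str) -> dict:
    """Return only the parameters named in SUSPECT_PARAMS."""
    params = parse_cmdline(cmdline)
    return {key: value for key, value in params.items() if key in SUSPECT_PARAMS}


# The options posted earlier in this thread:
example = "quiet splash noapic nvidia-drm.modeset=1 rd.driver.blacklist=nouveau"
print(suspect_params(example))  # {'noapic': None}

# On a running system the input would be: open("/proc/cmdline").read()
```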

It didn’t happen when booting the working kernel.

Thanks a lot for your help, I really need it working so I guess I’m gonna install another distribution for now.

Thanks again. This thread can be closed…

The problem is in the Linux kernel, not Fedora as such.
I expect any distribution that uses the newer kernels to break for you.

If you find a distro that uses the new kernel and the CPU bug does not show up, please let us know.

I think it’s worth reporting this bug against package kernel.

That’s true. I haven’t done that before. Where and how do I report this bug?

See “How to file a bug” in the Fedora Docs.
You will need a Fedora Bugzilla account to do this; that is also covered in those docs.

Bugzilla allows logging in with the same FAS credentials used here, if the user chooses.

I think it’s worth reporting this bug against package kernel.

They don’t care. I’ve reported kernel bugs for amdgpu/radeon multiple times. Nobody cares. Complete waste of time.

This bug report seems related

https://bugzilla.redhat.com/show_bug.cgi?id=2295026

Hello! I am working on fixing the issue.

Yes, for some reason the “noapic” argument is causing the problem. I do not understand at all why it is being added as a boot option.

I added a comment to bug 2295026 (“6.8 kernel see all cpus, 6.9 only sees 1 cpu”) explaining a possible workaround; it would be worth trying it out.

I am now investigating where the “noapic” is coming from.

I have sent a patch to the kernel, as there was a change in behavior from 6.8 to 6.9. The “noapic” option shouldn’t limit the number of CPUs on 64-bit x86.

Let’s see what the maintainers think about this. LKML: Fernando Fernandez Mancera: [PATCH] x86/cpu/topology: remove limit of CPUs due to noapic on x86_64

After trying some other stuff I’m back to Fedora.
The problem still persists (even with the latest kernel) but I found a way to make it work.

  • noapic: the device boots but uses only 1 CPU
  • no boot parameter: the device freezes and doesn’t boot at all
  • pci=nobar: the device boots and everything seems to work

After some more research I found the parameter pci=nobar and it seems to work fine. The system boots and I can use all cores and the NVIDIA graphics card. Everything seems to work. :slight_smile:
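For anyone who wants to make the same change, here is a minimal Python sketch of the edit to the GRUB_CMDLINE_LINUX_DEFAULT line. The helper rewrite_grub_cmdline is hypothetical, and it assumes the value is wrapped in plain double quotes as in /etc/default/grub. It only rewrites a string and does not touch any file.

```python
def rewrite_grub_cmdline(line: str, remove=(), add=()) -> str:
    """Drop the options in `remove` and append those in `add` to a GRUB_CMDLINE_* line."""
    name, _, rest = line.partition("=")
    # Strip surrounding whitespace and the double quotes around the value.
    value = rest.strip().strip('"')
    tokens = [token for token in value.split() if token not in remove]
    for token in add:
        if token not in tokens:
            tokens.append(token)
    joined = " ".join(tokens)
    return f'{name}="{joined}"'


old = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash noapic nvidia-drm.modeset=1 rd.driver.blacklist=nouveau"'
print(rewrite_grub_cmdline(old, remove=("noapic",), add=("pci=nobar",)))
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvidia-drm.modeset=1 rd.driver.blacklist=nouveau pci=nobar"
```

After changing /etc/default/grub, the GRUB configuration still has to be regenerated before the new options take effect (on Fedora this is typically done with grub2-mkconfig).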

Are there any disadvantages to using this boot parameter? Maybe some bad stuff I haven’t noticed yet?

Thanks for all the helpful replies. I hope this piece of information helps other people with a similar problem.

So “nobar” means “No Base Address Register”. I was not aware of what that was. After looking around on the internet, that is the best explanation I found.