The following problem appeared entirely at random, without me having done any updates or anything else, and I haven’t managed to fix it yet.
I’m using Fedora 40 in a dual boot setup with Windows 11, managed by GRUB. Recently, whenever I try to boot into Fedora, my machine crashes and hard-reboots immediately after I choose the “Fedora” boot entry in GRUB. Windows (as well as a live disk) works just fine.
I booted a live system, mounted the Fedora file system and chrooted into it, then ran dnf update to make sure all packages are at their latest versions. Didn’t help, though. I then purged the NVIDIA drivers entirely and reverted to nouveau, as they had caused issues in the past. Didn’t help either.
However, I noticed that when I add acpi=off to the GRUB entry and regenerate the config, Fedora boots without crashing, but only shows a blank screen. I can still log in via SSH, though.
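For anyone wanting to script the same experiment, here is a minimal sketch of the string manipulation involved in adding a parameter such as acpi=off to a kernel command line. The function name `add_kernel_arg` and the sample command line are made up for illustration, and the sketch assumes parameters are separated by plain whitespace with no quoted values. On Fedora the real change is normally made with grubby or by editing GRUB_CMDLINE_LINUX in /etc/default/grub; this only shows the logic.

```python
def add_kernel_arg(cmdline: str, arg: str) -> str:
    """Return cmdline with arg appended.

    Any existing token with the same key (the part before '=') is dropped
    first, so calling this twice does not duplicate the parameter.
    Assumption: each key should appear at most once, which is not true for
    every kernel parameter (console= can legitimately repeat).
    """
    key = arg.split("=", 1)[0]
    kept = [tok for tok in cmdline.split() if tok.split("=", 1)[0] != key]
    return " ".join(kept + [arg])


# Hypothetical command line, not taken from the affected machine.
print(add_kernel_arg("ro rhgb quiet", "acpi=off"))
```

Replacing by key rather than blindly appending keeps the entry tidy when the same test is repeated with different values.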
Syslogs don’t contain anything useful, as the system most likely doesn’t even get close to the point where it could start writing logs.
Here’s a short video of the issue (sorry for poor quality): Video
The very last frame before the screen turns black again shows this:
Does not help, unfortunately. The system crashes way before I could obtain a text console, even before systemd gets initialized. The only way to boot up the system (without graphics drivers, only single-core, etc.) is to disable ACPI as described above.
In the meantime, I upgraded to the Fedora 41 branched release, hoping that the upgrade would fix things. But it didn’t - same issue as before.
Also, I realized I still had systemd.unified_cgroup_hierarchy=0 on my cmdline - probably a leftover from using NVIDIA Docker - which would prevent Fedora 41 with systemd v256 from booting. Removing that parameter got me back to the same old symptoms.
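Leftovers like this are easy to miss, so here is a small sketch for checking a kernel command line (for example the contents of /proc/cmdline) against a list of suspect parameters. The function names, the suspect list and the sample string are assumptions for illustration only; the sketch again assumes whitespace-separated parameters without quoted values.

```python
from typing import List


def find_suspect_args(cmdline: str, suspect_keys: List[str]) -> List[str]:
    """Return the tokens in cmdline whose key is listed in suspect_keys."""
    return [tok for tok in cmdline.split()
            if tok.split("=", 1)[0] in suspect_keys]


def drop_kernel_arg(cmdline: str, key: str) -> str:
    """Return cmdline without any 'key' or 'key=value' token."""
    return " ".join(tok for tok in cmdline.split()
                    if tok.split("=", 1)[0] != key)


# Hypothetical sample; on a running system you would read /proc/cmdline.
sample = "ro rhgb quiet systemd.unified_cgroup_hierarchy=0"
suspects = ["systemd.unified_cgroup_hierarchy"]

print(find_suspect_args(sample, suspects))
print(drop_kernel_arg(sample, "systemd.unified_cgroup_hierarchy"))
```

This only reports and rewrites a string; making the removal persistent still has to go through the bootloader configuration.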
During one boot, I got a kernel panic, but it might be unrelated to the issue at hand.
Another is to try toggling certain BIOS settings like:
CPU virtualization (controls IOMMU)
IOMMU
Resizable BAR
Above 4G Decode
SATA AHCI/RAID mode (single-drives on Intel RST still do AHCI in RAID mode but somehow differently)
And for an Intel CPU, if you try with IOMMU enabled (needs CPU virt too), also add the kernel option: intel_iommu=on
With NVIDIA graphics, I’d also try nouveau.modeset=0 (I never had a hard reboot, but I have seen distros lock up on newer RTX cards without it, prior to installing the NV open/proprietary drivers)
I’m not exactly sure (I’ll think about it a bit more later), but:
If Fast boot in BIOS is enabled, perhaps Windows does something Windows-specific to the UEFI, and it doesn’t get reset on next boot for other OSs
Fedora’s Live image might do something to block EFI vars for booting compatibility? Or sets something specific to ACPI also for compatibility?
Something with Linux kernel or potentially GRUB related to UEFI was updated/changed
In the kernel panic image the RIP mentions smp which sounds related to the CPU and HT/SMT, so maybe it’s some kind of power-saving or CPU-related ACPI or UEFI change somewhere
Did you do a BIOS update or let fwupd do any? The change could be from that
Not sure if that’s related, but every time I boot into my Linux system after having been on Windows, the clock is reset to UTC time (instead of +2) until it re-syncs after a couple of minutes.
No, but I realized I probably should - there are two security fixes available. I’m afraid that the updates will break things again, though.
I am not sure that is always the case. I am using Windows 10, and have since it was released, on my laptop. The laptop was delivered with Windows 8.1 and the RTC was set to UTC when I received it.
I dual boot Fedora, and the RTC has always been set to UTC. The time zone adjusts the displayed time to local on both OSes.
For whatever weird reason, the problem came back all of a sudden, even with the 'Windows 2017' hack in place. A BIOS firmware update fixed it again. Super strange, though.
Not that unusual – Fedora makes changes that depend on BIOS following standards, but that triggers latent bugs in your BIOS which the vendor finally gets around to fixing. Linux has now reached a level at which many vendors can no longer afford to ignore bugs that only affect Linux.