Fedora crashing and hard-rebooting immediately after GRUB selection

The following problem appeared seemingly at random, without my having installed any updates or changed anything, and I haven’t managed to fix it yet.

I’m using Fedora 40 in a dual-boot setup with Windows 11, managed by GRUB. Recently, whenever I try to boot into Fedora, the machine crashes and hard-reboots immediately after I choose the “Fedora” boot entry in GRUB. Windows (as well as a live disk) works just fine.

I booted a live system, mounted the Fedora file system and chrooted into it, then ran dnf update to make sure all packages were at their latest versions. That didn’t help, though. I then purged the NVIDIA drivers entirely and reverted to nouveau, as they had caused issues in the past. That didn’t help either.

However, I found that after adding acpi=off to the GRUB entry and regenerating the config, Fedora boots without crashing but only shows a blank screen. I can still log in via SSH, though.

Syslogs don’t contain anything useful, as the system most likely doesn’t even get close to the point where it could start writing logs.

Here’s a short video of the issue (sorry for poor quality): Video

The very last frame before the screen turns black again shows this:

Any ideas how to further debug this?

A couple of things to try.

Can you boot into single user mode? Add single to the grub command line and remove quiet and rhgb.
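
For anyone following along: this edit is made by pressing `e` on the highlighted entry in the GRUB menu, changing the line that starts with `linux`, and booting with Ctrl-X. Below is a rough sketch of the same transformation as a shell function, just to make the before/after explicit. The name `single_user_cmdline` is made up for illustration, and it assumes no argument contains spaces or glob characters.

```shell
# Hypothetical helper: turn a kernel command line into one suited to a
# verbose single-user boot by dropping "quiet" and "rhgb" and appending
# "single". Assumes arguments contain no spaces or glob characters.
single_user_cmdline() {
    suc_out=""
    for suc_word in $1; do
        case "$suc_word" in
            quiet|rhgb) ;;                                  # drop these two
            *) suc_out="${suc_out:+$suc_out }$suc_word" ;;  # keep everything else
        esac
    done
    printf '%s\n' "${suc_out:+$suc_out }single"
}
```

Calling it with the contents of an existing entry's options shows what the edited `linux` line should end with; nothing is written to disk.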

Can you use Ctrl-Alt-F2 to get to a text console?

That does not help, unfortunately. The system crashes long before I could get a text console, even before systemd is initialized. The only way to boot the system (without graphics drivers, single-core only, etc.) is to disable ACPI as described above.

In the meantime, I upgraded to the Fedora 41 branched release, hoping the upgrade would fix things. It didn’t: same issue as before.

Also, I realized I still had systemd.unified_cgroup_hierarchy=0 on my cmdline - probably a leftover from using NVIDIA Docker - which would prevent Fedora 41 with systemd v256 from booting. Removing that parameter got me back to the same old symptoms.

During one boot, I got a kernel panic, but it might be unrelated to the issue at hand.


F40 and F41 also use the same kernel, so that was not likely to fix things.

Can you boot up a USB live image?

That parameter (systemd.unified_cgroup_hierarchy=0) is not supported (or about to be not supported) by systemd.

What else do you have on the kernel command line?

Yes, as I said, live system works fine. I used it to get a chroot.

Only rhgb, nothing else.

One idea is to force the ACPI OSI to Windows: Fedora 40 kde battery percentage not updating/always in charge - #19 by Espionage724

Another is to try toggling certain BIOS settings like:

  • CPU virtualization (controls IOMMU)
  • IOMMU
  • Resizable BAR
  • Above 4G Decode
  • SATA AHCI/RAID mode (single-drives on Intel RST still do AHCI in RAID mode but somehow differently)

And for Intel CPU, if you try with IOMMU enabled (needs CPU virt too), also add the kernel option: intel_iommu=on

With NVIDIA graphics, I’d also try nouveau.modeset=0 (never had a hard-reboot but have seen distros lock-up with newer RTX without that prior to installing NV open/proprietary drivers)

Can you boot on an older kernel?
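
To see which older kernels are still installed (from the chroot or over SSH), something like the following may help. It is only a sketch: `list_installed_kernels` is a made-up name, it assumes kernel images live at `/boot/vmlinuz-<version>` as they normally do on Fedora, and it relies on GNU `sort -V` for version ordering.

```shell
# Hypothetical helper: list kernel versions that have an image in /boot
# (or in the directory given as $1), oldest first.
list_installed_kernels() {
    lik_dir=${1:-/boot}
    for lik_file in "$lik_dir"/vmlinuz-*; do
        [ -e "$lik_file" ] || continue      # no match: the glob stays literal
        basename "$lik_file"
    done | sed 's/^vmlinuz-//' | sort -V
}
```

Any version listed there should also appear as its own entry in the GRUB menu, unless the menu entries were removed separately.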

@Espionage724 Indeed adding acpi_osi='Windows 2017' as a kernel boot parameter solved the issue and got my PC back to work again! Thanks a lot! :raised_hands:
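
For later readers: after rebooting, it is worth confirming that the parameter really reached the kernel, since a quoting mistake in the GRUB entry can silently drop or split it. Here is a small sketch that checks a command-line file for a parameter name. `cmdline_has_param` is a hypothetical helper; it only matches the name, not the value.

```shell
# Hypothetical helper: succeed if the kernel command line in $2
# (default: /proc/cmdline) contains a parameter named exactly $1,
# either bare or in name=value form.
cmdline_has_param() {
    chp_name=$1
    chp_file=${2:-/proc/cmdline}
    grep -Eq "(^|[[:space:]])${chp_name}(=|[[:space:]]|\$)" "$chp_file"
}
```

On the running system, `cmdline_has_param acpi_osi && echo active` would be the check. To make the parameter persistent on Fedora, the usual routes are grubby's `--update-kernel` and `--args` options or `GRUB_CMDLINE_LINUX` in `/etc/default/grub`; how a value containing a space has to be quoted there is something to verify on your own system.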

Can you explain what’s behind this? And do you have a clue how this problem could occur out of nowhere in the first place?


I’m not exactly sure (I’ll think about it a bit more later), but:

  • If Fast boot in BIOS is enabled, perhaps Windows does something Windows-specific to the UEFI, and it doesn’t get reset on next boot for other OSs
  • Fedora’s Live image might do something to block EFI vars for booting compatibility? Or sets something specific to ACPI also for compatibility?
  • Something with Linux kernel or potentially GRUB related to UEFI was updated/changed
  • In the kernel panic image the RIP mentions smp which sounds related to the CPU and HT/SMT, so maybe it’s some kind of power-saving or CPU-related ACPI or UEFI change somewhere
  • Did you do a BIOS update or let fwupd do any? The change could be from that
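
A quick way to answer the firmware question is to note the BIOS version and date now and compare them later. Below is a minimal sketch that reads the DMI fields the kernel exposes under `/sys/class/dmi/id`. The function name `bios_summary` is made up, and fields that are missing or unreadable are simply skipped.

```shell
# Hypothetical helper: print BIOS vendor, version and date from the DMI
# sysfs directory (or from the directory given as $1).
bios_summary() {
    bs_dir=${1:-/sys/class/dmi/id}
    for bs_field in bios_vendor bios_version bios_date; do
        if [ -r "$bs_dir/$bs_field" ]; then
            printf '%s: %s\n' "$bs_field" "$(cat "$bs_dir/$bs_field")"
        fi
    done
}
```

If fwupd is installed, `fwupdmgr get-history` should additionally show whether it applied any firmware updates itself.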

acpi-kernel parameters: how to choose them (2021-03)

acpi_osi=linux on my Dell systems reduced the number of ACPI error messages.


Not sure if that’s related, but every time I boot into my Linux system after I was on Windows before, the clock is reset to UTC time (instead of +2) until it re-syncs after a couple of minutes.

No, but I realized I probably should - there are two security fixes available. I’m afraid that the updates will break things again, though.

It’s not exactly related to UEFI, but Windows does local time and Linux does UTC and both do auto-sync on boot and write it to RTC.

Iirc there’s something about Windows below 10 that’s odd with switching it to UTC, but for 10/11 I set it to UTC:

reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation" /v "RealTimeIsUniversal" /t REG_QWORD /d "1" /f

And re-set Linux back to UTC (Fedora sets it to local if it detects Windows during install):

sudo timedatectl set-local-rtc '0'
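
To check which mode Linux currently uses before changing anything, `timedatectl show --property=LocalRTC --value` prints `yes` or `no`. Here is a tiny sketch that turns that value into a readable message; `describe_rtc_mode` is a hypothetical helper name.

```shell
# Hypothetical helper: interpret the LocalRTC value reported by timedatectl.
# Usage on a running system:
#   describe_rtc_mode "$(timedatectl show --property=LocalRTC --value)"
describe_rtc_mode() {
    case "$1" in
        yes) echo "RTC holds local time" ;;
        no)  echo "RTC holds UTC" ;;
        *)   echo "unknown RTC mode" ;;
    esac
}
```

If both operating systems agree on UTC, the clock should no longer jump after switching between them.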

I am not sure that is always the case. I am using windows 10, and have since it was released, on my laptop. The laptop was delivered with windows 8.1 and the RTC was set to UTC when I received it.

I dual boot fedora, and the RTC has always been set to UTC. The time zone adjusts the displayed time to local on both OSes.


Have you got fast boot enabled in Windows?

Issuing security fixes means bad actors will soon be trying to exploit unfixed systems. The result can be much worse problems than failed updates.

For whatever weird reason, the problem came back all of a sudden, even with the 'Windows 2017' hack in place. A BIOS firmware update fixed it again. Super strange, though.

Not that unusual – Fedora makes changes that depend on BIOS following standards, but that triggers latent bugs in your BIOS which the vendor finally gets around to fixing. Linux has now reached a level at which many vendors can no longer afford to ignore bugs that only affect Linux.