F42 Change Proposal: Enable Drm Panic (system-wide)

:link: Enable Drm Panic

This is a proposed Change for Fedora Linux.
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.

:link: Summary

Drm_panic is a new feature in the Linux kernel that displays a panic screen when a kernel panic occurs. This proposal is to enable DRM_PANIC in the Fedora kernel, to improve the kernel panic user experience.

:link: Owner

:link: Current status

  • Targeted release: Fedora Linux 42
  • Last updated: 2024-07-11
  • [Announced]
  • [Discussion thread]
  • FESCo issue:
  • Tracker bug:
  • Release notes tracker:

:link: Detailed Description

When the Linux kernel panics in Fedora 40, in most cases the screen just freezes. If you’re in a VT console, you can see the kernel debug information, but it is hard to understand for users who are not kernel developers. With this feature, users will instead see a message saying that the computer has crashed and needs to be rebooted. Drm_panic was introduced in kernel v6.10 and is still under active development.

To enable DRM_PANIC, VT_CONSOLE must be disabled in the kernel. This prevents a race condition: if you are in a VT console when the panic occurs, both fbcon and drm_panic write to the framebuffer at the same time, leading to corrupted output (https://patchwork.freedesktop.org/series/134831/). The drawback is that tty0 won’t show the kernel kmsg, which can make boot issues harder to debug. However, plymouth already takes care of this and can display the boot kmsg when no VT console is present (https://gitlab.freedesktop.org/plymouth/plymouth/-/merge_requests/224). The user experience would also be better, because plymouth has better font and color support than fbcon.

Supported drivers are simpledrm, mgag200 and ast (plus imx and tidss on aarch64). I’m working on nouveau support, and I hope i915 and amdgpu will add support too. If the driver is not supported, you won’t see the panic screen, but it won’t be worse than what you have today.

Drm panic provides different panic screens. The default is “user”, which displays a simple, friendly message telling the user to reboot the computer. Kernel developers can also set it to “kmsg” to see the last kmsg lines (equivalent to what fbcon shows today). You can select the panic screen in Kconfig, as a module parameter (drm.panic_screen=user), or at runtime with “echo -n kmsg > /sys/module/drm/parameters/panic_screen”.
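The runtime selection can be wrapped in a small helper. This is only a minimal sketch: the function names and the overridable PANIC_SCREEN_PARAM variable are hypothetical, and only the sysfs path and the “user” and “kmsg” values come from the description above.

```shell
#!/bin/sh
# Hypothetical helpers around the runtime switch described above.
# PANIC_SCREEN_PARAM is an assumption added so the path can be overridden
# (for example, to try the helpers against a scratch file).
PANIC_SCREEN_PARAM="${PANIC_SCREEN_PARAM:-/sys/module/drm/parameters/panic_screen}"

set_panic_screen() {
    # Only the two screens named in the proposal are accepted here.
    case "$1" in
        user|kmsg) ;;
        *) echo "unsupported panic screen: $1" >&2; return 1 ;;
    esac
    # printf writes no trailing newline, like "echo -n" in the text.
    printf '%s' "$1" > "$PANIC_SCREEN_PARAM"
}

get_panic_screen() {
    cat "$PANIC_SCREEN_PARAM"
}
```

Writing to the real sysfs file requires root, for example by running “set_panic_screen kmsg” from a root shell before a test.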

I’ve also made a proof of concept that adds a panic screen with a QR code containing debugging information, which will make it easier for users to report kernel panics in Fedora. An example can be seen here: Test sample · Issue #1 · kdj0c/panic_report · GitHub

:link: Feedback

:link: Benefit to Fedora

This change will improve the user experience when a kernel panic occurs.

It’s also a first step towards switching to a userspace console and being able to disable CONFIG_VT in the kernel. VT and fbcon are legacy parts of the kernel; disabling them would reduce the maintenance burden. It would also reduce CVE impact, as userspace vulnerabilities are usually less critical.

:link: Scope

  • Proposal owners:

Write documentation on how to debug boot issues without VT_CONSOLE. Maybe also change the systemd log configuration so that it defaults to writing the log to the console.

I’m unsure whether this has an impact on the installer.

  • Policies and guidelines: N/A (not needed for this Change)

  • Trademark approval: N/A (not needed for this Change)

  • Alignment with the Fedora Strategy:

I think it perfectly fits the “Fedora is for everyone” goal, as the current kernel panic behavior (either a UI freeze or kmsg output in a VT) is not user-friendly.

:link: Upgrade/compatibility impact

Enabling DRM_PANIC should be transparent to users, but disabling VT_CONSOLE may have a visible impact. Fortunately, since Fedora 40, plymouth is able to display the kmsg messages.

For a non-graphical boot, you can use systemd.log_target=console systemd.log_level=info and remove rhgb and quiet to see the kernel boot messages.

But this needs to be documented and communicated, so that users who debug boot issues know about this change.
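As an illustration of the non-graphical boot settings, the sketch below rewrites a kernel command-line string: it drops rhgb and quiet and appends the two systemd options. The helper name text_boot_cmdline is hypothetical, and it only edits a string; it does not touch the bootloader configuration.

```shell
#!/bin/sh
# Sketch: rewrite a kernel command line for a non-graphical, verbose boot.
# text_boot_cmdline is a hypothetical helper name.
text_boot_cmdline() {
    out=""
    # Intentional word splitting: one kernel argument per iteration.
    for arg in $1; do
        case "$arg" in
            rhgb|quiet) ;;                                # drop graphical/quiet flags
            systemd.log_target=*|systemd.log_level=*) ;;  # replaced below
            *) out="${out:+$out }$arg" ;;
        esac
    done
    printf '%s\n' "${out:+$out }systemd.log_target=console systemd.log_level=info"
}
```

For example, text_boot_cmdline "$(cat /proc/cmdline)" prints the adjusted arguments for the running kernel. On an installed system the same change could probably be made persistent with grubby (its --remove-args and --args options), but that step is an assumption and not part of this proposal.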

:link: Early Testing (Optional)

Do you require ‘QA Blueprint’ support? Y/N

:link: How To Test

Currently, the easiest way to test is to use the simpledrm driver, as it can run on all hardware. First blacklist your GPU driver (i915, amdgpu or nouveau), then boot and check that you’re using simpledrm. Then you can trigger a kernel panic with: echo c > /proc/sysrq-trigger

As this will crash your machine, you can also do it in a VM (in that case, disable virtio-gpu or vmwgfx).

Also, to check that you can still see the kernel messages at boot, remove the “quiet” kernel command-line argument in the grub menu; you should still see the kernel boot messages on the plymouth screen.
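The blacklisting step can be sketched as a helper that looks for the GPU modules named above in /proc/modules and prints a matching boot argument. The function name gpu_blacklist_arg is hypothetical, and using the modprobe.blacklist= boot argument is an assumption: the text only says to blacklist the driver and does not name a mechanism.

```shell
#!/bin/sh
# Sketch: list loaded GPU modules that would need blacklisting so that
# simpledrm stays in use. The module names come from the text above;
# gpu_blacklist_arg and its optional file argument are hypothetical.
gpu_blacklist_arg() {
    modules_file="${1:-/proc/modules}"
    found=""
    for mod in i915 amdgpu nouveau; do
        # Each /proc/modules line starts with the module name and a space.
        if grep -q "^$mod " "$modules_file"; then
            found="${found:+$found,}$mod"
        fi
    done
    # Print nothing when none of the listed modules is loaded.
    if [ -n "$found" ]; then
        printf 'modprobe.blacklist=%s\n' "$found"
    fi
}
```

The printed argument can be added to the kernel command line from the grub menu for a single test boot.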

:link: User Experience

With DRM panic, users will be notified that their computer crashed, instead of it being unresponsive.

With v6.10, this only works with a few GPU drivers (simpledrm, mgag200, ast). However, with simpledrm it will already catch some common kernel panic cases, such as the root filesystem not being found or ramdisk corruption (simpledrm is used at boot and is later replaced with i915/amdgpu/nouveau).

It also prepares for future drm panic improvements, such as a kmsg panic screen (which should be available in v6.11) or better debugging information using a QR code. A test sample is shown at Test sample · Issue #1 · kdj0c/panic_report · GitHub

:link: Dependencies

The main dependency is a kernel v6.10 or later. To still see the kernel boot messages, there is also a dependency on plymouth and systemd, but the versions in F40 are already sufficient.

:link: Contingency Plan

  • Contingency mechanism: Revert the kernel configuration changes.
  • Contingency deadline: N/A (not a System Wide Change)
  • Blocks release? N/A (not a System Wide Change), Yes/No

:link: Documentation

Kernel Kconfig for DRM_PANIC: Kconfig - drivers/gpu/drm/Kconfig - Linux source code v6.10-rc7 - Bootlin Elixir Cross Referencer

:link: Release Notes

Last edited by @amoloney 2025-02-17T22:45:52Z

3 Likes

How do you feel about the proposal as written?

  • Strongly in favor
  • In favor, with reservations
  • Neutral
  • Opposed, but could be convinced
  • Strongly opposed
0 voters

If you are in favor but have reservations, or are opposed but something could change your mind, please explain in a reply.

We want everyone to be heard, but many posts repeating the same thing actually make that harder. If you have something new to say, please say it. If, instead, you find someone has already covered what you’d like to express, please simply give that post a :heart: instead of reiterating. You can even do this by email, by replying with the heart emoji or just “+1”. This will make long topics easier to follow.

Please note that this is an advisory “straw poll” meant to gauge sentiment. It isn’t a vote or a scientific survey. See About the Change Proposals category for more about the Change Process and moderation policy.

1 Like

Very bold move to get rid of VT. Though I’m not personally against it, I believe there might be some users who rely on console switching. Does this proposal include a userspace replacement for fbcon? (Such as kmscon, for example.)

1 Like

aka. bluescreen of death :smiley:

2 Likes

Not necessarily! :wink:

We could actually customize the panic screen downstream in Fedora! Panicking with style.

2 Likes

Ah, like this


6 Likes

I bet we have a themed bsod before we have a themed GRUB :joy:

1 Like

Disabling virtual terminals is a deal killer for me. Those can (and have) been very useful for troubleshooting all manner of problems such as when the DM fails for whatever reason or normal user accounts cannot sign in due to, e.g., problems mounting or decrypting their home directory. The VTs also provide an alternative to sudo (which there seems to be a movement to do away with).

2 Likes

This is only a first step to move away from VT, and it brings its own benefit.

To enable drm panic, you need to disable VT_CONSOLE, but you can keep VT and fbcon, and still switch to it using ctrl+alt+Fx.

VT_CONSOLE only prints the kernel message (and kernel panic message) to the console (fbcon), so that’s why it conflicts with drm panic.

I will do another change proposal later to move away from VT completely, but as you said, we need something to replace fbcon. There are different options, like kmscon (but it’s no longer actively maintained), or some micro wayland server with terminal app (like cage + foot).
Also it requires more work in a lot of different apps, like you can see in this detailed VT=n overview:
https://www.reddit.com/r/linux/comments/1dpeqay/config_vtn_in_2024/

8 Likes

I actually experimented with packaging kmscon in copr and yeah, it’s a mess.

https://copr.fedorainfracloud.org/coprs/jrelvas/kmscon/packages/

I need to use a fork of kmscon + libtsm and carry a questionable amount of patches to make it work properly. Considering the state of disrepair it fell into, it might be a better idea to start from scratch


I wasn’t aware of VT_CONSOLE being independent from VT, though! In that case, this should be a pretty straightforward change - as far as it goes for end users.

1 Like

You should be able to get the same debug output with the plymouth kmsg plugin (or with the systemd log if you boot into multi-user.target).

I tested it on a VM, and it worked, but I need to run more tests on real hardware.

Also if your root filesystem failed to mount, or if the ramdisk is corrupted, or if the kernel can’t find /bin/init, you will see that in drm panic itself.
(And you can use the kmsg panic screen when debugging such issue):
https://patchwork.freedesktop.org/series/134286/

Also, fbcon (the console you use with ctrl+alt+Fx) and rescue consoles (the shell you are dropped into when your fstab is broken) are still available and won’t be affected by this.

1 Like

I do also remotely manage servers and I use serial over LAN to monitor, update, and reboot them. I have (not frequently, but maybe ~5 times in the last decade) had the kernel crash after an update. In the current configuration, I see a dump of error messages which can hint at what the problem is. What will happen with this new configuration? Will I still see those error messages? Will I be left with a blinking _ and wondering if I can/should hard reset the system with a power off command over IPMI?

1 Like

This shouldn’t affect serial consoles at all, to my knowledge.

1 Like

VT_CONSOLE is only for the graphic console, and won’t affect the serial console.

I’ve checked with a VM, and using the virsh console, you still have all kernel boot messages.
And it’s also what the Kconfig help says:

6 Likes

Strong Applause from me ! :clap:t5:

1 Like

Just to be sure, you booted the VM with -nographic and console=ttyS0,115200n8 on the kernel command line right? That should simulate interacting with the system over a serial port the way my serial over LAN systems are connected.

1 Like

For my tests, I just used console=ttyS0 in the kernel command line. (and I already have a F40, with a custom kernel with VT_CONSOLE=n, and DRM_PANIC=y).

Also I encourage people to run some tests, to spot broken workflow early, so we can find workarounds. There is still time to merge code before F42 is out, in May 2025.
I will build a test F40 kernel next week, so it will be easier to check by yourself, with your specific configuration.

2 Likes

I have a few commodity-hardware servers that don’t have serial ports, and I don’t want Plymouth to become mandatory; it has been the source of more issues for me than anything this change aims to address.

I’ve made a kernel build with drm_panic enabled (and VT_CONSOLE disabled):

https://koji.fedoraproject.org/koji/taskinfo?taskID=120544794

I’ve backported a few patches, that are not in v6.10, like:
short panic description:
https://patchwork.freedesktop.org/series/135356/
kmsg panic_screen:
https://patchwork.freedesktop.org/series/134286/
draft nouveau support (tested on 1650X, might be garbage on other GPU):
https://patchwork.freedesktop.org/series/133963/
And also virtio-gpu support, so you can test it easily on a VM.

To install the test kernel, download the rpms kernel, kernel-core, kernel-modules, kernel-modules-core, kernel-modules-extra, and install them with dnf.
It’s a Fedora rawhide kernel, but can be installed on F40.

i915, amdgpu, and nvidia are not yet supported, so you need to blacklist them, to use simpledrm, if you want to see the panic screen.

To trigger a kernel panic, if you are not afraid of crashing your machine, run this (as root):
echo c > /proc/sysrq-trigger

There is also a debug interface (assuming card0 and plane0):
echo 1 > /sys/kernel/debug/dri/0/drm_panic_plane_0

This will cause some flickering, because it redraws only one of the two buffers (most of the time you have double-buffering), and it may leak the framebuffer, but it lets you see the panic screen without crashing.

You can also check the kmsg panic screen, to get the same debug info as before:
echo -n kmsg > /sys/module/drm/parameters/panic_screen
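The non-crashing debug interface above can be sketched as a helper that builds the debugfs path before writing to it. The variable DRI_DEBUG_ROOT and both function names are hypothetical, and generalising the card0/plane0 example to other card and plane numbers is an assumption.

```shell
#!/bin/sh
# Sketch of the non-crashing test described above.
# DRI_DEBUG_ROOT and both function names are hypothetical; the file name
# pattern is taken from the card0/plane0 example in the post.
DRI_DEBUG_ROOT="${DRI_DEBUG_ROOT:-/sys/kernel/debug/dri}"

panic_plane_path() {
    # $1 = card number, $2 = plane number (both default to 0).
    printf '%s/%s/drm_panic_plane_%s\n' "$DRI_DEBUG_ROOT" "${1:-0}" "${2:-0}"
}

show_panic_screen() {
    path="$(panic_plane_path "$1" "$2")"
    if [ ! -e "$path" ]; then
        echo "no drm_panic debug file at $path" >&2
        return 1
    fi
    # Same write as in the post; needs root on a real system.
    echo 1 > "$path"
}
```

Run as root, “show_panic_screen 0 0” is equivalent to the echo command given for card0 and plane0, and it fails with a message instead of writing anything when the driver does not expose the file.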

1 Like

In order to enable DRM_PANIC, you need to disable VT_CONSOLE in the kernel, this is to prevent a race condition, that if you are in a VT console when the panic occurs, both fbcon and drm_panic will write to the framebuffer at the same time, leading to corrupted output. https://patchwork.freedesktop.org/series/134831/ The drawback is that tty0 won’t show the kernel kmsg, and it can be harder to debug boot issues.

What happens to Magic System Requests output (Linux Magic System Request Key Hacks — The Linux Kernel documentation)? Today I can look into the kernel state by going to a virtual terminal, and then use magic SysRq to for example dump the backtraces of running CPU, or dump the ftrace buffers, and so forth. Where would this output go? To the VT or into some buffer I can’t view without an actual panic?

1 Like