[RPI4] ethernet related kernel panic

$ cat /proc/cpuinfo
Hardware	: BCM2835
Revision	: a03111
Model		: Raspberry Pi 4 Model B Rev 1.1
  • 1GB variant
  • official pi power adapter
  • active cooling
  • ssd for storage, using a known good usb-to-sata adapter
  • stable on dietpi (raspbian / debian 12)
  • stable on ubuntu server 23.04 (just tested)
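
For anyone comparing boards: the Revision value above can be decoded to confirm the RAM size and board revision. This is a minimal sketch that assumes the new-style revision code bit layout from the Raspberry Pi documentation; the function name `decode_revision` is made up for illustration. The SoC field is separate from the `Hardware` line, which as far as I know reports BCM2835 on every model.

```shell
# Decode a new-style Raspberry Pi revision code (hex string, no 0x prefix).
# Assumed bit layout: bit 23 = new-style flag, bits 20-22 = memory,
# bits 12-15 = SoC, bits 4-11 = board type, bits 0-3 = board revision.
decode_revision() {
    code=$(( 0x$1 ))
    if [ $(( (code >> 23) & 1 )) -ne 1 ]; then
        echo "old-style revision code, not handled here" >&2
        return 1
    fi
    case $(( (code >> 20) & 0x7 )) in
        0) mem=256MB ;; 1) mem=512MB ;; 2) mem=1GB ;;
        3) mem=2GB ;;   4) mem=4GB ;;   5) mem=8GB ;;
        *) mem=unknown ;;
    esac
    case $(( (code >> 12) & 0xF )) in
        0) soc=BCM2835 ;; 1) soc=BCM2836 ;; 2) soc=BCM2837 ;; 3) soc=BCM2711 ;;
        *) soc=unknown ;;
    esac
    type=$(( (code >> 4) & 0xFF ))   # 0x11 is the Pi 4 Model B
    rev=$(( code & 0xF ))
    printf 'memory=%s soc=%s type=0x%02x board_rev=1.%d\n' "$mem" "$soc" "$type" "$rev"
}

decode_revision a03111
```

For `a03111` this reports a 1GB BCM2711 board of type 0x11 at revision 1.1, which matches the Model line.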

however, on an up-to-date fedora 38 (minimal) install:

  • if i’m using the onboard nic, it’ll boot up, reach the login prompt, then ~30 seconds later, it’ll kernel panic. i can login or even ssh into it before it happens.
  • if i swap over to a USB3 nic, no issues and it’s been rock solid, which should rule out userspace.

at this point, i’m starting to think it’s my board revision because whenever i tried booting fedora aarch64 images in the past, they would all kernel panic. i just happened to try again after the release of 38, ended up using the minimal image, made it through a dnf update and it was working fine until just recently.

anyone else running fedora on a rev 1.1 board? i’m open to try other suggestions before opening a bug report.

What image and installer did you use? I’ve been using Minimal with the procedure below:

$ sudo arm-image-installer --resizefs --target=rpi4 -y

  • --image: Set to the downloaded image
  • --media: Set to the device ID of your SD card
  • --addkey: Use to copy an SSH public key for authentication
  • --norootpass: Do not allow root login using a password
  • --resizefs: Resize the filesystem to occupy the entire SD card
  • --target=rpi4: Install the appropriate bootloader for Raspberry Pi 4
  • -y: Don’t ask for confirmation

38 minimal and arm-image-installer, basically the same arguments as you posted.

I used nmcli to connect wifi, not ethernet, so can’t replicate your issue.

Could anyone help interpret the screen dump from the boot-up screen and debug it?

updated to kernel 6.4.6, same issue but here’s some even more weirdness:

  • boot it up without any networking
  • wait a minute
  • plug in a network cable to the onboard nic
  • network comes up
  • restart my pod (containers) and another service that listens on the static ip
  • it appears to be working fine
  • i even successfully rebooted it without issue
  • however, if i shutdown, remove usb-c power then cold boot, i get kernel panics

part of me wants to believe the board is bad but it literally works fine on the debian based distros. it’s just something to do with fedora but i can’t quite narrow it down. i’ll retest the other aarch64 images along with the custom centos stream 9 image from pablo greco over the weekend and report back.

I am guessing this may be a timing issue in starting services.

On my desktop it seems I may have had a failing wifi adapter that was disconnecting then reconnecting at about 5 minute intervals. It was causing kernel hangs and machine checks at random times and stopped when I replaced the adapter. The apparent cause was interruption of communications for some apps that did not know what to do and just hung with a stuck cpu.

Similar things can happen if there is a timing issue with booting and services are conflicting during startup. You said it seems to be an ethernet related hang, so it seems a potentially similar problem.

The only way I know to be certain would be to make sure one is able to see the related logs and/or the kernel messages on screen during boot so one could find out exactly what the cause was.
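
One way to look at the kernel messages after the fact, sketched under the assumption that the journal is persistent and that at least part of the panic made it to disk (a hard panic often does not, in which case a serial console or a photo of the screen is the fallback). The helper name `panic_lines` and its pattern list are illustrative guesses, not an exhaustive set; `bcmgenet` is included because it is the driver name I would expect for the Pi 4 onboard NIC.

```shell
# Filter kernel log text (read from stdin) down to lines that usually
# surround a panic. The pattern list is only a starting point.
panic_lines() {
    grep -i -E 'kernel panic|unable to handle|internal error|oops|call trace|bcmgenet'
}

# Typical use, assuming a persistent journal (e.g. /var/log/journal exists):
#   journalctl -k -b -1 --no-pager | panic_lines
# -k limits output to kernel messages, -b -1 selects the previous boot.
```

If the previous boot left nothing behind, the filter simply prints nothing, which is itself a hint that the panic happened before the journal could be written.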

that thought crossed my mind but as far as i can remember, i was getting kernel panics on fresh, untouched fedora images as well. that’s why i need to retest to confirm that it’s not PEBCAK.

might need a slow motion camera for that, lol.

I have an RPi4B and have never seen a kernel panic with any boots. Have used F37, F38 workstation, server, and more in testing.

However, I know that sometimes things start out of order, which is why the systemd management allows for specifying what is required before and after in the way the service and target files are written.

Simply looking through some of those files in /lib/systemd/system/ can show this. Specifically, as one example, one might look at network.target, network-pre.target, and network-online.target in that directory as well as NetworkManager-wait-online.service to see the timing specified within the files.
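
As a rough sketch of that kind of inspection, the helper below pulls just the ordering and dependency directives out of a unit file so they can be compared side by side. The function name `ordering_lines` is made up for illustration, and the directive list is a selection rather than everything systemd supports.

```shell
# Print only the ordering/dependency directives from unit file text on stdin.
ordering_lines() {
    grep -E '^(After|Before|Wants|Requires|Requisite|BindsTo|PartOf)='
}

# Typical use on a running system:
#   systemctl cat NetworkManager-wait-online.service | ordering_lines
#   systemctl cat network-online.target | ordering_lines
# systemd-analyze critical-chain network-online.target shows the resulting
# chain with timings for the current boot.
```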

sudo arm-image-installer \
	--target rpi4 \
	--addkey ~/.ssh/id_ed25519.pub \
	--norootpass \
	--resizefs \
	--media /dev/sdb \
	--image Fedora-Server-38-1.6.aarch64.raw.xz
  • checksum verified
  • network cable plugged in = instant kernel panic
  • network cable unplugged = anaconda text installer

on the 1st boot, i didn’t have my phone next to me but before the messages forced the screen to scroll, i noticed one along the lines of “memory address in between userspace and kernel”. picture is of the 2nd boot.

so it’s definitely looking like it’s something with “out of the box” fedora images and potentially limited to just the 1GB Pi4 models. i can’t remember who or where but i vaguely recall reading that fedora iot(?) testing was done on an 8GB model. hopefully it’s just a config tweak that’s needed.

# https://people.centos.org/pgreco/CentOS-Userland-9-stream-aarch64-RaspberryPI-Minimal-4/
# fails to mount for --addkey / --norootpass (related to swap partition?)
sudo arm-image-installer \
	--target rpi4 \
	--addkey ~/.ssh/id_ed25519.pub \
	--norootpass \
	--resizefs \
	--media /dev/sdb \
	--image CentOS-Userland-9-stream-aarch64-RaspberryPI-Minimal-4-sda.raw.xz

also works fine… might be bug report time.

I found out that for fedora server on arm the root file system is xfs. The arm image installer is not able to resize an xfs file system during install, so that must be done after first boot using xfs_growfs.
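
A dry-run sketch of what growing the root file system after first boot might look like. It only prints the commands instead of running them. It assumes the root file system is XFS on an LVM logical volume that sits on the last partition of the disk. The disk, partition number and LV path below are placeholders, so check `lsblk` and `lvs` for the real ones before running anything.

```shell
# Print (not run) the commands to grow an XFS root on LVM.
# $1 = disk, $2 = partition number holding the PV, $3 = logical volume path.
# All three are placeholders for whatever lsblk/lvs show on the system.
grow_plan() {
    disk=$1; partnum=$2; lv=$3
    # Disks whose name ends in a digit (mmcblk0, nvme0n1) use a "p" separator.
    case $disk in
        *[0-9]) part="${disk}p${partnum}" ;;
        *)      part="${disk}${partnum}" ;;
    esac
    echo "growpart $disk $partnum"        # extend the partition to the end of the disk
    echo "pvresize $part"                 # let LVM see the larger physical volume
    echo "lvextend -l +100%FREE $lv"      # give all free extents to the root LV
    echo "xfs_growfs /"                   # grow the mounted XFS file system
}

grow_plan /dev/sda 3 /dev/fedora/root
```

xfs_growfs works on a mounted file system and takes the mount point, which is why it comes last and why this has to happen after boot rather than in the installer.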

I also suspect the issue with failing to boot on a 1 GB RPi4 may be related to lack of adequate memory. I have only tested on 4 GB & 8 GB models.

I tested the minimal on 2GB model with no problem.

$ free -m
               total        used        free      shared  buff/cache   available
Mem:             921         252         406           0         262         591
Swap:           8095         321        7774

once it’s up and running in fedora, it hums along at ~300MB as it’s just dns and a pod (gitea + syncthing + traefik).

and systemctl reboot works fine, no panics on (re)boot. it’s just after it loses power with the network cable plugged in… which has me thinking that it MIGHT be something to do with fedora not using the typical /boot/config.txt like you see in other distros. maybe some values are hard coded but need to be adjusted for the 1GB model: CTRL + F + "memory address"

$ sudo arm-image-installer \
        --target rpi4 \
        --addkey ~/.ssh/id_ed25519.pub \
        --norootpass \
        --resizefs \
        --media /dev/sda \
        --image Fedora-Workstation-38-1.6.aarch64.raw.xz

same deal. remove the network cable, it’ll boot and i can complete the initial setup but fail to login to gnome because 1GB ain’t enough.

now we can actually see where in the boot process it’s panicking. noticing a few duplicates:

then notice the ? characters, possible memory corruption due to overlapping ranges?

I think that was my point. During boot more memory is required than is normal after the boot completes since one must have the initramfs image as well as the kernel and active boot processes loaded into memory before the main file system is activated.

On my RPi4 (fedora workstation) the initramfs for the default kernel (6.2.9) is significantly larger than the initramfs once the system has been updated with a newer kernel.
I see similar with the fedora server image.

$ ls -l /boot/init*
-rw-------. 1 root root 99734788 Apr 13 17:09 /boot/initramfs-6.2.9-300.fc38.aarch64.img
-rw-------. 1 root root 33616285 Jul  6 14:48 /boot/initramfs-6.3.11-200.fc38.aarch64.img
-rw-------. 1 root root 31770064 Jul 13 16:41 /boot/initramfs-6.3.12-200.fc38.aarch64.img

Note that the initramfs for kernel 6.2.9 is roughly 3 times the size of the one for either of the later 2 kernels.
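
To put a number on the size difference, the byte counts from the listing above can be divided directly. A small sketch; the helper name `ratio` is made up for illustration.

```shell
# Print the ratio of two sizes (in bytes) to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 99734788 33616285   # 6.2.9 vs 6.3.11 -> 2.97
ratio 99734788 31770064   # 6.2.9 vs 6.3.12 -> 3.14
```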

It is also recommended that one have a minimum of 2GB ram for fedora.

yea, i had no expectations workstation w/ gnome was going to work but just wanted to see how it behaved when it actually ran out of RAM. it still makes it to GDM but you get a user prompt->login attempt->crash->user prompt loop, alternative TTYs work fine though.

# minimal install i've been using for awhile
$ ll /boot/initramfs-6.*
-rw-------. 1 root root 18M Jul 15 03:53 /boot/initramfs-6.3.12-200.fc38.aarch64.img
-rw-------. 1 root root 18M Jul 24 04:47 /boot/initramfs-6.4.4-200.fc38.aarch64.img
-rw-------. 1 root root 18M Jul 28 22:23 /boot/initramfs-6.4.6-200.fc38.aarch64.img