An A.I. analysis on the boot sequence

Hello. Today I showed Gemini some of the issues I’ve observed in the journal while booting. It identified several of them, wrote up an analysis, and then helped me fix them.

I’m including the report here and would like to ask two things of the community:

  • Please give it a read and some consideration. Let me know if you think it lacks value.
  • Voice your opinion on it and let me know if you think I should report/fix the bugs.

==============================
Fedora 43 Initial Bug Analysis
==============================

Analysis of the journal since boot on Fedora 43 (Kernel 6.19.11) reveals several systemic issues that warrant reporting to the
Fedora community. These findings are prioritized based on their impact on system performance and boot hygiene.

irqbalance vs. Managed Interrupts
---------------------------------
irqbalance (version 1.9.4) repeatedly attempts to modify CPU affinity for "managed" interrupts, which results in persistent
`Permission denied` errors in the journal. These interrupts, particularly for NVMe devices (MSI-X), are managed by the kernel or the
driver itself, and manual affinity changes are restricted.

* **The Bug**: irqbalance should detect interrupts with the `IRQ_ALLOCATED_COMBO` flag and skip them automatically.
* **What didn't work**: Attempting to manually change affinity via `/proc/irq/` also returns `Permission denied`, confirming these
  are kernel-managed.
* **Local Workaround**: Set `IRQBALANCE_ARGS="--banmod=nvme"` in `/etc/sysconfig/irqbalance`.
* **Why Report**: This fills the journal with thousands of repetitive errors on modern NVMe-heavy systems, degrading the overall
  user experience and making other boot issues harder to identify.

Example log output:

   .. code-block:: text

      Apr 12 10:23:21 desktop.casa.g02.org irqbalance[1079]: Cannot change IRQ 91 affinity: Permission denied
      Apr 12 10:23:21 desktop.casa.g02.org irqbalance[1079]: IRQ 91 affinity is now unmanaged

systemd-sysctl vs. Module Loading Race (initrd)
-----------------------------------------------
The system reports errors when attempting to set TCP congestion control to `bbr` early in the boot sequence because the required
`tcp_bbr` module is not yet loaded. This issue persists into the `initrd` phase if the configuration was present during the last
`dracut` run.

* **The Bug**: When a sysctl configuration (e.g., in `/etc/sysctl.d/`) depends on a specific kernel module, there is no automatic
  mechanism to ensure that module is loaded before `systemd-sysctl` runs.
* **What didn't work**: Simply removing the stale `kernel.sched_migration_cost_ns` from `/etc/sysctl.d/` did not stop the early boot
  error, as the configuration was cached within the `initrd`.
* **Local Workaround**: Add `tcp_bbr` to `/etc/modules-load.d/bbr.conf` and rebuild initramfs with `dracut -f`.
* **Affected Configuration**: `net.ipv4.tcp_congestion_control = bbr`
* **Why Report**: While settings are often applied correctly later by services like `tuned`, the initial boot error is a "false
  positive" that suggests a configuration failure where none exists.

Unset Environment Variable Warnings in Core Units
-------------------------------------------------
Several core systemd units report warnings for referenced but unset environment variables.

* **The Bug**: Systemd units for these services should use the `-` prefix for optional environment variables (e.g., `${-VAR}`) or
  ensure that the environment files provided by the packages define these variables, even if they are empty.
* **What didn't work**: Relying on default package configurations; these warnings persist across reboots even on clean installs if
  certain optional features are not used.
* **Local Workaround**: Manually defining the variables in `/etc/sysconfig/` or `/etc/default/` files to silence the warnings.
* **Affected Units and Variables**:

  * `sshd.service`: `CRYPTO_POLICY`, `SSHD_OPTS`
  * `lm_sensors.service`: `BUS_MODULES`, `HWMON_MODULES`
  * `v4l2-relayd@.service`: `SPLASHSRC`
  * `atd.service`: `OPTS`
  * `irqbalance.service`: `IRQBALANCE_ARGS`

* **Why Report**: These warnings clutter the journal and violate Fedora's goal of a clean, warning-free boot experience.

Add `tcp_bbr` to `/etc/modules-load.d/bbr.conf` and rebuild initramfs with `dracut -f`.

IMO, the change should go the other way. We should strive to have less in the initramfs, not bloat it with ever more junk that 99% of systems shouldn’t need.

Systemd units for these services should use the `-` prefix for optional environment variables (e.g., `${-VAR}`)

I’ve never seen anything like ${-VAR} anywhere before. I cannot find an example of it under /usr/lib/systemd/system:

$ grep -r -- '${-' /usr/lib/systemd/system/*
$ 

And I don’t see any mention of it in the systemd.syntax man page.

$ man systemd.syntax | grep '${-'
$ 

Edit to add:

Manually defining the variables in `/etc/sysconfig/`

I agree that defining the vars in drop-in files under /etc/sysconfig by setting them to empty strings just to silence the warnings would be a reasonable thing to do.
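A minimal sketch of that approach follows. It assumes each affected unit reads an environment file under `/etc/sysconfig`; the exact file name per unit is an assumption and should be checked against the unit's `EnvironmentFile=` line. The helper name `define_empty_var` is hypothetical. It appends `VAR=` only when the file does not already assign the variable, so it is safe to run more than once.

```shell
#!/bin/sh
# Hypothetical helper: append "VAR=" to FILE unless FILE already assigns VAR.
define_empty_var() {
    file=$1
    var=$2
    # Create the file if it is missing so grep has something to read.
    [ -e "$file" ] || : > "$file"
    # Only an uncommented assignment counts as "already defined".
    if ! grep -Eq "^[[:space:]]*${var}=" "$file"; then
        printf '%s=\n' "$var" >> "$file"
    fi
}

# Example (as root; the file name is an assumption, verify it first):
#   define_empty_var /etc/sysconfig/atd OPTS
```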


I agree, actually. A less bloated initrd is preferable. Still, some kind of race condition is causing this, and adding that module-load configuration to the initrd was the workaround.
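One way to observe the ordering problem is to check whether `bbr` is registered at the moment the sysctl would be written. The kernel lists the registered algorithms in `/proc/sys/net/ipv4/tcp_available_congestion_control`. This is only a rough sketch: the function name `cc_available` is hypothetical, and the optional second argument exists just so it can be tried against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if algorithm $1 is listed in file $2
# (default: the kernel's list of registered congestion control algorithms).
cc_available() {
    algo=$1
    list=${2:-/proc/sys/net/ipv4/tcp_available_congestion_control}
    # The file holds one space-separated line, e.g. "reno cubic".
    for name in $(cat "$list"); do
        [ "$name" = "$algo" ] && return 0
    done
    return 1
}

if cc_available bbr 2>/dev/null; then
    echo "bbr is registered"
else
    echo "bbr is not registered (tcp_bbr not loaded yet?)"
fi
```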

Yep, that was most probably a hallucination. The systemd documentation does mention `EnvironmentFile=`, which supports a similar `-` prefix on the file path to prevent failures.

“… To make the file optional, prefix the path with “-”, which causes all errors related to the file to be silently ignored…”
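To see which environment files a unit treats as optional, the `EnvironmentFile=` lines can be filtered out of the unit text. Below is a small sketch with a hypothetical helper, `list_env_files`, that reads a unit file on stdin. Note that the `-` prefix only concerns a missing file; as far as I understand, a variable that is referenced in `ExecStart=` but never set can still trigger the warning.

```shell
#!/bin/sh
# Hypothetical helper: read unit text on stdin and print each
# EnvironmentFile= entry, tagged "optional" when the path has a "-" prefix.
list_env_files() {
    sed -n 's/^EnvironmentFile=//p' | while IFS= read -r path; do
        case $path in
            -*) printf 'optional %s\n' "${path#-}" ;;
            '') ;;  # an empty assignment resets the list; nothing to print
            *)  printf 'required %s\n' "$path" ;;
        esac
    done
}

# Example: systemctl cat sshd.service | list_env_files
```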

Thank you. :slight_smile:
