Help with feature request for more verbose boot messages

Good morning,

I’d like to submit a feature request, but I’m not sure to what component this would apply.

Yesterday, I copied my root and boot filesystems to a new drive. I used rsync -aX to perform the copies, so I thought I had taken care of SELinux contexts. When I tried booting from the drive, the console would display a lot of failures from systemd-journald.service and other systemd components. These error messages instructed me to use “systemctl status ” to see the reason for the failure. The kernel would then panic and hang.

Because of the panic and hang, I couldn’t run systemctl status to see the actual error messages. Whatever the error was, it also prevented root from being mounted read-write, so nothing reached the on-disk logs and booting from alternate media to read them didn’t help.

I tried adding boot_delay=100 to the kernel parameters, but that hung the system after a single line. I tried taking a video of the console while booting, but I didn’t see anything other than systemd components failing to start. No reason why given, just the fact they failed.

I spent several hours googling for things like “systemd.journald failed on boot”, but didn’t find anything that helped. I finally just assumed that I was dealing with an SELinux problem and did the “touch /.autorelabel” trick while booted from alternate media.

This has happened to me twice in the past few years. I don’t like trying random changes to see if they fixed the problem. If I’d seen a console message that said “SELinux context error”, I would have tried autorelabel immediately. All I saw was “Failed to start systemd-”, however.

The request I’d like to make is to add one or both of these features:

  1. As soon as possible in the boot sequence, check the SELinux context for all required files. If any of them fail, display the names of the files that failed and why. Then, either pause the boot for a few seconds or just stop booting, either of which will allow users to take a picture or read the messages.

  2. If any error occurs, don’t just print “Failed, see systemctl status”. Write the log message to the console and pause or stop the boot.

Basically, anything that actually lets folks read the error messages when journald doesn’t start or root can’t be [re]mounted rw.

I know one can hook up a serial console, but how many non-enterprise folks have computers with serial ports these days, much less a serial console handy?

Thanks,

Matt

I suspect what is needed is already present. You just need to tweak the defaults. Don’t list quiet or rhgb in the kernel parameters. Maybe add loglevel=<something higher> to the kernel parameters. You might also need to disable auditd to prevent it from routing those messages to /var/log/audit. Maybe changing the values for kernel.printk would be needed as well.
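As a rough sketch of the kernel-parameter change suggested above, assuming a Fedora-style system where grubby manages the boot entries: the helper below only prints the command so it can be reviewed before running it as root. The function name verbose_boot_cmd and the default loglevel of 7 are illustrative choices, not values given in this thread.

```shell
#!/bin/sh
# Sketch: build a grubby command that drops "quiet" and "rhgb" and raises
# the kernel console log level.
# Assumption: grubby is installed and manages the boot entries.
# LOGLEVEL=7 is an illustrative value, not one taken from this thread.
LOGLEVEL="${LOGLEVEL:-7}"

# Print the command instead of executing it, so it can be checked first.
verbose_boot_cmd() {
    printf 'grubby --update-kernel=ALL --remove-args="quiet rhgb" --args="loglevel=%s"\n' "$1"
}

verbose_boot_cmd "$LOGLEVEL"
```

The current console log levels can be inspected with "sysctl kernel.printk" before deciding whether that setting needs changing too.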

The SELinux contexts may be fine. Copying to a new drive normally gives the partition(s) new UUIDs, so the entries in /etc/fstab may no longer match, and the hardware information stored in the initramfs image used for booting may now be invalid as well.

No, the problem I described was solved by the touch /.autorelabel trick. That relates to SELinux, correct?
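For anyone landing here from a search, the autorelabel trick amounts to creating an empty /.autorelabel file on the affected root filesystem, which asks SELinux to relabel everything on the next boot. Here is a minimal sketch for doing that from alternate media, assuming the broken root is already mounted somewhere. The function name request_relabel and the /mnt/sysroot path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: request a full SELinux relabel on the next boot of a root
# filesystem that is mounted from rescue/alternate media.
# Assumption: the argument is the mount point of the affected root.

request_relabel() {
    root="$1"
    # Sanity check so the flag file is not created in an unrelated directory.
    if [ ! -d "$root/etc" ]; then
        echo "request_relabel: $root does not look like a root filesystem" >&2
        return 1
    fi
    touch "$root/.autorelabel"
}

# Usage (hypothetical mount point):
#   request_relabel /mnt/sysroot
```

On a system that still boots, running "fixfiles onboot" has the same effect.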

Before having that issue I did have the UUID issue you suggest, but that error generates a nice clear error message from GRUB: roughly, can't find filesystem with UUID <uuid>. Since GRUB can’t do anything else, it gives the grub boot command line prompt, with the added effect that the error message doesn’t scroll off the screen. It did take me a few tries to get that resolved, mostly because (after booting from alternate media in order to mount my partitions) I needed to figure out how to mount sys, dev, proc, and run/systemd in my filesystems before chrooting to it. Then I could safely run grub2-install <device>, grub2-mkconfig -o <config-file>, and dracut --regenerate-all -f.
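The chroot sequence described above could be sketched roughly as follows. This is not my exact procedure: the device names, the mount point, and the grub.cfg path are placeholders to adjust, and binding all of /run (rather than only run/systemd) is an assumption. The script defaults to a dry run that only prints each command; set DRY_RUN=0 and run it as root from the alternate media once every line makes sense for your system.

```shell
#!/bin/sh
# Sketch of the rescue-chroot repair after copying root and boot to a new drive.
# All values below are hypothetical placeholders; override them via environment.
ROOT_DEV="${ROOT_DEV:-/dev/sdX2}"              # new root partition
BOOT_DEV="${BOOT_DEV:-/dev/sdX1}"              # new boot partition
DISK="${DISK:-/dev/sdX}"                       # disk that receives GRUB
TARGET="${TARGET:-/mnt/sysroot}"               # where the new root is mounted
GRUB_CFG="${GRUB_CFG:-/boot/grub2/grub.cfg}"   # assumed config path
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

repair_boot() {
    run mount "$ROOT_DEV" "$TARGET"
    run mount "$BOOT_DEV" "$TARGET/boot"
    # Bind the pseudo-filesystems that the tools inside the chroot expect.
    for fs in dev proc sys run; do
        run mount --bind "/$fs" "$TARGET/$fs"
    done
    run chroot "$TARGET" grub2-install "$DISK"
    run chroot "$TARGET" grub2-mkconfig -o "$GRUB_CFG"
    run chroot "$TARGET" dracut --regenerate-all -f
}

repair_boot
```

Remember to fix the UUIDs in /etc/fstab on the new root as well, since grub2-mkconfig and dracut will not do that for you.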

I ended up documenting my procedure afterward. I’ll use these as reminders for the next time I replace a root drive. (If anyone else wants to use these notes, do not just run the commands. Make sure you understand what each command does and, if needed, modify them to be appropriate for your situation before executing them.)


@thearcher, this looks like an interesting idea. Would it work for you if it were just a report like this, with no action taken? The messages are shown in red.

Mar 19 15:51:14 luggage systemd[1]: Starting selinux-check-critical-files.service - Check if SELinux labels of critical files are correct…
Mar 19 15:51:15 luggage restorecon-status[1074525]: Would relabel /usr/bin/x-bash from system_u:object_r:shell_exec_t:s0 to system_u:object_r:bin_t:s0
Mar 19 15:51:15 luggage systemd[1]: selinux-check-critical-files.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 19 15:51:15 luggage systemd[1]: selinux-check-critical-files.service: Failed with result ‘exit-code’.
Mar 19 15:51:15 luggage systemd[1]: Failed to start selinux-check-critical-files.service - Check if SELinux labels of critical files are correct.
Mar 19 15:51:15 luggage systemd[1]: selinux-check-critical-files.service: Triggering OnFailure= dependencies.
Mar 19 15:51:15 luggage systemd[1]: Starting selinux-report-critical-files.service - Report that important SELinux filesystem labels are incorrect…
Mar 19 15:51:15 luggage echo[1074530]: Wrong SELinux labels of critical files.
Mar 19 15:51:15 luggage echo[1074534]: Run “journalctl -u selinux-check-critical-files.service” to get more information about the reported files.
Mar 19 15:51:15 luggage echo[1074537]: Run “fixfiles onboot” and reboot to fix the reported issues.
Mar 19 15:51:15 luggage systemd[1]: selinux-report-critical-files.service: Deactivated successfully.
Mar 19 15:51:15 luggage systemd[1]: Finished selinux-report-critical-files.service - Report that important SELinux filesystem labels are incorrect.

That looks good. The only confusion might be the suggestions to run journalctl or fixfiles, as the SELinux issues can hang the system. I only have the one test sample (my system), so I don’t know how often an SELinux issue with a critical file will hang a system. Am I the lucky one? Does it happen on any system that has a critical file with an SELinux issue? If it’s more than just a few, would it be worth suggesting that folks search online for “touch autorelabel”?

Thank you!