I am a complete noob on these kinds of things.
I have never encountered such an issue.
I am on yesterday’s version of F42 Kinoite. The one from the day before is also still pinned. For both, I get to the decryption passphrase input box, and when I enter the passphrase, the system fails to boot. It booted fine yesterday evening.
Failed to connect to a system scope bus via local transport: No such file or directory
dracut-cmdline: Input ‘luks-…’ is not an absolute file system path, escaping is likely not going to be reversible.
dracut-pre-trigger: Failed to send request to update environment: No such file or directory
ostree-prepare-root: Couldn’t find specified OSTree root ‘/sysroot/…’: No such file or directory
My PC tells me to type journalctl.
What I see in journalctl that seems relevant:
Linux version: 6.15.3-200.fc42.x86_64
Kernel command line: BOOT_IMAGE…
Next line: Unknown kernel command line parameters “rhgb BOOT_IMAGE…”, will be passed to user space
Then much lower in the log
ostree-prepare-root.service: Main process exited, code=exited, status=1/FAILURE
…
Failed to start ostree-prepare-root.service…
Can anybody help me? Is this recoverable? I hope so, because my last backup was a week ago, and since then I have done some work I’d need back…
Could I perhaps reinstall the system without deleting the stuff outside the immutable part (layers, containers)?
To be on the safe side, better boot with a Fedora Live ISO (Workstation or KDE), and back up your work (e.g. with rsync -a), given that it probably resides inside your home folder.
Not that easy if you went with automatic partitioning, and the home folder is a subvolume on the root partition. You could hang around, hoping someone joining in will be able to suggest a fix to your issue, but from experience it’s harder to troubleshoot when deployments that did work before don’t boot anymore.
Should you need to reinstall, you could first back up the whole home folder, as well as enter the latest deployment and also back up relevant config files in etc, in case you’ve changed them manually. I think the path should be something like /<your-mount-point>/<your-partition-name>/root00/ostree/deploy/fedora/deploy/<your-deployment-name>/etc/. That way, it’s faster to recreate the state of the previous system.
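The live-ISO backup described above could look roughly like the sketch below. The device name, mapper name, mount point and destination are all hypothetical placeholders (check `lsblk` for the real ones), and the script assumes the default Btrfs layout in which `home` is a top-level subvolume. It defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: back up /home from an encrypted Fedora Atomic disk, run from a live ISO.
# All device names and paths are hypothetical placeholders; verify with `lsblk`.
set -eu

DRY_RUN="${DRY_RUN:-1}"                      # 1 = only print the commands
LUKS_DEV="${LUKS_DEV:-/dev/sdX3}"            # assumed LUKS partition
MAPPER="${MAPPER:-rescue}"                   # arbitrary device-mapper name
MNT="${MNT:-/mnt/rescue}"                    # where the old filesystem is mounted
DEST="${DEST:-/run/media/liveuser/backup}"   # assumed external backup drive

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

backup_home() {
    run sudo cryptsetup open "$LUKS_DEV" "$MAPPER"       # asks for the passphrase
    run sudo mkdir -p "$MNT"
    run sudo mount -o ro "/dev/mapper/$MAPPER" "$MNT"   # read-only, to be safe
    # Assumes "home" is a subvolume directly below the top-level Btrfs volume.
    run sudo rsync -a "$MNT/home/" "$DEST/home/"
    run sudo umount "$MNT"
    run sudo cryptsetup close "$MAPPER"
}

backup_home
```

Run it once as-is to review the printed commands, then again with `DRY_RUN=0` and the real values, e.g. `DRY_RUN=0 LUKS_DEV=/dev/nvme0n1p3 ./backup.sh` (again, a made-up device name).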
Thank you, I will try as you say with the MX Linux live system I have lying around here.
I think I will need to reinstall, as the system immediately enters emergency mode after I enter the decryption passphrase.
Lesson learned: always rigorously back up.
Question: how do I know that when I do a fresh install of Fedora Kinoite, I will not just bump into the same issue?
I’d say you’ve experienced a corner case, which doesn’t usually happen. Atomic desktops are more resilient to boot failures than traditional systems.
Did you have a power issue maybe while the system was updating? Are you taking BTRFS snapshots and the disk could be full? Anything else you could think of?
Honestly, no.
I switched off the machine after midnight, it was working fine, and wanted to continue work this morning. I was greeted with the current situation.
I hope it is/was a corner case because I switched to immutable exactly because it should be more resilient.
I’m not taking BTRFS snapshots and the disk isn’t full (currently backing up my /home and /etc over live system as you suggested).
I do have an AMD processor and graphics card. Maybe it has something to do with that, as I came across reported issues while trying to find info about my errors. But then again, how can it work one evening and not work 8 hours later…?
I will reinstall and see what happens…
Question: what is the better backup strategy? BTRFS or rsync -a ? Thank you!
I would suggest going with redundancies and using multiple backup solutions:
Simple backup (with rsync -avz) from time to time, on an external medium which is only connected/powered on when backing up;
Scheduled incremental backup, preferably encrypted, to a remote location. There are a few nice GUI tools available (e.g. DejaDup or Pika as GTK apps, but there should be Qt apps as well) which handle this task well, including the scheduling part etc. This solution could also save you from the headaches caused by accidentally deleted files, in case such files would be there on a previous backup instance.
Syncing of specific folders between multiple stations (optional). A great tool for this purpose is syncthing, which is available in the repos and works best if there’s (at least) one always-on system (e.g. home server). This is not a backup solution per se, but rather a syncing solution, yet it can provide the latest (current) state of the backup folder, whereas the last backup could be a few days old. It can even be used to sync files/folders with a smartphone (there are both iPhone and Android apps available).
BTRFS snapshots, while a great technology, are not a backup solution as such (but rather images of the subvolume state at some point in time, based on the copy-on-write concept). Additional steps need to be taken to use them as a real backup solution, and easy-to-use tools for the larger audience are missing. A nice Fedora Magazine article that describes the necessary steps can be found here, or you can give the Btrfs Assistant GUI app a try.
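To illustrate the "additional steps" mentioned above: a snapshot only becomes a backup once it is copied to another disk, which `btrfs send` and `btrfs receive` can do. The sketch below is a minimal, non-incremental version. The subvolume path, snapshot directory and destination are hypothetical, the snapshot directory must already exist on the same filesystem, and the destination has to be a Btrfs filesystem as well. It defaults to a dry run.

```shell
#!/usr/bin/env bash
# Sketch: turn a read-only Btrfs snapshot into a copy on a second disk.
# SUBVOL, SNAP_DIR and DEST are hypothetical placeholders.
set -eu

DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print the commands
SUBVOL="${SUBVOL:-/var/home}"                 # assumed subvolume to back up
SNAP_DIR="${SNAP_DIR:-/var/home/.snapshots}"  # must be on the same filesystem
DEST="${DEST:-/run/media/me/backup}"          # must itself be a Btrfs filesystem

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

snapshot_and_send() {
    snap="$SNAP_DIR/home-$(date +%Y%m%d-%H%M%S)"
    # "btrfs send" only accepts read-only snapshots, hence -r.
    run sudo btrfs subvolume snapshot -r "$SUBVOL" "$snap"
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "sudo btrfs send $snap | sudo btrfs receive $DEST"
    else
        sudo btrfs send "$snap" | sudo btrfs receive "$DEST"
    fi
}

snapshot_and_send
```

Later runs could transfer only the differences with `btrfs send -p <previous-snapshot>`, which is left out here to keep the sketch short.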
Hello, meanwhile I reinstalled everything as this was a production system. It looked to me like this was a serious issue and reinstalling was the only sensible thing to do. And this was confirmed by the answers I received from @tqcharm (thanks!), so I went ahead, copied my data using a live system and reinstalled Kinoite.
The only thing I can think of, and I’m just guessing here, is that there was very bad weather with a lot of lightning going on. Maybe just a really short power flicker caused something to go rogue? As I said: just guessing. Otherwise I didn’t do anything different from any other time I updated Kinoite: install updates, switch off, boot into the new deployment in the morning.
I wonder if a power failure at some specific stage of the upgrade process could cause such a “mismatch” issue (maybe similar to this one). Could this be the Achilles’ heel of atomic desktops?
This is really weird as ostree is specifically designed to not fail on power failure / unexpected shutdown: Atomic Upgrades | ostreedev/ostree
You can turn off the power anytime you want…
OSTree is designed to implement fully atomic and safe upgrades; more generally, atomic transitions between lists of bootable deployments. If the system crashes or you pull the power, you will have either the old system, or the new one.
So I suspect there is a bug here somewhere, as this kind of reliability is a core part of the Atomic Desktops.
Since this sounds pretty serious, if it would be useful, I can try to reproduce these circumstances on a bare metal machine. I’m thinking of installing Kinoite, running the upgrade, and then doing a hard reset, powering off, turning off PSU from the knob, and unplugging the power cord from the outlet during the various stages of the process.
I was thinking about the same, running some tests over the weekend, but rather in a VM.
I’m just wondering at what stage the power failure should be simulated (post-scripts, post-trans scripts maybe). I also wonder if it only happens when new kernels are being deployed and the initramfs is being generated.
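One way to avoid guessing the right stage is to cut the power at a random moment and repeat the test many times. Below is a rough sketch of that idea for a libvirt VM. The domain name, the SSH target and the maximum delay are hypothetical, and it assumes SSH access to the guest with passwordless sudo. `virsh destroy` stops the VM immediately without a clean shutdown, but data the guest has already handed to the host may still reach the disk image, so this is only an approximation of a real power loss. It defaults to a dry run.

```shell
#!/usr/bin/env bash
# Sketch: interrupt an rpm-ostree upgrade in a libvirt VM at a random moment.
# DOMAIN, GUEST and MAX_DELAY are hypothetical placeholders.
set -eu

DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print the commands
DOMAIN="${DOMAIN:-kinoite-test}"        # assumed libvirt domain name
GUEST="${GUEST:-tester@kinoite-test}"   # assumed SSH target inside the VM
MAX_DELAY="${MAX_DELAY:-120}"           # upper bound for the delay, in seconds

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Pick a delay between 1 and MAX_DELAY seconds.
pick_delay() {
    echo $(( $(od -An -N2 -tu2 /dev/urandom) % MAX_DELAY + 1 ))
}

power_cut_once() {
    delay="$(pick_delay)"
    # systemd-run starts the upgrade as a transient unit and returns right away.
    run ssh "$GUEST" sudo systemd-run rpm-ostree upgrade
    run sleep "$delay"
    run virsh destroy "$DOMAIN"   # ungraceful stop, no shutdown sequence
    run virsh start "$DOMAIN"
}

power_cut_once
```

After each cycle, note the delay that was used and whether the guest reaches the login screen, so that a failing boot can be tied to a specific moment in the upgrade.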
I don’t know much about bootupd, but could it be that it is involved in automatic updates in F42 Atomic?
bootupd does not yet perform updates in a way that is safe against a power failure at the wrong moment, or against a buggy bootloader update that fails to boot the system.
Therefore, by default, bootupd updates the bootloader only when manually instructed to do so.
So, this is no longer exactly true (this is from the README? I need to update that). For EFI, bootupd updates are really close to being really safe. For BIOS there is still a small gap.
But note that this cannot be the case here, as a bootupd update failure would mean that your system would not boot at all, i.e. you would not even reach GRUB.
If you get a GRUB menu, this is not a failed update from bootupd.
Wait ~1-2 seconds
Push the reset button on the front panel
Repeat 2 times
sudo rpm-ostree cleanup --rollback
Wait ~0.5-3 seconds
Push the reset button on the front panel
Repeat 2 times
sudo rpm-ostree rollback --reboot
Wait ~1-2 seconds
Push the reset button on the front panel
Repeat 2 times
In each case, the machine booted either in the current deployment or in the updated one, in case I wasn’t quick enough with the reset. These are the deployments I have tested.
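For anyone repeating these tests, it may help to record the system state after every forced reset instead of only checking that it boots. A small sketch of such a check follows, assuming the usual repository location `/sysroot/ostree/repo`. It defaults to a dry run that only prints the commands, and `ostree fsck` can take a while on a large repository.

```shell
#!/usr/bin/env bash
# Sketch: post-boot sanity check after a forced reset on an rpm-ostree system.
# The repository path is the usual default, but treat it as an assumption.
set -eu

DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the commands
REPO="${REPO:-/sysroot/ostree/repo}"       # assumed OSTree repository path

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

check_boot() {
    run rpm-ostree status                    # booted, pending and rollback deployments
    run sudo ostree fsck --repo="$REPO"      # verify the objects in the repository
    run journalctl -b -p err --no-pager      # errors logged during the current boot
}

check_boot
```

Saving the output of each run next to the test step that preceded it makes it easier to compare a good boot with a failing one later.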
Since I’m not familiar with disk encryption, I don’t know if it could possibly be related to the issue. I will start encrypting the test environments I install.