Greetings,
I have been running Fedora Workstation 41 with BTRFS as a VMware guest on Windows Server 2022 for a while now. I have encountered multiple boot failures after unclean shutdowns caused by sudden power loss.
The error is open_ctree failed. I have faced this issue twice and had to reinstall the OS on VMware each time, but I faced the same issue again after our server suddenly shut down because of a power failure. I have set up regular backups so I won’t lose the database and application data, but configuring everything again is tiresome.
Should I stop using BTRFS in a virtual environment and use XFS or ZFS instead? Or should I change the virtualization tool I am using, VMware? I can’t tell which part I should be working on to fix this recurring issue.
I would really appreciate any suggestions regarding the two cases, the virtualization tool and the FS.
Thanks.
Given a sudden loss of power and the added latency of a virtualisation layer, you will find that any file system is going to fail. Fixing the power issue is where I would focus. Maybe you need a UPS to protect the server?
Maybe use ext4, as it seems quite resilient to power failure issues and is simple to fix.
As for using ZFS, it will add to your troubles as it uses an out-of-tree module which will need to be rebuilt every time you update the kernel.
Thanks for your suggestion. We have a basic UPS for now, which doesn’t last long. If the power fails at night, when we cannot physically shut down the server, the UPS doesn’t guarantee that the VM is protected from FS failure.
What I am trying to understand is this: BTRFS on my laptop, with two SSDs configured as RAID, doesn’t fail even if I forcibly shut down the laptop or after any other ungraceful shutdown. But here on the virtual machine (currently VMware on Windows Server), if the host encounters any power loss, the Fedora Workstation guest running on VMware does not survive.
Why is the FS behaving differently on a VM and on the actual computer?
Is ext4 a better choice for a server setup? Fedora Server comes with XFS by default as far as I know, but currently I am choosing BTRFS for its features. I am also wondering whether there is better virtual machine software that handles such issues in a better way.
This is entirely up to the system owner. They are all supported filesystems that have journaling and should generally be able to recover from the occasional unsafe shutdown, but I agree with others here that if it’s happening more than very occasionally, the solution isn’t changing filesystems but fixing your power/shutdown cycles.
(Anecdotally, I’ve had to manually recover XFS far more than ext4 or btrfs from disruptive events.)
@barryascott The UPS doesn’t support that feature, but I will try to make some tweaks on VMware myself.
But I wanted to know the reason why Fedora is failing in a virtual environment while it doesn’t on workstation computers. As I mentioned, I have tried to force shutdown my laptop multiple times, but the OS (Fedora) hasn’t failed to boot even once. Why doesn’t that apply to running it in virtual machines?
This is what I have been faced with multiple times when the host computer shuts down suddenly. I couldn’t fix it, and the only choice I have is reinstalling the OS, but there is no guarantee that I won’t face it again.
Odd, I thought all UPSes, at least on Windows, would have software you can install on Windows to do the shutdown on power failure.
I suggest you replace the UPS with one that does support the shutdown feature.
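For what it’s worth, the decision that shutdown-on-power-fail software makes is fairly simple. Here is a minimal sketch in Python, assuming the UPS is monitored with Network UPS Tools (NUT), whose upsc client reports ups.status flags such as OL (on line power), OB (on battery) and LB (low battery). The UPS name myups@localhost, the 50% threshold and the function names are made up for illustration. The step that actually shuts down the guest and then the host is left out because it depends on your setup.

```python
import subprocess


def read_ups(ups_name="myups@localhost"):
    """Ask NUT's upsc client for status flags and battery charge.

    ups_name is a hypothetical example; use the name from your own NUT config.
    """
    def query(variable):
        result = subprocess.run(
            ["upsc", ups_name, variable],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()

    return query("ups.status"), float(query("battery.charge"))


def should_shut_down(status, charge_percent, min_charge=50.0):
    """Decide whether to start a clean shutdown.

    status is the ups.status string, e.g. "OL", "OB" or "OB LB".
    min_charge is an assumed threshold, not a recommended value.
    """
    flags = status.split()
    if "OB" not in flags:
        # Still on mains power: nothing to do.
        return False
    # On battery: shut down if the UPS reports low battery
    # or the charge has dropped below our own threshold.
    return "LB" in flags or charge_percent < min_charge
```

The idea would be to poll read_ups() every few seconds and, once should_shut_down() returns True, shut down the Fedora guest first and the Windows host afterwards, so that both get to sync their disks while the battery still has charge.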
I would guess it is because the Linux kernel is told by VMware that it has written data safely to disk, but VMware has not actually done the write.
You would need to ask a VMware expert to be sure.
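To illustrate that guess, here is a toy model in Python. It is not how VMware is implemented; the class and function names are invented for this sketch. It only shows why a layer that reports a flush as done without persisting the data can leave a filesystem unbootable: the filesystem writes its new tree blocks, flushes, and only then writes the superblock that points at them. If the flush was not honored and the superblock happens to reach the disk first, a power cut leaves a superblock pointing at a tree that does not exist.

```python
class CachingLayer:
    """Toy storage layer sitting between a guest filesystem and the disk."""

    def __init__(self, honor_flush):
        self.honor_flush = honor_flush
        self.cache = {}  # volatile: lost on power failure
        self.disk = {}   # persistent: survives power failure

    def write(self, block, data):
        # Writes are acknowledged as soon as they are in the cache.
        self.cache[block] = data

    def flush(self):
        if self.honor_flush:
            self.disk.update(self.cache)
            self.cache.clear()
        # Success is reported to the guest in both cases.
        return True

    def background_writeback(self, block):
        # The layer persists cached blocks later, in an order of its choosing.
        if block in self.cache:
            self.disk[block] = self.cache.pop(block)

    def power_loss(self):
        self.cache.clear()
        return dict(self.disk)


def crash_after_commit(honor_flush):
    """Simulate one filesystem commit followed by a power cut."""
    layer = CachingLayer(honor_flush)
    layer.write("tree", "new tree nodes")
    layer.flush()  # the filesystem now believes the tree is on disk
    layer.write("superblock", "points to new tree")
    layer.background_writeback("superblock")
    return layer.power_loss()


def is_consistent(disk):
    # A superblock on disk is only usable if the tree it points to is there too.
    return "superblock" not in disk or "tree" in disk
```

In this model crash_after_commit(True) leaves both blocks on disk, while crash_after_commit(False) leaves only the superblock, which is the kind of state a filesystem cannot mount from.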
Thanks, @barryascott, for the suggestion. I have posted the issue on the VMware community. I am looking into Proxmox to see if it’s a better solution. I think running Linux in a VM where Windows is the host OS is not a good option if you don’t have good power backup.
This really isn’t a Windows or Linux specific issue. It’s not really even a VMware one. Modern journaled filesystems have some tolerance and recoverability for very occasional improper shutdowns, but they aren’t designed for that to be the regular norm. That’s not good for Linux guests or your Windows host.
That said, the Atomic variants of Fedora (Kinoite, IoT, FCOS, etc.) have a read-only root volume, so they are more likely to survive. Even so, modern computers aren’t designed for frequent power interruptions to be the norm. That worked for many machines in the early 1980s that booted from ROM and stored data on tape or floppy.
Addendum: XFS and ext4 often do a delayed disk write, so data may get written to memory first and written to the drive over time for performance reasons (among other things, synchronous ext4 writes are single-threaded), but data that isn’t sync’d will be lost in power outage events. Data can be written directly to disk with the sync mount option for ext4, whereas XFS has a tunable sync threshold that you probably should never touch. One of the things that happens on normal shutdown is a filesystem sync, so that all pending disk operations get written before unmounting. Also note that this doesn’t apply to btrfs since writes are currently already synchronous.
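As a small illustration of the delayed write versus sync distinction from an application’s point of view, here is a minimal Python sketch. The function names are invented for the example. The first function leaves it to the OS to decide when the data reaches the drive; the second asks for it to be pushed out before returning, using os.fsync. Note that fsync only requests this from the OS; whether the layers underneath (a hypervisor, a drive’s own cache) really persist the data is outside the program’s control.

```python
import os


def write_buffered(path, data):
    """Write data and return; the OS may keep it in memory for a while."""
    with open(path, "wb") as f:
        f.write(data)


def write_durable(path, data):
    """Write data and ask the OS to push it to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's own buffer into the OS
        os.fsync(f.fileno())  # ask the OS to write its cached pages to the device
```

Databases typically do something like write_durable() for their transaction logs, which is also why they run slower on storage where flushes are expensive.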
@vwbusguy thank you so much for clearing that up for me.
The reason I am confused is that I have an HP workstation with a better CPU and storage, which has been running Fedora Server with XFS for over 4 years now.
I have never had a failed OS in the past 4 years, even after multiple power failures. That’s why I suspect that VMware or BTRFS doesn’t tolerate such issues. Your explanation makes it clear, I guess: if Btrfs writes in synchronous mode, it certainly corrupts the whole tree, which then causes the OS not to boot.
I am thinking of using Proxmox, if it is a good way of implementing my scenario: running Windows Server alongside Fedora Server, each separately.
I think you got the exact opposite of the point I was trying to make. The fact that btrfs is synchronous should make it less likely to get corrupted, whereas XFS is actually more likely to (which has also been my experience). The amount of resources has nothing to do with it. Again, this really isn’t about btrfs vs XFS at all, but about fixing your power issues. None of these filesystems are designed for power outages to be the everyday norm.
I have actually never had a problem with BTRFS after sudden losses of power. It literally survives them without any issues. Maybe some data doesn’t have time to sync, but that’s certainly not going to cause corruption unless you lose power during crucial I/O operations?
Oh, sorry, I understood it differently.
I haven’t tried btrfs on Fedora Server directly, without a virtual environment, but in my 4+ years of using XFS through multiple power issues I haven’t had any data loss or OS failure. That’s why I was curious whether this is a btrfs problem in virtual environments.
The same for me on my laptop or in any non-virtual environment, but on VMware that’s not the case: the guest doesn’t survive when power issues happen, while the host OS (Windows Server) hasn’t failed even once.
Thanks, that would be the case for sure, since everything works fine in a non-virtual environment. @barryascott, can you please suggest which virtualization tool is better for running Fedora Server in such cases?