I am a bit puzzled by the error message I get from xfs_scrub. The kernel 7.0 changelog stated that xfs can now repair on-the-fly while the fs is online (mounted).
P.S.: It seems that online repair is not activated in the kernel. However, this raises the question: how is online repair possible? One of the reasons I waited for 7.0 was self-healing xfs, which does not seem to be available though.
[root@f44-server 0 ~]# grep XFS_ONLINE_ /boot/config-$(uname -r)
CONFIG_XFS_ONLINE_SCRUB=y
# CONFIG_XFS_ONLINE_SCRUB_STATS is not set
# CONFIG_XFS_ONLINE_REPAIR is not set
Did you retry the scrub using the option shown in that message?
I do not use xfs, but that message seems clear to me.
man xfs_scrub shows that the -n option is to check the metadata without performing changes.
This is also from the man page:
DESCRIPTION
xfs_scrub attempts to check and repair all metadata in a mounted XFS filesystem.
WARNING! This program is EXPERIMENTAL, which means that its behavior and interface could change at any time!
xfs_scrub asks the kernel to scrub all metadata objects in the filesystem. Metadata records are scanned for obviously bad values and then cross-referenced against
other metadata. The goal is to establish a reasonable confidence about the consistency of the overall filesystem by examining the consistency of individual metadata
records against the other metadata in the filesystem. Damaged metadata can be rebuilt from other metadata if there exists redundant data structures which are intact.
Filesystem corruption and optimization opportunities will be logged to the standard error stream. Enabling verbose mode will increase the amount of status information sent to the output.
If the kernel scrub reports that metadata needs repairs or optimizations and the user does not pass -n on the command line, this program will ask the kernel to make
the repairs and to perform the optimizations. See the sections about optimizations and repairs for a list of optimizations and repairs known to this program. The
kernel may not support repairing or optimizing the filesystem. If this is the case, the filesystem must be unmounted and xfs_repair(8) run on the filesystem to fix
the problems.
Note this is experimental and depends upon the kernel support to perform online repairs.
Using the -n option seems like it should report if there are needed repairs; then performing the repairs could be another step (either online or with the file system unmounted).
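A minimal sketch of that two-step idea: run the check-only pass first and decide about repairs afterwards. The function name `scrub_dry_run` and the `XFS_SCRUB_BIN` override are hypothetical (the override only exists so the logic can be tried without an XFS filesystem). Only the `-n` option from the man page is used, and any nonzero exit status is simply treated as "needs attention" rather than decoded.

```shell
#!/bin/sh
# scrub_dry_run: hypothetical wrapper, not part of xfsprogs.
# Runs xfs_scrub in check-only mode (-n) so nothing on disk is changed.
# XFS_SCRUB_BIN is an assumed override for testing; it defaults to xfs_scrub.
scrub_dry_run() {
    mountpoint=$1
    if "${XFS_SCRUB_BIN:-xfs_scrub}" -n "$mountpoint"; then
        echo "no problems reported for $mountpoint"
    else
        # Nonzero exit: problems were reported or the scrub itself failed.
        echo "xfs_scrub -n reported problems (or failed) for $mountpoint"
        return 1
    fi
}
```

Something like `scrub_dry_run /home` as root would then only report; whether a repair can happen online or needs an unmount and xfs_repair is a separate decision.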
I did not miss that, and I clearly said I was not using xfs and showed what the docs tell us.
If you have documented reports that kernel 7.0 should support online repairs of the xfs file system then it seems you should reference that documentation and file a bug against the kernel that it fails to provide the documented features.
Fair enough. I’ll go through the kernel changelog, which can take a while.
But I was also referring to all the “What’s new in kernel 7.0” articles, which all stated that taking an xfs fs offline for repairs is no longer necessary. It’s just annoying (and hey, I am not saying that this is your fault or anybody else’s on the Fedora team) that this is not the case and I just wanted a clarification from people who know more about the internals of the Fedora kernel than me.
I believe the new XFS feature in 7.0 is autonomous self-healing: https://www.phoronix.com/news/XFS-Linux-7.0
This is separate from online repair, which was implemented earlier. Fedora would need to build the kernel with CONFIG_XFS_ONLINE_REPAIR for it to be enabled, per my understanding.
My understanding after reading the docs linked by @yurislnx just above is that the Fedora kernel is not compiled with the option noted and thus cannot do online repairs on demand, but does so automatically. Thus using xfs_scrub -n should tell the user if there are errors instead of trying to force a repair. The repairs should be done automatically if needed, so none would be expected to appear with the xfs_scrub command.
The articles are a bit confusing. Some state that the scrub timer service is required, others state that there is a health monitoring service (different from the scrub service). Yet others talk about it as if everything is done without additional requirements: self-healing out of the box, so to speak.
It looks to me that there is no consensus on that matter.
It would be nice if there was a proper explanation of all these areas with reference to what is actually new in 7.0. But apparently this does not exist.
This article describes how online repair can be used: https://blogs.oracle.com/linux/xfs-online-filesystem-repair-in-uek8
It shows everything that’s required to make it work, including the scrub timer and service. Written by the guy who built it.
The new feature in 7.0 seems to be related to reporting.
I am also an XFS user across all of my Fedora boxen and would like these new kernel 7.0 XFS features enabled. Or to at least understand enough to make the call on what I should set up…
AFAIK, if you build your own kernels they will not be signed and cannot be used with secure boot. You will also need to manually upgrade kernels to keep pace with development.
If you want it active you should work toward encouraging Fedora and the kernel developers to enable it on the provided kernels.
May be a good idea to reach out to the xfsprogs maintainers and ask their thoughts.
My feeling is that it may be too soon, but at the same time, UEK 8.2 formally announced its availability, as opposed to calling it a “tech preview” back in 8.0, so maybe they consider it stable enough. But I’m not an XFS expert so take it with a grain of salt.
P.S. @tessus I just realized that the confusion with the monitoring service may have been because of xfs_healer, which was released a few days ago: xfsprogs: v7.0.0 released [LWN.net]
To be clear, xfs_healer and xfs_scrub are complementary tools:
Scrub walks the whole filesystem, finds stuff that needs fixing or
rebuilding, and rebuilds it. This is sort of analogous to a patrol
scrub.
Healer listens for metadata corruption messages from the kernel and
issues a targeted repair of that structure. This is kind of like an
on-demand scrub.
…
I am pretty sure I have built and signed my own kernels because I have been using secure boot for quite a long time with Fedora. But I don’t want to be flippant or dismissive… Thanks for the tip.
Right, but this also requires the kernel to be compiled with CONFIG_XFS_ONLINE_REPAIR=y and I think CONFIG_XFS_ONLINE_SCRUB_STATS=y should be set as well.
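For anyone who wants to check their own kernel, here is a minimal sketch along the lines of the grep from the first post. The helper name `check_xfs_online_opts` is hypothetical, and it assumes the config file lives at `/boot/config-$(uname -r)` as on Fedora unless another path is passed.

```shell
#!/bin/sh
# check_xfs_online_opts: hypothetical helper, not part of xfsprogs.
# Reports whether the two options named above are built in (=y) and
# returns nonzero if either one is missing.
check_xfs_online_opts() {
    # Assumed default location of the running kernel's config file.
    config=${1:-/boot/config-$(uname -r)}
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 2
    fi
    status=0
    for opt in CONFIG_XFS_ONLINE_REPAIR CONFIG_XFS_ONLINE_SCRUB_STATS; do
        if grep -q "^${opt}=y\$" "$config"; then
            echo "${opt}: enabled"
        else
            # Covers both "# ... is not set" and a missing line.
            echo "${opt}: not enabled"
            status=1
        fi
    done
    return $status
}
```

Calling `check_xfs_online_opts` with no argument checks the running kernel; passing a path checks another config file, for example one from a newer kernel package.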
So the healer daemon is more what I was looking for (as opposed to scrub, which is also not a bad idea), but I would most likely use both in combination.
I honestly do not understand why the kernel was not compiled with the 2 options I mentioned. Nobody is forced to use the features, but at least they’d be available when needed.
Question to @jforbes : any particular reason why these 2 options were not activated in the kernel? If not, can you add those for future kernel builds?
Kernel 7.0.9 has both options enabled. For everyone eager to start using online repair, I’d suggest waiting for xfsprogs 7.0.0, which eliminates the experimental status of online repair that @computersavvy mentioned.
However, I also wanted to mention that I tried online optimize and repair with xfsprogs 6.18.0 and it seems to work.