xfs_repair finds new errors every time it is run, but smartctl long and short tests show no issues

I’m running into a strange issue. I had a USB disk that didn’t have a clean mount, so I ran xfs_repair against it. Numerous errors were encountered and files were moved to lost+found. I then repeated the xfs_repair, and each time it finds more problems and moves more files to lost+found. I then ran smartctl short and long tests, and they found no errors at all. Does anyone know what might be happening? It seems weird that XFS is having issues while smartctl finds no problems.

Edit: Also installed Seagate Tools and it also is reporting no errors.

You may find there is info in dmesg about disk errors.

Thanks for the reply, I did both:
journalctl -k -p warning and journalctl -k -p err

The only message I received from XFS was the “supports timestamps until 2038 (0x7fffffff)” message.

I’m now wondering if I found a bug in xfs_repair or worse xfs, which wouldn’t be good.

I also just installed the Seagate Linux Tools and they report no errors on the drive. Health OK.

Are you using a 6.3.5 or later kernel (see XFS Metadata Corruption Fix in 6.3.5 kernel)?

Thanks for the reply. Yes, I am using the 6.3.5 kernel. Just for a test I ran xfs_repair against a drive I purchased last week and I’m getting the same thing, no errors in smarttools, but this in xfs_repair…
Metadata CRC error detected at 0x5611f2509c60, xfs_dir3_data block 0x1fd0fd00/0x1000
bad hash table for directory inode 533780841 (no data entry): rebuilding
disconnected inode 657285886, moving to lost+found
disconnected inode 657285887, moving to lost+found
disconnected inode 657297728, moving to lost+found
disconnected inode 657297729, moving to lost+found
disconnected inode 657297730, moving to lost+found
disconnected inode 657297731, moving to lost+found
disconnected inode 657297732, moving to lost+found
disconnected inode 657297733, moving to lost+found
disconnected inode 657297734, moving to lost+found
disconnected inode 657297735, moving to lost+found
disconnected inode 657297736, moving to lost+found
disconnected inode 657297737, moving to lost+found
disconnected inode 657297738, moving to lost+found
disconnected inode 657297739, moving to lost+found
disconnected inode 657297740, moving to lost+found
disconnected inode 657297741, moving to lost+found

then I run again and get:
inode identifier 4821522496 mismatch on inode 4821613056
inode identifier 4821522496 mismatch on inode 4821613056
cleared inode 4821613056
corrupt block 5 in directory inode 10956815656

And this is on a disk less than a week old. I suppose it could be defective, but since I’m now getting this on 4 other drives, that seems unlikely to be a coincidence.

I’ve opened this bug: 2215725 – xfs_repair continues to show different errors on repeated runs - smarttools says drive is OK

Perhaps xfs_repair is a tool of last resort?

Was either drive used with XFS under an earlier kernel? If not exposed to the bug, I would not rule out a hardware issue with the system hosting the drives.

The new drive I installed this week was not exposed to the bug, but the other 4 that are experiencing the problem were. I read the bug, and my understanding was that it did not affect my configuration, although who knows. At this point, I’m waiting for a response to the bug report I created to see what the developers say. I agree that a new drive could have hardware issues, but considering this is happening with 4 other drives, that seems unlikely.

Could be helpful to mention the details of the drives in case someone else is seeing the same issue.

Consider the possibility of a hardware issue with the system(s) hosting the drives: firmware in host, power supply, connectors, and cables. I’ve been using XFS since SGI IRIX64, so have seen many XFS issues. We had multiple systems so I was able to pin down hardware problems by swapping suspect parts into another box.

The drives I’m having issues with are Seagate 8TB USB 3.0 drives. I moved the drive to another computer and the same thing happened, but good idea. I’ll try with other cables, etc. and see if I can isolate it.

Older USB3 implementations were not robust. How old is the host system? Are you using the cables that came with the drives? Some third-party cables are not reliable. Smarttools tests are internal to the drive, so they don’t test the cables. I use XFS on an external WD drive, but connected by USB-C (relatively new host box). Do the drives get power over USB or do they have external power adapters?
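Since the drive's self-tests never cross the USB link, one way to exercise the whole path (controller, hub, cable, enclosure) is to read the same region of the device more than once and compare checksums. The following is only a rough sketch of that idea, not an established tool: the function names, the 64 MiB region size, and the device path in the usage note are assumptions for illustration. The kernel page cache can satisfy the second read without touching the device, so the sketch asks the kernel to drop cached pages first, which is a best-effort hint only; dropping caches system-wide or re-plugging the drive between passes would be a stricter test.

```python
import hashlib
import os

def region_digest(path, offset, length, chunk=1 << 20):
    """Return the SHA-256 hex digest of `length` bytes starting at `offset`."""
    h = hashlib.sha256()
    with open(path, "rb", buffering=0) as f:
        # Best-effort hint to drop cached pages so the read reaches the device.
        if hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass  # not supported here; fall back to a plain read
        f.seek(offset)
        remaining = length
        while remaining > 0:
            data = f.read(min(chunk, remaining))
            if not data:  # reached the end of the device or file early
                break
            h.update(data)
            remaining -= len(data)
    return h.hexdigest()

def unstable_regions(path, total, region=64 << 20, passes=2):
    """Return offsets of regions whose digest differs between repeated reads."""
    bad = []
    for offset in range(0, total, region):
        length = min(region, total - offset)
        digests = {region_digest(path, offset, length) for _ in range(passes)}
        if len(digests) > 1:
            bad.append(offset)
    return bad
```

For example, `unstable_regions("/dev/sdX", 8 << 30)` (run as root, with `/dev/sdX` standing in for the real device) would check the first 8 GiB. Any offset it returns means the same blocks read back differently on an unmounted, idle disk, which would point at the transfer path rather than at the filesystem.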

Yeah, I have an older system. The drives have external power adapters. I went ahead and ordered a new USB 3.2 card, new cables, and a new hub, so hopefully that will fix it. The system is from 2015 and uses ASMedia USB 3.0 controllers, which are fairly common.

Well, I installed a new USB card and purchased new cables, and the problem still occurs. It’s weird: xfs_repair doesn’t find issues when I plug the drive into my laptop; it only has problems on my workstation. As I mentioned, there are no issues in my logs, and rsync to these drives works properly. I updated the bug report with this information. Something appears wrong with xfs_repair.

The other thing I’ll point out is that since xfs_repair is clean when the drive is plugged into my laptop, XFS itself is evidently having no issues when the disk is used on my workstation. xfs_repair is the one giving false positives.

I just wasted $150 on new equipment. :roll_eyes:

Are the versions of software identical on both laptop and desktop systems?
I guess kernel and xfs_repair are the ones that matter.

The installed packages are not the same between the systems; the laptop only has a subset of packages, but xfsprogs and kernel are exactly the same:

xfsprogs-6.1.0-3.fc38
kernel-6.3.8-200.fc38

As I mentioned, XFS appears to be working fine; it’s xfs_repair that is giving false positives on the workstation.

Do you get the errors in “no modify” mode?

If you want to pursue this issue, you could try setting up a small partition on a USB drive, saving the partition to a file on an internal drive, and comparing what xfs_repair does on each.
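The image comparison suggested above could be scripted roughly as follows. This is a sketch under assumptions: the device and image paths are placeholders, the helper names are made up for illustration, and it relies on `xfs_repair -n` (no modify) and `xfs_repair -f` (operate on a regular file) as described in the xfs_repair man page. Note that the copy itself travels over the same USB path, so a bad transfer could end up in the image; making the image on a machine where the drive checks clean would avoid that.

```python
import subprocess

def build_commands(device, image):
    """Return the argv lists for copying the partition and checking both copies."""
    return {
        # Copy the partition to a file on an internal drive.
        "copy": ["dd", f"if={device}", f"of={image}", "bs=4M", "conv=fsync"],
        # No-modify check directly on the USB device.
        "check_device": ["xfs_repair", "-n", device],
        # No-modify check on the image; -f tells xfs_repair it is a regular file.
        "check_image": ["xfs_repair", "-n", "-f", image],
    }

def compare(device, image):
    """Copy the partition, then run both checks and collect status and output."""
    cmds = build_commands(device, image)
    subprocess.run(cmds["copy"], check=True)
    results = {}
    for name in ("check_device", "check_image"):
        proc = subprocess.run(cmds[name], capture_output=True, text=True)
        results[name] = (proc.returncode, proc.stdout + proc.stderr)
    return results
```

Something like `compare("/dev/sdX1", "/var/tmp/usb-part.img")` would need root and an unmounted partition. As I read the man page, a nonzero exit status from `xfs_repair -n` means corruption was detected, so differing statuses for the device and the image would be a strong hint that the reads over USB are the variable.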

Yes, same results if I run in “no modify” mode.

I’m going to wait and see what response I receive from the bug report before doing any more testing. I’m fairly sure that XFS is working properly: I can take the exact same disk that shows errors on my workstation, plug it into my laptop, and xfs_repair runs clean, so there is some reason I’m getting false positives. I also checked, and bugs have been reported against xfs_repair, including one that concerned false positives, so this isn’t anything new or unique. Apparently, it happens.

Below shows the same drive first on my workstation, then moments later on my laptop.

Here is the result of the “no modify”:
xfs_repair -n /dev/sdh
Phase 1 - find and verify superblock…
Phase 2 - using internal log
- zero log…
- scan filesystem freespace and inode maps…
- found root inode chunk
Phase 3 - for each AG…
- scan (but don’t clear) agi unlinked lists…
- process known inodes and perform inode discovery…
- agno = 0
- agno = 1
inode identifier 2147862912 mismatch on inode 2147869056
inode identifier 2147862913 mismatch on inode 2147869057
inode identifier 2147862912 mismatch on inode 2147869056
would have cleared inode 2147869056
inode identifier 2147862913 mismatch on inode 2147869057
would have cleared inode 2147869057
- agno = 2
inode identifier 4295556032 mismatch on inode 4295565184
inode identifier 4295556033 mismatch on inode 4295565185
inode identifier 4295556032 mismatch on inode 4295565184
would have cleared inode 4295565184
inode identifier 4295556033 mismatch on inode 4295565185
would have cleared inode 4295565185
inode identifier 4304975936 mismatch on inode 4305117696
inode identifier 4304975937 mismatch on inode 4305117697
inode identifier 4304975936 mismatch on inode 4305117696
would have cleared inode 4305117696
inode identifier 4304975937 mismatch on inode 4305117697
would have cleared inode 4305117697
- agno = 3
inode identifier 6453851908 mismatch on inode 6453931524
inode identifier 6453851909 mismatch on inode 6453931525
inode identifier 6453851908 mismatch on inode 6453931524
would have cleared inode 6453931524
inode identifier 6453851909 mismatch on inode 6453931525
would have cleared inode 6453931525
- agno = 4
- agno = 5
- agno = 6
inode identifier 12885557312 mismatch on inode 12885563840
inode identifier 12885557313 mismatch on inode 12885563841
inode identifier 12885557312 mismatch on inode 12885563840
would have cleared inode 12885563840
inode identifier 12885557313 mismatch on inode 12885563841
would have cleared inode 12885563841
- agno = 7
- process newly discovered inodes…
Phase 4 - check for duplicate blocks…
- setting up duplicate extent list…
- check for inodes claiming duplicate blocks…
- agno = 0
- agno = 4
- agno = 1
- agno = 7
- agno = 6
- agno = 3
- agno = 2
- agno = 5
entry “background-43.jpg” at block 1 offset 96 in directory inode 2147622656 references free inode 2147869056
would clear inode number in entry at offset 96…
entry “04_last_night.opus” at block 0 offset 200 in directory inode 2147864530 references free inode 2147869057
would clear inode number in entry at offset 200…
inode identifier 2147862912 mismatch on inode 2147869056
would have cleared inode 2147869056
inode identifier 2147862913 mismatch on inode 2147869057
would have cleared inode 2147869057
inode identifier 4295556032 mismatch on inode 4295565184
would have cleared inode 4295565184
inode identifier 4295556033 mismatch on inode 4295565185
would have cleared inode 4295565185
inode identifier 12885557312 mismatch on inode 12885563840
would have cleared inode 12885563840
inode identifier 12885557313 mismatch on inode 12885563841
would have cleared inode 12885563841
entry “test 4” at block 24 offset 3368 in directory inode 4304567810 references free inode 4305117696
would clear inode number in entry at offset 3368…
entry “test 3” at block 24 offset 3576 in directory inode 4304567810 references free inode 4305117697
would clear inode number in entry at offset 3576…
inode identifier 4304975936 mismatch on inode 4305117696
would have cleared inode 4305117696
inode identifier 4304975937 mismatch on inode 4305117697
would have cleared inode 4305117697
inode identifier 6453851908 mismatch on inode 6453931524
would have cleared inode 6453931524
inode identifier 6453851909 mismatch on inode 6453931525
would have cleared inode 6453931525
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity…
- traversing filesystem …
Metadata CRC error detected at 0x55f6d4375b70, xfs_dir3_block block 0x347e8/0x1000
expected owner inode 152592, got 146784, directory block 215016
would rebuild directory inode 152592
would create missing “.” entry in dir ino 152592
entry “background-43.jpg” in directory inode 2147622656 points to free inode 2147869056, would junk entry
bad hash table for directory inode 2147622656 (no data entry): would rebuild
would rebuild directory inode 2147622656
entry “04_last_night.opus” in directory inode 2147864530 points to free inode 2147869057, would junk entry
bad hash table for directory inode 2147864530 (no data entry): would rebuild
would rebuild directory inode 2147864530
entry “test” in directory inode 4304567810 points to free inode 4305117696, would junk entry
entry “test2” in directory inode 4304567810 points to free inode 4305117697, would junk entry
would rebuild directory inode 4304567810
Metadata CRC error detected at 0x55f6d4377460, xfs_dir3_leaf1 block 0x284019d48/0x1000
leaf block 8388608 for directory inode 10737498380 bad CRC
would rebuild directory inode 10737498380
- traversal finished …
- moving disconnected inodes to lost+found …
disconnected inode 751929, would move to lost+found
disconnected inode 751930, would move to lost+found
disconnected inode 751931, would move to lost+found
disconnected inode 751932, would move to lost+found
disconnected inode 751933, would move to lost+found
disconnected inode 751934, would move to lost+found
disconnected inode 751935, would move to lost+found
disconnected inode 753609, would move to lost+found
disconnected inode 753610, would move to lost+found
disconnected inode 753620, would move to lost+found
disconnected inode 753625, would move to lost+found
disconnected inode 753643, would move to lost+found
disconnected inode 753648, would move to lost+found
disconnected inode 754051, would move to lost+found
disconnected inode 754053, would move to lost+found
disconnected inode 754057, would move to lost+found
disconnected inode 754069, would move to lost+found
disconnected inode 754074, would move to lost+found
disconnected inode 754080, would move to lost+found
disconnected inode 754084, would move to lost+found
disconnected inode 754088, would move to lost+found
disconnected inode 754096, would move to lost+found
disconnected inode 754099, would move to lost+found
disconnected inode 754101, would move to lost+found
disconnected inode 754111, would move to lost+found
disconnected inode 755780, would move to lost+found
disconnected inode 755784, would move to lost+found
disconnected inode 755788, would move to lost+found
disconnected inode 755795, would move to lost+found
disconnected inode 755802, would move to lost+found
disconnected inode 755807, would move to lost+found
disconnected inode 755809, would move to lost+found
disconnected inode 755810, would move to lost+found
disconnected inode 755813, would move to lost+found
disconnected inode 755819, would move to lost+found
disconnected inode 755829, would move to lost+found
disconnected inode 755836, would move to lost+found
disconnected inode 755837, would move to lost+found
disconnected inode 757826, would move to lost+found
disconnected inode 757827, would move to lost+found
disconnected inode 757836, would move to lost+found
disconnected inode 757840, would move to lost+found
disconnected inode 757842, would move to lost+found
disconnected inode 757846, would move to lost+found
disconnected inode 757851, would move to lost+found
disconnected inode 757862, would move to lost+found
disconnected inode 757868, would move to lost+found
disconnected inode 757877, would move to lost+found
disconnected inode 757880, would move to lost+found
disconnected inode 757883, would move to lost+found
disconnected inode 757887, would move to lost+found
disconnected inode 759107, would move to lost+found
disconnected inode 759111, would move to lost+found
disconnected inode 759118, would move to lost+found
disconnected inode 759120, would move to lost+found
disconnected inode 759129, would move to lost+found
disconnected inode 759130, would move to lost+found
disconnected inode 759134, would move to lost+found
disconnected inode 759140, would move to lost+found
disconnected inode 759147, would move to lost+found
disconnected inode 759151, would move to lost+found
disconnected inode 759155, would move to lost+found
disconnected inode 759162, would move to lost+found
disconnected inode 759164, would move to lost+found
disconnected inode 760000, would move to lost+found
disconnected inode 760004, would move to lost+found
disconnected inode 760007, would move to lost+found
disconnected inode 760022, would move to lost+found
disconnected inode 760031, would move to lost+found
disconnected inode 760033, would move to lost+found
disconnected inode 760034, would move to lost+found
disconnected inode 760056, would move to lost+found
disconnected inode 4305583518, would move to lost+found
disconnected inode 4305583519, would move to lost+found
disconnected inode 4305583520, would move to lost+found
disconnected inode 4305583521, would move to lost+found
disconnected inode 4305583522, would move to lost+found
disconnected inode 4305583523, would move to lost+found
Phase 7 - verify link counts…
would have reset inode 4304567810 nlinks from 6708 to 6706
No modify flag set, skipping filesystem flush and exiting.

And here is the same drive a few moments later with xfs_repair on my laptop:

xfs_repair /dev/sdb
Phase 1 - find and verify superblock…
Phase 2 - using internal log
- zero log…
- scan filesystem freespace and inode maps…
- found root inode chunk
Phase 3 - for each AG…
- scan and clear agi unlinked lists…
- process known inodes and perform inode discovery…
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes…
Phase 4 - check for duplicate blocks…
- setting up duplicate extent list…
- check for inodes claiming duplicate blocks…
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
Phase 5 - rebuild AG headers and trees…
- reset superblock…
Phase 6 - check inode connectivity…
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem …
- traversal finished …
- moving disconnected inodes to lost+found …
Phase 7 - verify and correct link counts…
done
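For anyone comparing logs like the two above, or repeated no-modify runs on the same machine, a small script can pull out the inode numbers each run flags and show which ones overlap. This is just a sketch: it only looks at the "cleared inode" and "disconnected inode" lines that appear in the output quoted in this thread, and the function names are made up for illustration.

```python
import re

# Matches lines such as "would have cleared inode 123", "cleared inode 123",
# and "disconnected inode 123, would move to lost+found".
_FLAGGED = re.compile(r"(?:cleared|disconnected) inode (\d+)")

def flagged_inodes(log_text):
    """Return the set of inode numbers flagged in one xfs_repair log."""
    return {int(m.group(1)) for m in _FLAGGED.finditer(log_text)}

def compare_runs(first_log, second_log):
    """Split flagged inodes into those seen in only one run and those in both."""
    a, b = flagged_inodes(first_log), flagged_inodes(second_log)
    return {"only_first": a - b, "only_second": b - a, "both": a & b}
```

If two `xfs_repair -n` runs on an unmounted, unchanged filesystem report largely different inode sets, that would suggest the tool is being fed different data each time rather than finding stable on-disk damage.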

The problem seems likely related to your host hardware, but:

Was “another computer” similar to the one that first exhibited the problem or was it the laptop that is now not reporting problems?

Sorry, I should have been using consistent terminology. My bad. The “another computer” is the laptop.

Yeah, I can’t think of much more to do until I hear from the bug report. It’s obvious that xfs_repair isn’t working correctly. I agree, there is something in host hardware that is triggering an xfs_repair bug. The problem now is that I’ve installed a new USB 3.2 Gen 2 card, and purchased new cables and still have the issue. So, it’s a bit puzzling why xfs_repair is having a problem. Luckily, XFS seems to be ok and these are false positives. The worrisome issue of course is it isn’t safe to run xfs_repair now on some systems. You’ll lose data because xfs_repair will unnecessarily move files to lost+found. In my case, these drives are only for backup purposes so I just need to run rsync again to fix it, but for other folks, this could be a huge issue.