I/O errors with dd & rsync but not with Nautilus or Dejà Dup

Hey,

I just bought a Seagate Expansion Portable 2TB. When I copy files to it with Nautilus or Dejà Dup, it works just fine (I tried copying a folder with ≥20GB of misc. files and compared checksums – no problem) at about 30MB/s, running on a ThinkPad X220's USB 2.0 port.

The problem is that when I use rsync or dd, the speed drops drastically after a little while, almost to a standstill, and dmesg fills up with various UAS and I/O errors.

Example output:

nov 23 16:06:21 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#27 uas_eh_abort_handler 0 uas-tag 16 inflight: CMD 
nov 23 16:06:21 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#27 CDB: Write(10) 2a 00 10 07 f0 40 00 02 00 00
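
The CDB line names the command that was aborted. As a minimal sketch, assuming the standard SCSI WRITE(10) layout (opcode 0x2a, a 4-byte big-endian LBA in bytes 2-5, a 2-byte transfer length in bytes 7-8) and, for the size calculation only, 512-byte logical blocks (the log does not state the block size), it can be decoded like this:

```python
# Decode a SCSI WRITE(10) CDB as printed by the kernel's "CDB:" log lines.
# Layout: byte 0 = opcode (0x2a), bytes 2-5 = logical block address
# (big-endian), bytes 7-8 = transfer length in logical blocks.

def decode_write10(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, transfer_length_in_blocks) for a WRITE(10) CDB string."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex accepts the space-separated bytes
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    length = int.from_bytes(cdb[7:9], "big")
    return lba, length


if __name__ == "__main__":
    lba, blocks = decode_write10("2a 00 10 07 f0 40 00 02 00 00")
    # Assumption: 512-byte logical blocks.
    print(f"LBA {lba}, {blocks} blocks = {blocks * 512 // 1024} KiB")
```

For the line above this gives LBA 268955712 and 512 blocks, i.e. 256 KiB if the logical block size is 512 bytes.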

If it's only these, it usually recovers within several (at most tens of) minutes. If I try to write/copy too much, things like this start happening:

nov 23 16:02:27 jurajov-thinkpad kernel: Buffer I/O error on dev sdc1, logical block 33635348, lost async page write
nov 23 16:02:27 jurajov-thinkpad kernel: Buffer I/O error on dev sdc1, logical block 33635349, lost async page write
nov 23 16:02:27 jurajov-thinkpad kernel: Buffer I/O error on dev sdc1, logical block 33635350, lost async page write
nov 23 16:02:27 jurajov-thinkpad kernel: blk_update_request: I/O error, dev sdc, sector 269083304 op 0x1:(WRITE) flags 0x4800 phys_seg 64 prio class 0
nov 23 16:02:27 jurajov-thinkpad kernel: blk_update_request: I/O error, dev sdc, sector 269083816 op 0x1:(WRITE) flags 0x800 phys_seg 22 prio class 0
nov 23 16:02:27 jurajov-thinkpad kernel: blk_update_request: I/O error, dev sdc, sector 269084440 op 0x1:(WRITE) flags 0x800 phys_seg 36 prio class 0
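
To see which parts of the disk the failed writes hit, the block-layer error lines can be pulled out of a saved kernel log (for example from journalctl -k). A minimal sketch, assuming the exact message format shown above, which is not a stable kernel interface and may change between versions:

```python
import re

# Matches "blk_update_request: I/O error" lines in the format shown above.
BLK_ERR = re.compile(
    r"blk_update_request: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+) "
    r"op 0x\w+:\((?P<op>\w+)\)"
)


def failed_requests(lines):
    """Yield (device, sector, operation) for each block-layer I/O error line."""
    for line in lines:
        m = BLK_ERR.search(line)
        if m:
            yield m["dev"], int(m["sector"]), m["op"]


if __name__ == "__main__":
    log = [
        "kernel: blk_update_request: I/O error, dev sdc, sector 269083304 op 0x1:(WRITE) flags 0x4800 phys_seg 64 prio class 0",
        "kernel: blk_update_request: I/O error, dev sdc, sector 269083816 op 0x1:(WRITE) flags 0x800 phys_seg 22 prio class 0",
        # Buffer I/O lines use a different format and are skipped here.
        "kernel: Buffer I/O error on dev sdc1, logical block 33635348, lost async page write",
    ]
    for dev, sector, op in failed_requests(log):
        print(dev, sector, op)
```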

Unmounting the hard drive at this point is useless – I waited several hours and it never finished.

My research suggests falling back to the usb-storage driver, but that just seems like a hack. AFAIK Linux already special-cases all Seagate enclosures, so shouldn't this be reported as a kernel bug? And what even is the problem, if Nautilus and the like work just fine?

The weirdest thing is that this thread on smartmontools, citing a kernel developer, states that the already-present quirk is not needed on this model (which is true – disabling it fixes S.M.A.R.T. access). It even goes as far as to claim that this drive does not have any firmware bugs.

I’m thinking about selling the drive and buying another one.

In no particular order:

  1. Disable UAS. The gist is:
    • Create /etc/modprobe.d/blacklist_uas.conf containing text: options usb-storage quirks=xxxx:yyyy:u
    • Replace xxxx:yyyy with the vendor:product ID of your USB drive, as shown by lsusb. The u is literal, so keep it.
    • Run dracut -f to rebuild the initramfs (I'm not certain it's strictly necessary unless the drive is used for sysroot)
  2. (Using UAS) see if the problem happens with kernel 5.10-rc5 or newer. You can install fc34 kernels on Fedora 32/33.
    https://koji.fedoraproject.org/koji/packageinfo?packageID=8
    • If it does reproduce, report a bug on linux-usb@vger.kernel.org including the make/model of the USB controller (lspci), the list of USB devices (lsusb), the kernel version, and a complete section of dmesg (or the entire dmesg) covering the problem. I tend to do a clean boot without the device attached, so I can include a dmesg snippet that starts with attaching the device and ends with reproducing the problem; that way developers can see everything related to that device.
  3. See if the problem still occurs when connected to an externally powered USB hub.
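
Step 1 boils down to turning the ID column of lsusb into a single config line. A minimal sketch; the lsusb line below is made up (the thread never gives the drive's actual vendor:product ID), and the helper name uas_quirk_line is hypothetical:

```python
import re

# lsusb prints one device per line, e.g. "Bus 002 Device 006: ID vvvv:pppp Vendor Name".
LSUSB_ID = re.compile(r"\bID (?P<vid>[0-9a-f]{4}):(?P<pid>[0-9a-f]{4})\b", re.IGNORECASE)


def uas_quirk_line(lsusb_line: str) -> str:
    """Build the modprobe.d line for the usb-storage quirks parameter.

    The trailing 'u' flag tells the kernel to ignore UAS for this VID:PID,
    so the drive is handled by usb-storage instead.
    """
    m = LSUSB_ID.search(lsusb_line)
    if m is None:
        raise ValueError("no 'ID vvvv:pppp' field found")
    vid, pid = m["vid"].lower(), m["pid"].lower()
    return f"options usb-storage quirks={vid}:{pid}:u"


if __name__ == "__main__":
    # Hypothetical lsusb line, not the real Seagate IDs.
    line = "Bus 002 Device 006: ID abcd:1234 Example Corp. Portable Drive"
    print(uas_quirk_line(line))  # options usb-storage quirks=abcd:1234:u
```

The printed line is what goes into /etc/modprobe.d/blacklist_uas.conf, followed by dracut -f as in step 1.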

All of my USB problems went away once I started using a (Dyconn) USB hub. But USB is also just super finicky, and the bridge chipsets are chock full of bugs and quirks.

Thanks for the reply!

I tried installing kernel-5.10.0-0.rc5.20201125git127c501a03d5.85.fc34 plus kernel-core, kernel-modules and kernel-modules-extra, but unfortunately I can't get it to boot because it doesn't ask me for the LUKS password. I'll try booting Rawhide properly.

usb-storage works great, but I'd rather this were fixed upstream.

I’m currently on Rawhide – the problem completely disappeared. Thanks so much; I almost sold the drive. It even works after disabling the built-in quirk, just as the thread on smartmontools suggested.

Do you know what caused the bug, or when 5.10 will land in F33?

I can’t get it to boot, because it doesn’t ask me for the LUKS password

That’s unexpected. The password request is a function of plymouth+systemd-cryptsetup all baked inside of the initramfs. I guess it could be a momentary kernel bug as that particular build is in an “in-between” state, between rc5 and rc6. It could also be a dracut bug that only gets triggered with 5.10 series. I guess for now I’d ignore this problem because rc6 will probably get released today and then Fedora will have a build of it in koji sometime tomorrow.

Do you know what caused the bug, or when 5.10 will land in F33?

There are too many changes between 33 and Rawhide to know for sure, but I suspect it was fixed in the kernel. I've got no idea what fixed it; there are probably hundreds of USB and UAS related patches between 5.9 and 5.10. It might be faster to search the last month or three of the linux-usb mailing list for Seagate, and it'd probably pop up. There's so much active development in USB that they don't even use the kernel bugzilla; they want everything reported on the linux-usb@ list.

According to the kernel crystal ball, upstream development of 5.10 wraps up on Dec 20 or Dec 27. Rebasing Fedora 33 onto it then takes about 5 more weeks, give or take. That's the time frame for it landing in the updates repo for everybody; you can opt to run it sooner.
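
Turned into calendar dates, that estimate lands in late January. A rough sketch, assuming the year is 2020 (inferred from the rc5 build date 20201125, not stated in the post) and taking "about 5 weeks" literally; the helper name is hypothetical:

```python
from datetime import date, timedelta

# Figures from the post: 5.10 wraps up upstream on Dec 20 or Dec 27, then the
# Fedora 33 rebase takes "about 5 weeks, give or take".
# Assumption: the year is 2020, inferred from the rc5 build date.
REBASE_DELAY = timedelta(weeks=5)
UPSTREAM_DATES = (date(2020, 12, 20), date(2020, 12, 27))


def estimated_f33_dates(upstream=UPSTREAM_DATES, delay=REBASE_DELAY):
    """Return the rough dates for 5.10 reaching the F33 updates repo."""
    return [d + delay for d in upstream]


if __name__ == "__main__":
    for start, end in zip(UPSTREAM_DATES, estimated_f33_dates()):
        print(f"{start} -> {end}")
```

This prints 2020-12-20 -> 2021-01-24 and 2020-12-27 -> 2021-01-31; the "give or take" can easily shift either end.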

I upgraded to 5.9.11, and I can’t reproduce this anymore!

I checked the changelog.

Can Guo (2):
      scsi: ufs: Fix unbalanced scsi_block_reqs_cnt caused by ufshcd_hold()
      scsi: ufs: Try to save power mode change and UIC cmd completion timeout

are the only things I found that could be related to this. I’ll post back if I manage to reproduce it again.

Regarding the boot bug, I think this thread elaborates on it.

Spoke too soon.

dec 03 13:51:59 jurajov-thinkpad kernel: usb 2-1.2: stat urb: no pending cmd for uas-tag 17
dec 03 13:52:00 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#4 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD 
dec 03 13:52:00 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#4 CDB: Write(10) 2a 00 3a 0e 76 c0 00 00 c8 00
dec 03 13:52:03 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#5 uas_eh_abort_handler 0 uas-tag 7 inflight: CMD 
dec 03 13:52:03 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#5 CDB: Write(10) 2a 00 3a 0e ea c0 00 02 00 00
dec 03 13:52:06 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#7 uas_eh_abort_handler 0 uas-tag 8 inflight: CMD 
dec 03 13:52:06 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#7 CDB: Write(10) 2a 00 3a 0e ec c0 00 02 00 00
dec 03 13:52:09 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#10 uas_eh_abort_handler 0 uas-tag 9 inflight: CMD 
dec 03 13:52:09 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#10 CDB: Write(10) 2a 00 3a 0e ee c0 00 00 60 00
dec 03 13:52:12 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#8 uas_eh_abort_handler 0 uas-tag 10 inflight: CMD 
dec 03 13:52:12 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#8 CDB: Write(10) 2a 00 3a 0f 6a c0 00 02 00 00
dec 03 13:52:12 jurajov-thinkpad kernel: scsi host6: uas_eh_device_reset_handler start
dec 03 13:52:12 jurajov-thinkpad kernel: usb 2-1.2: reset high-speed USB device number 6 using ehci-pci
dec 03 13:52:12 jurajov-thinkpad kernel: scsi host6: uas_eh_device_reset_handler success
dec 03 13:52:43 jurajov-thinkpad kernel: sd 6:0:0:0: [sdc] tag#8 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD
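
In this log the aborts arrive at a steady cadence (13:52:00, :03, :06, :09, :12), then the device is reset, and the next abort follows about half a minute later. A small sketch that measures those gaps, assuming the journalctl line format shown here; since the lines carry no year and only time-of-day is parsed, a log that crosses midnight would give wrong gaps:

```python
import re
from datetime import datetime

# "dec 03 13:52:00 host kernel: ... uas_eh_abort_handler ..." style lines.
EVENT = re.compile(
    r"^\w+ \d+ (?P<time>\d\d:\d\d:\d\d) .*?"
    r"(?P<kind>uas_eh_abort_handler|uas_eh_device_reset_handler start)"
)


def event_gaps(lines):
    """Return (kind, seconds since previous abort/reset event) for each event line."""
    out, prev = [], None
    for line in lines:
        m = EVENT.search(line)
        if not m:
            continue  # skip unrelated lines, e.g. "stat urb" or "reset ... success"
        t = datetime.strptime(m["time"], "%H:%M:%S")
        gap = None if prev is None else (t - prev).total_seconds()
        out.append((m["kind"], gap))
        prev = t
    return out


if __name__ == "__main__":
    log = [
        "dec 03 13:52:00 host kernel: sd 6:0:0:0: [sdc] tag#4 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD",
        "dec 03 13:52:03 host kernel: sd 6:0:0:0: [sdc] tag#5 uas_eh_abort_handler 0 uas-tag 7 inflight: CMD",
        "dec 03 13:52:12 host kernel: scsi host6: uas_eh_device_reset_handler start",
        "dec 03 13:52:43 host kernel: sd 6:0:0:0: [sdc] tag#8 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD",
    ]
    for kind, gap in event_gaps(log):
        print(kind, gap)
```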

BTW, I tried again with an rc6 kernel and reproduced it after about 15 minutes. It coincided suspiciously with the moment I launched Firefox; I'm not sure whether that's related, but I'll try sending an email to the mailing list.