Fstrim takes forever (weeks) on raid6 md array

Thanks in advance for looking.

Fedora Server 38.

I have constructed a raid6 array of SSDs:

/dev/md127:
           Version : 1.2
     Creation Time : Sun Nov 19 07:54:31 2023
        Raid Level : raid6
        Array Size : 19534423040 (18.19 TiB 20.00 TB)
     Used Dev Size : 3906884608 (3.64 TiB 4.00 TB)
      Raid Devices : 7
     Total Devices : 7
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sun Nov 19 15:36:27 2023
             State : active
    Active Devices : 7
   Working Devices : 7
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : zone9:store2-md  (local to host zone9)
              UUID : ccfd514d:7e68060e:530338f7:a0e0a9e3
            Events : 3593

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1
       3       8       81        3      active sync   /dev/sdf1
       4       8       97        4      active sync   /dev/sdg1
       5       8      113        5      active sync   /dev/sdh1
       6       8      129        6      active sync   /dev/sdi1
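
For reference, the trim itself is nothing exotic, just fstrim against the filesystem sitting on the array's LV (the mount point below is illustrative, not my real path):

# fstrim -v /srv/store2   # -v reports how much was discarded once it finally completes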

Every member device supports TRIM/DZAT (drives sdc through sdi):

# lsblk -Dd
NAME    DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda            0        0B       0B         0
sdb            0        0B       0B         0
sdc            0      512B     128M         0
sdd            0      512B     128M         0
sde            0      512B     128M         0
sdf            0      512B     128M         0
sdg            0      512B     128M         0
sdh            0      512B     128M         0
sdi            0      512B     128M         0
sdj            0        0B       0B         0
sdk            0        0B       0B         0
sdl            0        0B       0B         0
sdm            0        0B       0B         0
sdn            0      512B     128M         0
sdo            0        0B       0B         0
zram0          0        4K       2T         0
nvme0n1        0      512B       2T         0


# hdparm -I /dev/sd[cdefghi] | grep -i trim
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read ZEROs after TRIM
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read ZEROs after TRIM
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read ZEROs after TRIM
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read ZEROs after TRIM
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read ZEROs after TRIM
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read ZEROs after TRIM
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read ZEROs after TRIM

The raid456 module parameter devices_handle_discard_safely is enabled:

# cat /sys/module/raid456/parameters/devices_handle_discard_safely
Y
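
For completeness, a parameter like this can be made persistent along these lines (the file name under /etc/modprobe.d is arbitrary, and regenerating the initramfs only matters if raid456 is loaded from it):

# echo "options raid456 devices_handle_discard_safely=Y" > /etc/modprobe.d/raid456.conf
# dracut -f   # rebuild the initramfs so the option is also seen at early boot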

The logs show problems with the drives during the fstrim operation:

[ 6391.089300] sd 6:0:1:0: [sdd] tag#4386 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[ 6391.089302] scsi target6:0:1: handle(0x001a), sas_address(0x300062b203e56bc2), phy(2)
[ 6391.089305] scsi target6:0:1: enclosure logical id(0x500062b203e56bc0), slot(0)
[ 6391.089308] scsi target6:0:1: enclosure level(0x0000), connector name(     )
[ 6391.089310] sd 6:0:1:0: No reference found at driver, assuming scmd(0x0000000074119868) might have completed
[ 6391.089312] sd 6:0:1:0: task abort: SUCCESS scmd(0x0000000074119868)
[ 6391.596882] sd 6:0:6:0: Power-on or device reset occurred
[ 6391.597892] sd 6:0:1:0: Power-on or device reset occurred
[ 6426.358689] sd 6:0:3:0: attempting task abort!scmd(0x00000000f67e2fd4), outstanding for 30110 ms & timeout 30000 ms
[ 6426.358694] sd 6:0:3:0: [sdf] tag#5166 CDB: Write(16) 8a 08 00 00 00 00 00 00 08 10 00 00 00 08 00 00
[ 6426.358695] scsi target6:0:3: handle(0x001c), sas_address(0x300062b203e56bc4), phy(4)
[ 6426.358697] scsi target6:0:3: enclosure logical id(0x500062b203e56bc0), slot(7)
[ 6426.358698] scsi target6:0:3: enclosure level(0x0000), connector name(     )
[ 6426.388344] sd 6:0:3:0: task abort: SUCCESS scmd(0x00000000f67e2fd4)
[ 6426.388386] sd 6:0:3:0: attempting task abort!scmd(0x0000000083d2ef3a), outstanding for 30140 ms & timeout 30000 ms
[ 6426.388394] sd 6:0:3:0: [sdf] tag#5161 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[ 6426.388398] scsi target6:0:3: handle(0x001c), sas_address(0x300062b203e56bc4), phy(4)
[ 6426.388405] scsi target6:0:3: enclosure logical id(0x500062b203e56bc0), slot(7)
[ 6426.388410] scsi target6:0:3: enclosure level(0x0000), connector name(     )
[ 6426.388415] sd 6:0:3:0: No reference found at driver, assuming scmd(0x0000000083d2ef3a) might have completed
[ 6426.388419] sd 6:0:3:0: task abort: SUCCESS scmd(0x0000000083d2ef3a)
[ 6427.097362] sd 6:0:3:0: Power-on or device reset occurred
[ 6462.709675] sd 6:0:3:0: attempting task abort!scmd(0x000000007420dd31), outstanding for 30408 ms & timeout 30000 ms
[ 6462.709688] sd 6:0:3:0: [sdf] tag#597 CDB: Write(16) 8a 08 00 00 00 00 00 00 08 10 00 00 00 08 00 00
[ 6462.709692] scsi target6:0:3: handle(0x001c), sas_address(0x300062b203e56bc4), phy(4)
[ 6462.709699] scsi target6:0:3: enclosure logical id(0x500062b203e56bc0), slot(7)
[ 6462.709703] scsi target6:0:3: enclosure level(0x0000), connector name(     )
[ 6462.739637] sd 6:0:3:0: task abort: SUCCESS scmd(0x000000007420dd31)
[ 6462.739678] sd 6:0:3:0: attempting task abort!scmd(0x00000000d7fa046e), outstanding for 30438 ms & timeout 30000 ms
[ 6462.739686] sd 6:0:3:0: [sdf] tag#592 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[ 6462.739690] scsi target6:0:3: handle(0x001c), sas_address(0x300062b203e56bc4), phy(4)
[ 6462.739698] scsi target6:0:3: enclosure logical id(0x500062b203e56bc0), slot(7)
[ 6462.739702] scsi target6:0:3: enclosure level(0x0000), connector name(     )
[ 6462.739707] sd 6:0:3:0: No reference found at driver, assuming scmd(0x00000000d7fa046e) might have completed
[ 6462.739711] sd 6:0:3:0: task abort: SUCCESS scmd(0x00000000d7fa046e)
[ 6463.347964] sd 6:0:3:0: Power-on or device reset occurred
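
Those aborts are firing against the default 30 s SCSI command timeout. It can be read, and as an experiment raised, per device through sysfs, though that would only hide slow UNMAPs rather than fix them:

# cat /sys/block/sdf/device/timeout          # 30 by default, matching the log messages
# echo 120 > /sys/block/sdf/device/timeout   # 120 s is an arbitrary test value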

I’m out of ideas as to how to fix this, though I am wondering why DISC-GRAN is larger for the md device than for the underlying hardware:

NAME                                DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdc                                        0      512B     128M         0
└─sdc1                                     0      512B     128M         0
  └─md127                                  0        4M     128M         0
    └─store2--vg-store2--lv                0        4M     128M         0
sdd                                        0      512B     128M         0
└─sdd1                                     0      512B     128M         0
  └─md127                                  0        4M     128M         0
    └─store2--vg-store2--lv                0        4M     128M         0
sde                                        0      512B     128M         0
└─sde1                                     0      512B     128M         0
  └─md127                                  0        4M     128M         0
    └─store2--vg-store2--lv                0        4M     128M         0
sdf                                        0      512B     128M         0
└─sdf1                                     0      512B     128M         0
  └─md127                                  0        4M     128M         0
    └─store2--vg-store2--lv                0        4M     128M         0
sdg                                        0      512B     128M         0
└─sdg1                                     0      512B     128M         0
  └─md127                                  0        4M     128M         0
    └─store2--vg-store2--lv                0        4M     128M         0
sdh                                        0      512B     128M         0
└─sdh1                                     0      512B     128M         0
  └─md127                                  0        4M     128M         0
    └─store2--vg-store2--lv                0        4M     128M         0
sdi                                        0      512B     128M         0
└─sdi1                                     0      512B     128M         0
  └─md127                                  0        4M     128M         0
    └─store2--vg-store2--lv                0        4M     128M         0
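
The same limits can be read straight from sysfs, which at least confirms the 4M granularity is what md itself advertises for the array; I assume raid456 derives it from the stripe geometry rather than passing the members' 512B value through:

# cat /sys/block/md127/queue/discard_granularity /sys/block/md127/queue/discard_max_bytes
# cat /sys/block/sdc/queue/discard_granularity /sys/block/sdc/queue/discard_max_bytes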

How do I fix this, and what are my options?

Hello @qc4cyhf,
I’m not going to be much help here, but I wanted to ask: doesn’t the raid6 support in btrfs come with a caveat that it is not working correctly, i.e. that it is still being worked on?