Multipath with HP3Par

Hi,

We are trying to make multipath with Fedora CoreOS 32 (Linux 5.7.12-200.fc32.x86_64) work properly with an HP3Par storage array.

Sadly for us, the configuration that HPE recommends does not behave as expected when a path fails while data is being written to the filesystem mounted on the multipath device (cat /dev/zero > /var/lib/docker/dummy.file).

Does anybody have a working configuration for this? Our current (and failing) configuration is:

defaults {
    find_multipaths     no
    user_friendly_names no
}
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    device {
        vendor  ".*"
        product ".*"
    }
}
blacklist_exceptions {
    property "(ID_WWN|SCSI_IDENT_.*|ID_SERIAL)"
    device {
        product "Server"
        vendor  "Nimble"
    }
    device {
        product "VV"
        vendor  "3PARdata"
    }
}
devices {
    device {
        path_grouping_policy group_by_prio
        product              "VV"
        fast_io_fail_tmo     10
        dev_loss_tmo         infinity
        failback             immediate
        hardware_handler     "1 alua"
        prio                 alua
        vendor               "3PARdata"
        path_selector        "round-robin 0"
        path_checker         tur
        no_path_retry        30
    }
}

Our test procedure is as follows:

  • multipath -ll
360002ac00000000006000284000243d1 dm-0 3PARdata,VV
size=175G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 2:0:0:0 sde 8:64  active ready running
  |- 2:0:1:0 sdf 8:80  active ready running
  |- 3:0:0:0 sdg 8:96  active ready running
  `- 3:0:1:0 sdh 8:112 active ready running
  • cat /dev/zero > /var/lib/docker/dummy.file
  • While the cat command is still active, we remove paths from the storage array
  • multipath -ll now shows
360002ac00000000006000284000243d1 dm-0 3PARdata,VV
size=175G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 2:0:0:0 sde 8:64  failed faulty running
  |- 2:0:1:0 sdf 8:80  failed faulty running
  |- 3:0:0:0 sdg 8:96  active ready running
  `- 3:0:1:0 sdh 8:112 active ready running
  • At this point, no further writes to the affected device are possible; all operations appear to hang. They don’t fail and they don’t time out.
  • We reconnect the paths (failback), and once all paths are “active ready running” again, the write operations continue.
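
For anyone who wants to reproduce the steps above, the path states can be summarized with a small helper along these lines. This is only a sketch: count_paths is a hypothetical name, not part of multipath-tools, and it assumes path lines have the same layout as the multipath -ll output pasted above (an H:C:T:L tuple followed by an sdX device name).

```shell
#!/bin/sh
# count_paths (hypothetical helper): read `multipath -ll` output on stdin
# and print how many paths are active, failed, or in another state.
# Assumes path lines look like:
#   |- 2:0:0:0 sde 8:64  failed faulty running
count_paths() {
    awk '
        # Only lines with an H:C:T:L tuple followed by an sdX name are paths.
        /[0-9]+:[0-9]+:[0-9]+:[0-9]+ +sd[a-z]+/ {
            if ($0 ~ / active ready /)       active++
            else if ($0 ~ / failed faulty /) failed++
            else                             other++
        }
        END { printf "active=%d failed=%d other=%d\n", active, failed, other }
    '
}
```

It can be run before and after removing the paths, for example as `multipath -ll 360002ac00000000006000284000243d1 | count_paths`, to record the state at each step of the test.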

Any ideas are more than welcome.