Hi,
We are trying to get multipath on Fedora CoreOS 32 (Linux 5.7.12-200.fc32.x86_64) working properly with an HPE 3PAR storage array.
Unfortunately, the configuration recommended by HPE does not work as expected when a path fails while data is being written to the filesystem mounted from the multipath device (cat /dev/zero > /var/lib/docker/dummy.file).
Does anybody have a working configuration for this? Our current (failing) configuration is:

    defaults {
        find_multipaths no
        user_friendly_names no
    }
    blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z]"
        device {
            vendor ".*"
            product ".*"
        }
    }
    blacklist_exceptions {
        property "(ID_WWN|SCSI_IDENT_.*|ID_SERIAL)"
        device {
            product "Server"
            vendor "Nimble"
        }
        device {
            product "VV"
            vendor "3PARdata"
        }
    }
    devices {
        device {
            path_grouping_policy group_by_prio
            product "VV"
            fast_io_fail_tmo 10
            dev_loss_tmo infinity
            failback immediate
            hardware_handler "1 alua"
            prio alua
            vendor "3PARdata"
            path_selector "round-robin 0"
            path_checker tur
            no_path_retry 30
        }
    }

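As we understand no_path_retry (a rough sketch, not verified against the multipath-tools source): once every path is down, multipathd keeps queueing I/O (the queue_if_no_path feature visible in the multipath -ll output) for roughly no_path_retry × polling_interval seconds before it disables queueing and fails the I/O. polling_interval is not set in our configuration, so the sketch assumes the default of 5 seconds. This window should only apply when all paths are down, whereas in our test two paths stay active.

```python
# Rough estimate of how long multipathd queues I/O (queue_if_no_path)
# after the LAST path has failed, before failing the I/O.
# Assumptions: one retry per path-checker interval, and polling_interval
# left at its default of 5 seconds (it is not set in our multipath.conf).

DEFAULT_POLLING_INTERVAL_S = 5  # assumed default, not set in our config


def queueing_window_seconds(no_path_retry: int,
                            polling_interval_s: int = DEFAULT_POLLING_INTERVAL_S) -> int:
    """Approximate seconds of I/O queueing once no usable path remains."""
    if no_path_retry < 0:
        raise ValueError("no_path_retry must be non-negative")
    return no_path_retry * polling_interval_s


if __name__ == "__main__":
    # no_path_retry 30 comes from the 3PARdata/VV device section above.
    print(queueing_window_seconds(30))  # 150
```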
Our test procedure is as follows:

- Check the initial path state with multipath -ll:

    360002ac00000000006000284000243d1 dm-0 3PARdata,VV
    size=175G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
    `-+- policy='service-time 0' prio=50 status=active
      |- 2:0:0:0 sde 8:64 active ready running
      |- 2:0:1:0 sdf 8:80 active ready running
      |- 3:0:0:0 sdg 8:96 active ready running
      `- 3:0:1:0 sdh 8:112 active ready running

- Start writing to the multipath device:

    cat /dev/zero > /var/lib/docker/dummy.file

- While the cat command is still running, remove paths on the storage array side. multipath -ll now shows:

    360002ac00000000006000284000243d1 dm-0 3PARdata,VV
    size=175G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
    `-+- policy='service-time 0' prio=50 status=active
      |- 2:0:0:0 sde 8:64 failed faulty running
      |- 2:0:1:0 sdf 8:80 failed faulty running
      |- 3:0:0:0 sdg 8:96 active ready running
      `- 3:0:1:0 sdh 8:112 active ready running

- At this point, no further writes to the affected device are possible; all I/O operations appear to be stalled. They neither fail nor time out.

- Reconnect the paths (failback). Once all paths are “active ready running” again, the write operations resume.

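To log path states while the cat command is running, a small parser over the multipath -ll output can help. This is a hypothetical helper (count_path_states and PATH_RE are names made up for illustration, not part of multipath-tools), and it assumes the path-line layout shown in the output above: H:C:T:L, device, major:minor, dm state, checker state, device state.

```python
import re
from collections import Counter

# Hypothetical helper (not part of multipath-tools). Matches path lines like
#   |- 2:0:0:0 sde 8:64 failed faulty running
# Groups: H:C:T:L, device, major:minor, dm state, checker state, device state.
# The prefix also allows nested tree markers such as "| `- ".
PATH_RE = re.compile(
    r"^[\s|`]*[|`]-\s+(\d+:\d+:\d+:\d+)\s+(\S+)\s+(\d+:\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)


def count_path_states(multipath_ll_output: str) -> Counter:
    """Count paths per (dm state, checker state) pair, e.g. ("active", "ready")."""
    counts: Counter = Counter()
    for line in multipath_ll_output.splitlines():
        match = PATH_RE.match(line)
        if match:
            dm_state, checker_state = match.group(4), match.group(5)
            counts[(dm_state, checker_state)] += 1
    return counts


# multipath -ll output from our test, after removing paths on the array side.
SAMPLE = """\
360002ac00000000006000284000243d1 dm-0 3PARdata,VV
size=175G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 2:0:0:0 sde 8:64 failed faulty running
  |- 2:0:1:0 sdf 8:80 failed faulty running
  |- 3:0:0:0 sdg 8:96 active ready running
  `- 3:0:1:0 sdh 8:112 active ready running
"""

if __name__ == "__main__":
    # On a live host, the real output could be fed in instead, e.g. via
    # subprocess.run(["multipath", "-ll"], capture_output=True, text=True).stdout
    for (dm_state, checker_state), n in count_path_states(SAMPLE).items():
        print(f"{dm_state} {checker_state}: {n}")
```

Calling it periodically during the test would show whether the two remaining paths stay "active ready" for the whole time the writes are stuck.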
Any ideas are more than welcome.