I’m attempting to pass a Mellanox ConnectX-4 NIC through to a VM and am getting memory errors in dmesg on the host:
...
[32278.078269] x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
[32278.078271] x86/PAT: memtype_reserve failed [mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus
[32278.078272] ioremap memtype_reserve failed -16
[32278.082272] x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
[32278.082275] x86/PAT: memtype_reserve failed [mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus
[32278.082277] ioremap memtype_reserve failed -16
[32278.086270] x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
[32278.086273] x86/PAT: memtype_reserve failed [mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus
[32278.086274] ioremap memtype_reserve failed -16
[32278.090268] x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
[32278.090270] x86/PAT: memtype_reserve failed [mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus
[32278.090271] ioremap memtype_reserve failed -16
...
These errors repeat hundreds or thousands of times while the VM is starting. Eventually the VM boots properly and lspci in the guest shows the NIC, but the guest doesn’t load the driver for it, so the NIC is unusable.
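To get a feel for how many distinct ranges are involved, the repeated messages can be collapsed into counts. This is only a minimal sketch: the regular expression is written against the exact line format pasted above, and the function name `summarize_pat_conflicts` and the neutral field names `left`/`right` are my own choices for illustration, not anything defined by the kernel.

```python
import re
import sys
from collections import Counter

# Pattern for the "conflicting memory types" lines as they appear in the log above.
# "left" and "right" are just the two memory types in the order the kernel prints them.
CONFLICT_RE = re.compile(
    r"x86/PAT: (?P<task>.+?):(?P<pid>\d+) conflicting memory types "
    r"(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+) "
    r"(?P<left>[\w-]+)<->(?P<right>[\w-]+)"
)


def summarize_pat_conflicts(lines):
    """Count conflict messages per (start, end, left, right) tuple."""
    counts = Counter()
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            key = (match["start"], match["end"], match["left"], match["right"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Usage: dmesg | python3 pat_summary.py
    for (start, end, left, right), n in summarize_pat_conflicts(sys.stdin).items():
        print(f"{start}-{end} {left}<->{right}: {n} occurrences")
```

If every message collapses to the single range ea000000-ec000000, as in the excerpt, it is probably one region being requested over and over rather than many different ones.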
I’m able to pass through other PCI devices such as NVMe SSDs with no dmesg errors; it’s specifically the Mellanox NICs that have problems. I’ve tried multiple different ConnectX-4 cards, and passthrough of these cards works fine on other machines, just not on this one. I’ve tried using SR-IOV and passing through a single virtual function, and that causes the same errors. I’ve also tried the NIC in different PCIe slots, and the same thing happens. Each port of the NIC is in its own IOMMU group, and I’ve tried passing through each port individually as well as both ports together, each time getting the same errors.
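For anyone who wants to compare grouping on their own machine, this is roughly how the IOMMU groups can be listed. It is a sketch that assumes the usual sysfs layout, `/sys/kernel/iommu_groups/<group>/devices/<PCI address>`; the helper name `list_iommu_groups` is made up for this example.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group number (as str): sorted list of device addresses}."""
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # IOMMU disabled or sysfs not available: nothing to report.
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if not devices_dir.is_dir():
            continue
        groups[group_dir.name] = sorted(p.name for p in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    # Print the groups in numeric order, one line per group.
    for group, devices in sorted(list_iommu_groups().items(), key=lambda kv: int(kv[0])):
        print(f"group {group}: {' '.join(devices)}")
```

On this board, each NIC port shows up alone in its group, so grouping itself does not look like the cause.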
This is with a new MSI X670E ACE motherboard and a 7950X CPU. I’m running Fedora 37 with kernel 6.1.7, but the issue also happens on Ubuntu 22.04 with kernel 5.15.x.
Is this an incompatibility with the new AM5 platform and the mlx5_core driver, a hardware compatibility issue, a Linux configuration issue, or something else entirely? I wasn’t able to find any suggestions on Google.