Problem
If hotplugging a VCPU to a KVM guest running in an LPAR fails, the KVM guest terminates abruptly.
The bug will be observed in the following scenario:
- The maximum number of VCPUs specified (e.g. 128) is greater than the current number of VCPUs (e.g. 4)
- The user attempts to hotplug vcpus as follows:
virsh setvcpus <guest_name> 68
- The following error is observed:
KVM: Create Guest vcpu hcall failed, rc=-44
error: Unable to read from monitor: Connection reset by peer
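Whether a guest is exposed to this scenario can be read from its domain XML (for example, the output of virsh dumpxml <guest_name>). The following is a minimal sketch, not part of any tool: the function names vcpu_counts and has_hotplug_headroom are hypothetical, and the sketch assumes the <vcpu> element sits directly under the root <domain> element, with a missing current attribute meaning that all VCPUs are already enabled.

```python
import xml.etree.ElementTree as ET


def vcpu_counts(domain_xml):
    """Return (current, maximum) VCPU counts from a libvirt domain XML string."""
    root = ET.fromstring(domain_xml)
    vcpu = root.find("vcpu")
    if vcpu is None or vcpu.text is None:
        raise ValueError("no <vcpu> element found in domain XML")
    maximum = int(vcpu.text.strip())
    # Assumption: without a 'current' attribute, current equals maximum.
    current = int(vcpu.get("current", maximum))
    return current, maximum


def has_hotplug_headroom(domain_xml):
    """True if VCPUs could still be hotplugged, i.e. the scenario above applies."""
    current, maximum = vcpu_counts(domain_xml)
    return current < maximum


if __name__ == "__main__":
    example = (
        "<domain type='kvm'>"
        "<vcpu placement='static' current='4'>128</vcpu>"
        "</domain>"
    )
    print(vcpu_counts(example), has_hotplug_headroom(example))
```

A guest for which has_hotplug_headroom returns True starts with fewer VCPUs than its maximum, so a later virsh setvcpus call can trigger the failing hotplug path.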
Cause
During a VCPU hotplug operation for a running QEMU/KVM guest in a ppc64 LPAR, KVM requests the required number of VCPUs from the PowerVM hypervisor. This operation requires PowerVM to acquire resources, which can fail because of a transient or non-transient error. The QEMU instance running in the LPAR treats this failure as fatal and terminates the running guest. The problem affects QEMU/KVM guests on all supported architectures that do not preallocate VCPUs. However, it disproportionately impacts guests in a PowerVM LPAR, because PowerVM's KVM resources are limited and shared across multiple LPARs. Hence even a transient VCPU hotplug failure can cause a running KVM guest to terminate.
Related Issues
Bugzilla report: #2304078
Workarounds
A fix is available in upstream QEMU (linked in the Bugzilla report) and will be available in Fedora 40 soon. In the meantime, one of the following workarounds can be used to avoid the issue:
- Specify the current number of VCPUs to be the same as the maximum number of VCPUs in the XML config file for the guest. This forces QEMU to allocate all the VCPUs when starting the guest. For example, if the user needs 128 VCPUs, the tag should be:
<vcpu placement='static' current='128'>128</vcpu>
- Avoid hotplugging VCPUs for the running KVM guest.
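The first workaround can also be applied to a saved copy of the guest's XML config with a short script. This is a sketch under assumptions: the function name set_current_to_max is hypothetical, the <vcpu> element is assumed to be a direct child of the root <domain> element, and the rewritten XML would still need to be applied to the guest definition (for example with virsh edit or virsh define) while the guest is shut off.

```python
import xml.etree.ElementTree as ET


def set_current_to_max(domain_xml):
    """Return domain XML with the vcpu 'current' attribute set to the maximum."""
    root = ET.fromstring(domain_xml)
    vcpu = root.find("vcpu")
    if vcpu is None or vcpu.text is None:
        raise ValueError("no <vcpu> element found in domain XML")
    maximum = int(vcpu.text.strip())
    # Setting current == maximum makes QEMU allocate every VCPU at guest start,
    # so no hotplug is needed later.
    vcpu.set("current", str(maximum))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    before = (
        "<domain type='kvm'>"
        "<vcpu placement='static' current='4'>128</vcpu>"
        "</domain>"
    )
    print(set_current_to_max(before))
```

Other attributes on the <vcpu> element, such as placement, are left untouched; only current is changed.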