Fedora 41 Host libvirt Guests not starting services

Ever since I updated from Fedora 40 to 41 on my host system, my virtual machines (Fedora) are failing to start services. Mostly it is anything to do with the network, and which services fail seems to be random on each reboot (of the host and/or VMs).

Here is an example of the services that do not start on the web server VM running Fedora 41:
[systemd]
Failed Units: 9
abrtd.service
dovecot.service
firewalld.service
httpd.service
ModemManager.service
named-chroot.service
NetworkManager-wait-online.service
php-fpm.service
sendmail.service

The other VM, running Fedora 40, is showing similar behavior. Before the upgrade to 41 the host system was running without any problems, and updating the VMs was straightforward, with no errors.

Any ideas where to start looking as to why this is happening?

I haven’t made any configuration changes on the host system other than running dnf to update to 41. No changes to libvirt or the VM machines.

Rebooting the VM results in these services not starting…

[systemd]
Failed Units: 6
httpd.service
named-chroot-setup.service
named-chroot.service
NetworkManager-wait-online.service
php-fpm.service
rpc-statd-notify.service

I am lost on this behavior.

Look for specific errors in the startup log for each failed unit:

systemctl --failed
journalctl --no-pager -b -u unit_name
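
To avoid running journalctl by hand for every unit, the two commands above can be combined into a loop. This is only a minimal sketch: the helper names failed_unit_names and dump_failed_logs are made up for illustration, and it assumes that `systemctl --failed --plain --no-legend` prints the unit name in the first column.

```shell
#!/bin/sh
# Extract the unit names (first column) from the output of
# `systemctl --failed --plain --no-legend`, read on stdin.
failed_unit_names() {
    awk 'NF > 0 { print $1 }'
}

# Print the current-boot journal of every failed unit, one section per unit.
dump_failed_logs() {
    systemctl --failed --plain --no-legend | failed_unit_names |
    while read -r unit; do
        printf '===== %s =====\n' "$unit"
        journalctl --no-pager -b -u "$unit"
    done
}

# Usage:
#   dump_failed_logs > failed-units.log
```

Collecting everything in one file makes it easier to spot a common cause, such as the same timeout appearing in several units.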

There is nothing obvious in the log files as to why the services are failing other than ‘timeout’ and FAILED exit code 1. I will restart the Fedora 40 VM (web server) and re-check the log files.

Another notable thing is happening with the Fedora 40 VM: DNF takes a substantial amount of time to do anything. When I issue any sort of DNF command, it takes more than 5 minutes to output anything.

I also just noticed that the websites are crazy slow to respond, if they respond at all. I cannot log in to the websites’ admin section.

It seems to all point to something not right with the networking, but I can’t figure out what, or if something has changed in libvirt/KVM and NetworkManager between 40 and 41.

For the Fedora 40 VM, DNF is sending the load average through the stratosphere (up into the 20s) and taking an insanely long time to complete, specifically when I first hit enter on a command like ‘dnf update’. It takes upwards of 8 minutes to display the repo list, the last metadata expiration check, the resolved dependencies, and then the list of updated packages to install.
Once I hit yes, it downloads the packages quickly and bogs down on the Transaction check with another load average spike into the 20s.
Then it installs the packages.

This is similar behavior to the Fedora 41 VM on the same host.

It works fine for me, so the problem must be specific to your setup.
Pay attention to the post-upgrade tasks, especially rpmconf and fixfiles:
Upgrading Fedora Linux Using DNF System Plugin :: Fedora Docs

If the issue persists, try to isolate its cause for both host and guests:

  • Check for SELinux denials and switch it to permissive mode.
  • Configure the hostname statically, and verify that it resolves correctly.
  • Check the system resolver and its backends, time required to resolve different hostnames including localhost, own hostname, random FQDNs, test the backends separately, try switching exclusively to well-known public upstream resolvers.
  • Inspect the IP stack, check connectivity to local and remote hosts, routing, timing, packet loss, forwarding parameters, etc.
  • Check the firewall zones, policies, direct rules, inspect the complete nftables dump.
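
As a starting point for the resolver check in the list above, here is a rough sketch that times lookups through the system resolver. The function name time_lookup is hypothetical and example.org is only a placeholder for "some remote FQDN". It measures whole seconds, which should be enough because a broken resolver usually stalls for several seconds per lookup.

```shell
#!/bin/sh
# Time one lookup through the system resolver (NSS), in whole seconds.
# Prints: <name> <elapsed>s rc=<getent exit status>
time_lookup() {
    name=$1
    rc=0
    start=$(date +%s)
    getent hosts "$name" >/dev/null 2>&1 || rc=$?
    end=$(date +%s)
    printf '%s %ss rc=%s\n' "$name" "$((end - start))" "$rc"
}

# Usage:
#   for h in localhost "$(hostname)" example.org; do time_lookup "$h"; done
#
# Related checks for the other list items, run as root:
#   ausearch -m AVC -ts boot     # SELinux denials since boot
#   setenforce 0                 # permissive mode until the next reboot
#   nft list ruleset             # complete nftables dump
```

If localhost or the machine's own hostname takes seconds to resolve, services that look up their own name at startup are likely to hit their start timeout.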

I appreciate your help. It would be my expectation that it does work which is why I am a bit baffled. This has been working quite well since Fedora 31. Both VMs were originally hardware based and then migrated to VM.

I am pretty sure that it’s not a hardware issue on the host, such as failing memory or a failing drive. I have backed up the qcow2 files for the VMs. I’ve created a new VM from the Fedora 41 KVM qcow2 image, following the cyberciti.biz article “How to create VM using the qcow2 image file in KVM”, with a few modifications as the article is a bit old. This was a success: I was able to install Apache (httpd) and PHP/PHP-FPM, and Apache started. I was able to restart, shut down, and start the VM with the expected performance and no bizarre timeout errors. The only difference between this new VM and the original two is the networking: I used the libvirt default non-routable NAT network with DHCP.

I have two theories about what has gone wrong. One is that the VMs are old and carry a configuration that worked before the host was upgraded to 41 but no longer does. The other, less likely, is that the way I set up the virtual network in libvirt isn’t correct and is causing problems. I’m not so sure about that second scenario: how would the virtual network cause the VM to time out mounting the qcow2 filesystem? That makes no sense.

I’m going to compare the configuration of the new VM I created to the F40 VM and see if there are any major differences. Make some changes and test.
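
One way to do that comparison is to dump both domain definitions with virsh and diff them. The sketch below is an assumption about how this could be done, not what was actually run here: diff_domain_xml is a made-up helper, and new-guest and old-guest are placeholder domain names. It drops the lines that always differ between guests (name, UUID, MAC address) so the real differences stand out.

```shell
#!/bin/sh
# Compare two saved libvirt domain XML files, ignoring lines that
# always differ between guests (name, UUID, MAC address).
# Returns diff's exit status: 0 = same, 1 = different.
diff_domain_xml() {
    a=$(mktemp)
    b=$(mktemp)
    sed -E '/<(name|uuid|mac address)/d' "$1" > "$a"
    sed -E '/<(name|uuid|mac address)/d' "$2" > "$b"
    rc=0
    diff -u "$a" "$b" || rc=$?
    rm -f "$a" "$b"
    return "$rc"
}

# Usage (domain names are placeholders):
#   virsh dumpxml new-guest > new.xml
#   virsh dumpxml old-guest > old.xml
#   diff_domain_xml new.xml old.xml
```

Differences in the machine type, CPU model, disk bus, and interface sections are the first things worth looking at.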


This is a screenshot from the SPICE session showing how the /home folder fails to mount.

This may be an entirely separate issue; searching for the error message turns up a bunch of similar topics on the Internet with promising instructions.

BTW, your initial problem reminds me of this one:
DNF and Firefox take extreemly long to start when VPN active on f40? - #4 by vgaetera

There is a great problem going on.

I backed up the F40 VM qcow2 file to another server.

Deleted the qcow2 file.

Downloaded the Fedora 41 Cloud Edition.

Extended the /dev/vda4 partition to 250 GB. There is btw almost no information that tells you how to do this.
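
For anyone else looking, here is a rough sketch of one way to do this, not necessarily the exact steps used here. It assumes the image file is called fedora41.qcow2 (a hypothetical name), that growpart (package cloud-utils-growpart) is available in the guest, and that /dev/vda4 is the btrfs root filesystem, as on the stock Fedora Cloud image. The run helper is made up for illustration: it only prints the commands when DRY_RUN is set, so the plan can be reviewed first.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# On the host, with the VM shut down: grow the virtual disk to the given size.
grow_image() {
    run qemu-img resize "$1" "$2"
}

# Inside the guest, as root: grow partition 4 into the new space,
# then grow the btrfs filesystem mounted at / to fill the partition.
grow_root() {
    run growpart /dev/vda 4
    run btrfs filesystem resize max /
}

# Usage:
#   host:   grow_image fedora41.qcow2 250G
#   guest:  grow_root
```

Note that cloud-init normally grows the root partition on first boot by itself, so the guest-side step may only be needed when the disk is enlarged after that first boot.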

Installed using cloud-init, automatically adding a username for me, installing firewalld, fail2ban, apache, php-fpm, php packages, etc… Automatic update.
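
A cloud-init user-data file along these lines could look roughly like the output of the sketch below. It is an illustration only, not the file actually used: write_user_data is a made-up helper, the package list is cut down to the packages named above, and the login credentials (an SSH key or a password) are deliberately left out and have to be added.

```shell
#!/bin/sh
# Emit a minimal cloud-config document for the given login name.
# Credentials are omitted on purpose: add ssh_authorized_keys or a
# password under the user entry before using this for real.
write_user_data() {
    cat <<EOF
#cloud-config
users:
  - name: $1
    groups: wheel
package_update: true
package_upgrade: true
packages:
  - firewalld
  - fail2ban
  - httpd
  - php-fpm
EOF
}

# Usage:
#   write_user_data myuser > user-data
#   cloud-localds seed.iso user-data    # build the config ISO to attach to the VM
```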

I left out assigning the static IP4 address and it was just getting non-routable IP from DHCP.

Copied over about 10 gigabytes of data.

Installed Clamscan/Freshclam. I wanted to scan the web data just to make sure.

While running Freshclam the VM kernel spat out this:
[ 5975.875636] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:0]

My suspicion is that this was caused by the Freshclam progress bar flooding the console.

Up to this point things were going great. Ran clamscan on the web data I copied over. Everything fine there, no kernel output.

Made a basic virtual host configuration in Apache. Apache seems happy. It’s obviously not going to work as it’s not configured for the static IP address, but it’s starting without errors.

At this point I give up for the day. I leave the VM running.

This morning, my plan was to configure the static IP.

I shut down the VM. It still had the cloud-init configuration ISO mounted, so I removed that from the VM configuration. Made sure the VM was configured for the correct network bridge (it was).

Start the VM … AND

Last login: Thu Nov 28 04:13:05 on ttyS0
[systemd]
Failed Units: 2
php-fpm.service
systemd-journal-flush.service

After exhausting hardware and software configurations for the past week, I have determined that there is most definitely something not right with Fedora 41 Kernel 6.11.X and libvirt hosting. Maybe it is specific to the CPUs or hardware I am using? I am not sure. The problems I started having are 100% due to the host operating system, which is Fedora 41, specifically Kernel 6.11.X.

Unfortunately, the solution was to install AlmaLinux 9.5, which uses Kernel 5.14. Once it was configured for the VMs, they immediately started up with none of the delays, timeouts, or core dumps I had been reporting.