KVM guests become unstable on 6.18 kernel

Hi, I’m using Fedora Server 43 on the host, and the KVM guests run Fedora Server 42 or 43.
Since I updated the host a few days ago, multiple guests have hit kernel panics or freezes at random intervals.

I looked through dmesg and journalctl on the host, but could not find any suspicious messages.
I also ran memtest86 on the host and it passed, so it does not seem to be a RAM issue.
Then I switched back to the older kernel, and it became stable…

I think staying on 6.17 is a good temporary solution, but I have no idea what to do in the long run.
Is there any solution or recommendation for this?

  • Working stably for now:
    • 6.17.12-300.fc43.x86_64
  • Unstable:
    • 6.18.13-200.fc43.x86_64
    • 6.18.16-200.fc43.x86_64

Host fpaste / virsh dumpxml of guest / Guest’s vmcore-dmesg

I think you are running Fedora in the VM. Which version and which kernel is the VM using?

Yes, the VMs are running Fedora Server, but the version and kernel vary from VM to VM.
Examples:

  • Version 43, Kernel 6.18.16-200.fc43.x86_64
  • Version 43, Kernel 6.17.9-300.fc43.x86_64
  • Version 42, Kernel 6.17.13-200.fc42.x86_64

Do you have an example of one of these panics?
Is it always the same panic?

So far, I have observed two panic messages:
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
and
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b.

If you need an example of a full dmesg, please refer to the vmcore-dmesg-vpngw-20260309.txt I attached to the first post.
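
For reference, the two exitcode values can be decoded. This is a minimal sketch, assuming the value printed in the panic line follows the usual wait-status layout (low 7 bits are the terminating signal, bit 0x80 is the core-dump flag). The function name `decode_init_exitcode` is made up for illustration.

```python
import signal


def decode_init_exitcode(exitcode: int) -> tuple[int, bool]:
    """Split a wait-style status into (signal number, core-dump flag).

    Assumption: low 7 bits hold the signal, bit 0x80 marks a core dump.
    """
    signo = exitcode & 0x7F
    core_dumped = bool(exitcode & 0x80)
    return signo, core_dumped


# The two values quoted from the panic messages in this thread.
for code in (0x0000000B, 0x0000008B):
    signo, core = decode_init_exitcode(code)
    name = signal.Signals(signo).name
    print(f"exitcode={code:#010x}: signal {signo} ({name}), core dump flag={core}")
```

Under that assumption both panics are the same event, init being killed by signal 11 (SIGSEGV on x86 Linux), and they differ only in the core-dump bit.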

I was wondering why my VMs don’t break and if comparing what you are doing to my setup would lead to insights.

However, it seems that you have all the info you need to file a bug against the kernel.

Oh, one thing: how long did you let memtest run for? Overnight is the usual minimum suggested.


I ran memtest with the default settings, so it performed a 4-pass test.
It took around 13 hours to complete, I think, because the server has a fairly large amount of memory (128GB).
It also has ECC, so I think minor errors would be corrected (no ECC errors were observed at that time, though).

I use libvirt+KVM/QEMU under F43/F44 on a Lenovo laptop with Intel CPU/GPU and have not noticed any issues so there’s a high chance your problem is hardware specific.

Try bisecting the regression to specific package versions by upgrading/downgrading the kernel and firmware using the updates-testing/updates-archive repos, and then report it properly to the bug tracker.
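
The bisection idea can be sketched as a binary search over an ordered list of builds. This is only an illustration: `first_bad` and `is_stable` are hypothetical names, and in practice each "test" means installing that build and running the guests long enough to see whether a panic occurs. The example uses just the three kernel versions already reported in this thread.

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_stable: Callable[[str], bool]) -> str:
    """Return the earliest build for which is_stable() is False.

    Assumes builds are ordered oldest to newest, builds[0] is known
    stable, and builds[-1] is known unstable.
    """
    lo, hi = 0, len(builds) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_stable(builds[mid]):
            lo = mid  # regression was introduced after mid
        else:
            hi = mid  # regression is at mid or earlier
    return builds[hi]


# Results as reported in the first post (True = stable).
reported = {
    "6.17.12-300.fc43.x86_64": True,
    "6.18.13-200.fc43.x86_64": False,
    "6.18.16-200.fc43.x86_64": False,
}
builds = list(reported)
culprit = first_bad(builds, lambda b: reported[b])
print(culprit)
```

With more intermediate builds in the list, the number of builds that actually need testing grows only logarithmically, which matters when each test takes days.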

Okay, I found kernel 6.19.6-200.fc43 in the updates repo, so I’ll try it first.
If the problem persists, I’ll try the updates-testing or updates-archive repos.
The server is in use (for hobby purposes, though), so testing might take some time.
I’ll let you know if there are any updates. Thanks!


Update: Bisecting is done, I submitted a bug report:

I tried kernels 6.19.6 and 6.19.7, and the same thing happened.

During the investigation, I found that it happens only when the migratable setting is off!
I mean this setting:

<cpu mode='host-passthrough' check='none' migratable='off'>

The new workaround is to turn it on.
I need performance and I don’t use migration, so I would prefer to keep it off, but this is better than not working at all.
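
For anyone who wants to apply the same workaround to a saved domain XML, here is a rough sketch using only the Python standard library. The helper name `set_migratable` and the guest name `example-guest` are made up for illustration, and the sample XML is a stripped-down stand-in, not a real `virsh dumpxml` output. Editing the definition directly with `virsh edit` achieves the same thing.

```python
import xml.etree.ElementTree as ET


def set_migratable(domain_xml: str, value: str = "on") -> str:
    """Return the domain XML with <cpu migratable=...> set to value."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        raise ValueError("no <cpu> element in domain XML")
    cpu.set("migratable", value)  # other attributes stay unchanged
    return ET.tostring(root, encoding="unicode")


# Minimal stand-in for a guest definition (hypothetical guest name).
sample = """<domain type='kvm'>
  <name>example-guest</name>
  <cpu mode='host-passthrough' check='none' migratable='off'/>
</domain>"""

updated = set_migratable(sample)
print(updated)
```

The modified XML would still have to be loaded back into libvirt, and as far as I know a CPU definition change only applies after the guest is fully shut down and started again.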
