It’s not the first time this has happened, but for once I thought I would post it to see if anyone has an idea why it occurs.
It happened after today’s updates. It’s a simple Fedora 34 server with libvirt and QEMU installed.
Of course, as every time, if I force a power off and reboot then everything is okay. But still, it would be nice to know why it occurs.
That’s odd. So it doesn’t happen all the time. Any ideas on how you can reproduce it?
There’s this bug related to the bfq I/O scheduler that causes system freezes, but none of the bugs talk about a kernel crash on boot, so it may not be the same issue.
Once you’ve managed to boot, can you please see if you are able to get a crash log that we can use to file a bug?
Well, it’s not the first time, and I guess it happens on specific package updates.
And it happens every time on a sudo reboot.
So I don’t know if it happens during the boot sequence or during shutdown.
If you tell me what you are looking for specifically, that would help.
It doesn’t happen while the VMs are running, for example. And it’s not a memory problem, nor a CPU problem, nor a cache problem; everything has been tested. The only time it happens, and I can’t stress this enough, is after a package update, and most of the time at a reboot; only once did it freeze the system completely.
Do you want the list of packages that were released to the repos today?
The server is updated every day.
Return-Code : Success
Releasever : 34
Command Line : update --refresh
Upgrade jitterentropy-3.3.0-1.fc34.x86_64 @updates
Upgraded jitterentropy-3.0.2-2.git.409828cf.fc34.x86_64 @@System
Upgrade libicu-67.1-7.fc34.x86_64 @updates
Upgraded libicu-67.1-6.fc34.x86_64 @@System
Upgrade libssh-0.9.6-1.fc34.x86_64 @updates
Upgraded libssh-0.9.5-2.fc34.x86_64 @@System
Upgrade libssh-config-0.9.6-1.fc34.noarch @updates
Upgraded libssh-config-0.9.5-2.fc34.noarch @@System
Upgrade perl-libwww-perl-6.57-1.fc34.noarch @updates
Upgraded perl-libwww-perl-6.56-1.fc34.noarch @@System
Upgrade php-7.4.24-1.fc34.x86_64 @updates
Upgraded php-7.4.23-1.fc34.x86_64 @@System
Upgrade php-cli-7.4.24-1.fc34.x86_64 @updates
Upgraded php-cli-7.4.23-1.fc34.x86_64 @@System
Upgrade php-common-7.4.24-1.fc34.x86_64 @updates
Upgraded php-common-7.4.23-1.fc34.x86_64 @@System
Upgrade php-fpm-7.4.24-1.fc34.x86_64 @updates
Upgraded php-fpm-7.4.23-1.fc34.x86_64 @@System
Upgrade php-json-7.4.24-1.fc34.x86_64 @updates
Upgraded php-json-7.4.23-1.fc34.x86_64 @@System
Upgrade php-mbstring-7.4.24-1.fc34.x86_64 @updates
Upgraded php-mbstring-7.4.23-1.fc34.x86_64 @@System
Upgrade php-opcache-7.4.24-1.fc34.x86_64 @updates
Upgraded php-opcache-7.4.23-1.fc34.x86_64 @@System
Upgrade php-pdo-7.4.24-1.fc34.x86_64 @updates
Upgraded php-pdo-7.4.23-1.fc34.x86_64 @@System
Upgrade php-sodium-7.4.24-1.fc34.x86_64 @updates
Upgraded php-sodium-7.4.23-1.fc34.x86_64 @@System
Upgrade php-xml-7.4.24-1.fc34.x86_64 @updates
Upgraded php-xml-7.4.23-1.fc34.x86_64 @@System
Upgrade pinentry-1.2.0-1.fc34.x86_64 @updates
Upgraded pinentry-1.1.1-3.fc34.x86_64 @@System
Upgrade python-systemd-doc-234-19.fc34.x86_64 @updates
Upgraded python-systemd-doc-234-16.fc34.x86_64 @@System
Upgrade python2.7-2.7.18-15.fc34.x86_64 @updates
Upgraded python2.7-2.7.18-11.fc34.x86_64 @@System
Upgrade python3-systemd-234-19.fc34.x86_64 @updates
Upgraded python3-systemd-234-16.fc34.x86_64 @@System
Upgrade rng-tools-6.14-1.git.56626083.fc34.x86_64 @updates
Upgraded rng-tools-6.13-2.git.d207e0b6.fc34.x86_64 @@System
Upgrade squashfs-tools-4.5-3.20210913gite048580.fc34.x86_64 @updates
Upgraded squashfs-tools-4.5-2.fc34.x86_64 @@System
This is the dnf transaction preceding the sudo reboot command.
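For reference, a transaction report like the one above can be pulled straight from dnf’s history (the `last` keyword picks the most recent transaction; use a numeric ID for an older one):

```shell
# List recent transactions with their IDs, dates, and actions
sudo dnf history list

# Show the full details (command line, return code, package list) of one transaction
sudo dnf history info last
```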
The links you provided all talk about kernel panics.
Since there are update releases affecting libvirtd or virtualization nearly every day (except Sunday), the server is completely rebooted every day.
Since the first kernel panic or hang (I don’t know what you want to call it) a few months ago, which destroyed my OPNsense VM, every dnf transaction is done while the VMs are down and nothing besides the host system is running.
Those VMs use a total of 32 GB of RAM, and the CPU sits at around 50% most of the time, and I have never had any problem while the VMs were running and I wasn’t doing a dnf transaction. If it were a kernel panic, the random load from the VMs would have triggered one by now, at least every once in a while (as in the links you provided, where people describe hangs while working), which is not the case.
Do you still want me to configure the kernel dump in GRUB for future freezes?
If yes, then we will need to wait for a future dnf transaction that causes the crash.
OK, yeah, not sure.
Your image of the crash does list bfq, which is one of the schedulers. So maybe try changing that as described in the kernel bug I noted and see if that makes the issue go away:
echo mq-deadline | sudo tee /sys/block/sd*/queue/scheduler
but change the sd* bit to match the identifiers of your disks.
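To see which scheduler is currently active for each disk, and to make a change survive reboots, something like the following may help (a sketch; the device names and the udev rule filename are assumptions for your setup):

```shell
# Show the available schedulers per device; the active one is in brackets
grep . /sys/block/*/queue/scheduler

# Switch one disk to mq-deadline for the current boot only
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler

# Persist the choice across reboots with a udev rule (filename is illustrative)
echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"' \
  | sudo tee /etc/udev/rules.d/60-io-scheduler.rules
```

The echo into sysfs takes effect immediately but is lost on reboot, which is why the udev rule is the usual way to persist it.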
Setting up the Kernel crash dump would be good. That’s probably the only way to get more information on this to then see if bugs exist etc.
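A rough sketch of setting up kdump on Fedora, in case it helps (the crashkernel memory value is an example; the right amount depends on your RAM):

```shell
# Install the kdump tooling
sudo dnf install kexec-tools

# Reserve memory for the crash capture kernel on every installed kernel
sudo grubby --update-kernel=ALL --args="crashkernel=256M"

# Enable the kdump service, then reboot so the reservation takes effect
sudo systemctl enable --now kdump
```

After a crash, the capture kernel should write a vmcore under /var/crash.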
Okay, I will set that up, and on the next crash after a dnf transaction I will try to remember to report back here.
Those SSDs already had the mq option activated, and they are in a RAID 1 configuration. Well, several of their partitions are.
Where exactly did you see the change in the bfq parameters?
Because they were talking about a patch a few kernel iterations back, but that doesn’t concern me since that patch has already been applied. I didn’t see anything else.
Also, you should know that swap is almost never used in my case. Everything seems to go to RAM, not swap.
At the moment, we don’t have enough information to say whether you are seeing the same issue or not, but given that bfq is involved, it’s worth a try.
There has been another kernel crash, not after a dnf transaction for once.
But the modification I made under your guidance didn’t pull through; there is nothing in /var/crash.
Do you want me to check something specific? I took a picture of the console before the forced reset.
Hrm, not sure then. Is it a crash or a freeze? It seems that crash dumps are recorded, but that this doesn’t work for freezes. One needs to set up a netconsole etc. to look at freezes.
And is this with the I/O scheduler changed to something other than bfq (i.e., so we can say whether bfq is the issue)?
If this didn’t happen after a dnf update, then we can say that dnf is not the issue here. Could be I/O related, since dnf updates do quite a bit of that.
Based on your screenshot, it’s a bfq related crash, but if changing the scheduler to something other than bfq doesn’t help, at the moment I don’t have other suggestions.
Please upload the picture you took—is it the same error as the first picture?
Would I have a call trace if it was a freeze?
I’m not sure. I couldn’t get one when I was seeing the freeze (nothing was being written to disk). I think freezes require us to set up serial/net consoles to get logs.
There must be howtos somewhere on the web, so worth looking for one that provides step by step instructions.
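As a rough sketch of the netconsole approach (all IP addresses, ports, and the interface name below are placeholders for your network):

```shell
# On the machine that freezes: stream kernel messages over UDP to a collector.
# Format: local-port@local-ip/interface,remote-port@remote-ip/remote-mac (MAC optional)
sudo modprobe netconsole netconsole=6666@192.168.1.10/enp1s0,514@192.168.1.20/

# On the collecting machine: listen for the incoming kernel messages
nc -u -l 514
```

Because the messages leave the box over the network instead of being written to disk, you can still capture the tail end of the log even when the disk I/O path is what’s hanging.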
I think you didn’t understand my question, or we’re not talking about the same thing.
But can we agree that bfq is no longer mentioned?
Also, in my experience I would not get that screen if it was just a freeze, or would I?
Well, as you see, this trace is different from the one that you had pasted before, so it may not be the same issue.
Looks like a kernel crash (a kernel crash can also cause a freeze and you may not be able to see the crash if your system freezes).
Try searching the web and the kernel Bugzilla for the text in the trace; that will hopefully help you figure out what’s causing it (or find an existing bug).
I reviewed the problem in depth and looked for anything even remotely exotic related to KVM.
Indeed, one of my VMs was starting a guest VM of its own, and I had forgotten about that. So I guess the new kernel update modified something about nested virtualization.
And I suspect the bfq call trace was related to that too.
So good call pointing me in that direction.
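For anyone hitting something similar: whether nested virtualization is enabled on the host can be checked through the KVM module parameters (which file exists depends on your CPU vendor):

```shell
# Intel hosts: "Y" or "1" means nested virtualization is enabled
cat /sys/module/kvm_intel/parameters/nested

# AMD hosts use the kvm_amd module instead
cat /sys/module/kvm_amd/parameters/nested
```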