FADump can hang on an LPAR in KVM-capable mode

Problem

On a KVM-capable LPAR running Fedora 40 (f40), an attempt to capture a kernel crash dump with Firmware-Assisted Dump (FADump) can fail, with the kernel reporting the following error:

    OOM error trace.

Subsequently, the system hangs without any error trace.

Cause

With the Firmware-Assisted Dump (FADump) method of capturing kernel crash dumps, the dump capture can fail with an out-of-memory (OOM) error, or the system can hang without any error trace. The failure is caused by an increase in the memory required to boot the FADump capture kernel once Kernel Electric-Fence (KFENCE) support is enabled. The additional memory is the footprint of mapping all memory at page-level granularity. The issue can be resolved by mapping at page granularity only the memory that KFENCE needs mapped that way.
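To check whether KFENCE is active on a running system, one option is to read its sampling interval. The sketch below assumes the kernel exposes `/sys/module/kfence/parameters/sample_interval` (present only when the kernel is built with KFENCE) and that a value of 0 means KFENCE is disabled. The helper name `kfence_status` is hypothetical.

```shell
# Report KFENCE state from its sample-interval parameter file.
# The optional argument overrides the default sysfs path (the path is an assumption).
kfence_status() {
    param="${1:-/sys/module/kfence/parameters/sample_interval}"
    if [ ! -r "$param" ]; then
        echo "not built in or not readable"
    elif [ "$(cat "$param")" -eq 0 ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

kfence_status
```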

Related Issues

Bugzilla report: #2297187

The Bugzilla report also points to a kernel patch that has been accepted upstream. It reduces the memory overhead when KFENCE is enabled (the default on f40) by using larger page mappings (2M rather than 64K).

Workarounds

As a workaround, reserve a certain amount of memory for the FADump capture kernel by updating the bootloader entry and rebooting the system. The following steps describe the procedure:

From the dmesg log, get the reserved memory value. In the following example of the dmesg log, 3827456 KB is the reserved memory value:

[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 95787264K/104857600K available (16576K kernel code, 116800K rwdata, 16192K rodata, 6208K init, 34377K bss, 3827456K reserved, 5242880K cma-reserved)
[    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=16, Nodes=32
[    0.000000] ftrace: allocating 42780 entries in 16 pages
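The reserved value can also be pulled out of the log with a small filter. This is a minimal sketch; the helper name `reserved_kb` is hypothetical, and it assumes the "Memory:" boot line has the same shape as in the excerpt above.

```shell
# Print the number in front of "K reserved" from the kernel's "Memory:" boot line.
# Reads log text on stdin; "cma-reserved" is skipped because the pattern
# requires "K" to be followed directly by " reserved".
reserved_kb() {
    grep -oE '[0-9]+K reserved' | head -n 1 | tr -dc '0-9'
    echo
}

# On a live system (reading the kernel log may require root):
#   dmesg | reserved_kb
```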

Run the free -m command and get the used memory value from the displayed output. In the following sample output, the used memory value is 1427 MB:

# free -m
               total        used        free      shared  buff/cache   available
Mem:           98830        1427       97374          17         591       97402
Swap:           4095           0        4095
 #
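The used value is the third field of the `Mem:` row, so it can be extracted the same way. A minimal sketch, assuming the English column layout shown above; the helper name `used_mb` is hypothetical.

```shell
# Print the "used" column (third field) of the "Mem:" row.
# Reads `free -m` output on stdin.
used_mb() {
    awk '/^Mem:/ { print $3 }'
}

# On a live system:
#   free -m | used_mb
```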

Reserve memory for the FADump capture kernel by updating the bootloader entry. The amount should be approximately the sum of the reserved memory and the used memory. For example, with 3827456 KB reserved (about 3738 MB) and 1427 MB used, the sum is about 5165 MB, which rounds up to 5200 MB. Add the following kernel parameters to the bootloader entry:

fadump=on crashkernel=5200M
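The arithmetic behind that value can be sketched as follows. The helper name `crashkernel_mb` is hypothetical, and rounding up to the next multiple of 100 MB is an assumption inferred from the 5200 MB figure, not a documented rule. The commented `grubby` line shows one way to apply the parameters on Fedora; other methods of editing the bootloader entry work as well.

```shell
# Suggest a crashkernel size in MB from reserved memory (KB) and used memory (MB).
# Rounding up to the next multiple of 100 MB is an assumption.
crashkernel_mb() {
    res_kb=$1
    use_mb=$2
    res_mb=$(( (res_kb + 1023) / 1024 ))   # KB -> MB, rounded up
    total_mb=$(( res_mb + use_mb ))
    echo $(( (total_mb + 99) / 100 * 100 ))
}

size=$(crashkernel_mb 3827456 1427)
echo "fadump=on crashkernel=${size}M"

# One way to apply the parameters on Fedora (run as root; not executed here):
#   grubby --update-kernel=ALL --args="fadump=on crashkernel=${size}M"
```

With the values from this example, the script prints `fadump=on crashkernel=5200M`.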

Reboot the system.
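After the reboot, it may be worth confirming that the parameters took effect. The following is a minimal sketch that inspects the kernel command line; the helper name `fadump_args_present` is hypothetical, and the sysfs paths in the trailing comment are assumed to exist on the running kernel.

```shell
# Succeed if the given kernel command line file contains both parameters.
# Defaults to /proc/cmdline, the command line of the running kernel.
fadump_args_present() {
    cmdline_file="${1:-/proc/cmdline}"
    grep -qw 'fadump=on' "$cmdline_file" 2>/dev/null &&
        grep -qE '(^| )crashkernel=[0-9]+[KMG]?' "$cmdline_file" 2>/dev/null
}

if fadump_args_present; then
    echo "fadump and crashkernel parameters are active"
else
    echo "parameters not found on the kernel command line"
fi

# FADump state as reported by the kernel, where these files exist (assumption):
#   cat /sys/kernel/fadump/enabled /sys/kernel/fadump/registered
```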

Thanks for the writeup. I currently think that this doesn’t affect a large enough portion of our userbase to be included in Common Issues. It is a very concrete and specialized bug that most people won’t hit. But you can try to convince me 🙂
