Problem
On a KVM-capable LPAR running f40, an attempt to capture a kernel crash dump with Firmware-Assisted Dump (FADump) can fail, with the kernel reporting the following errors:
An out-of-memory (OOM) error trace.
Subsequently, the system hangs without any further error trace.
Cause
When the Firmware-Assisted Dump (FADump) method is used to capture kernel crash dumps, the dump capture can fail with an out-of-memory (OOM) error, or the system can hang without any error trace. The failure occurs because more memory is required to boot the FADump capture kernel after Kernel Electric-Fence (KFENCE) support is enabled. The additional memory is consumed by the structures that map all of the memory at page-level granularity. The issue can be resolved by mapping at page granularity only the memory that KFENCE actually needs mapped that way.
Related Issues
Bugzilla report: #2297187
The Bugzilla report also points to a kernel patch that has been accepted upstream. The patch reduces the memory overhead when KFENCE is enabled (the default on f40) by using larger page mappings (2M rather than 64K).
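As a rough illustration of why the mapping size matters, the following sketch counts how many mappings are needed to cover the total memory shown in the sample dmesg output in this article (104857600K) at 64K and at 2M granularity. It only counts mappings. It does not model the actual per-mapping memory cost in the kernel, and the helper name mapping_count is hypothetical.

```python
def mapping_count(total_kib: int, page_kib: int) -> int:
    """Number of page-sized mappings needed to cover total_kib, rounded up."""
    return -(-total_kib // page_kib)  # ceiling division on integers


# Total memory taken from the sample dmesg "Memory:" line (100 GiB).
TOTAL_KIB = 104857600

small_pages = mapping_count(TOTAL_KIB, 64)         # 64K mappings
large_pages = mapping_count(TOTAL_KIB, 2 * 1024)   # 2M mappings

print(f"64K mappings: {small_pages}")
print(f"2M mappings:  {large_pages}")
print(f"ratio:        {small_pages // large_pages}x fewer with 2M")
```

With these figures, 2M mappings need 32 times fewer entries than 64K mappings, which is consistent with the direction of the saving that the patch description reports.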
Workarounds
As a workaround, reserve a fixed amount of memory for the FADump capture kernel by updating the bootloader entry and rebooting the system. The following steps describe the procedure:
From the dmesg log, get the reserved memory value. In the following example of the dmesg log, 3827456 KB is the reserved memory value:
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 95787264K/104857600K available (16576K kernel code, 116800K rwdata, 16192K rodata, 6208K init, 34377K bss, 3827456K reserved, 5242880K cma-reserved)
[ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=16, Nodes=32
[ 0.000000] ftrace: allocating 42780 entries in 16 pages
Run the free -m command and get the used memory value from the displayed output. In the following sample output, the used memory value is 1427 MB:
# free -m
               total        used        free      shared  buff/cache   available
Mem:           98830        1427       97374          17         591       97402
Swap:           4095           0        4095
Update the bootloader entry to reserve memory for the FADump capture kernel. The amount to reserve is approximately the sum of the reserved memory and the used memory. For example, if the reserved memory is 3827456 KB (about 3738 MB) and the used memory is 1427 MB, the sum is about 5165 MB, which rounds up to 5200 MB. In that case, update the bootloader entry with the following parameters:
fadump=on crashkernel=5200M
Reboot the system.
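The arithmetic in the steps above can be sketched as a small script. This is a minimal illustration, not a supported tool: the function names are hypothetical, the input is the sample dmesg and free -m text from this article, and rounding up to the next 100 MB is an assumption made here to match the "approximate sum" in the example.

```python
import math
import re

# Sample text copied from the dmesg and "free -m" examples in this article.
DMESG_SAMPLE = """\
[ 0.000000] Memory: 95787264K/104857600K available (16576K kernel code, 116800K rwdata, 16192K rodata, 6208K init,
34377K bss, 3827456K reserved, 5242880K cma-reserved)
"""
FREE_SAMPLE = """\
               total        used        free      shared  buff/cache   available
Mem:           98830        1427       97374          17         591       97402
Swap:           4095           0        4095
"""


def reserved_kb_from_dmesg(text: str) -> int:
    """Return the '<n>K reserved' value (in KB) from the kernel 'Memory:' line."""
    # "K reserved" (with a space) does not match the "cma-reserved" field.
    match = re.search(r"(\d+)K reserved", text)
    if match is None:
        raise ValueError("no '<n>K reserved' field found in dmesg text")
    return int(match.group(1))


def used_mb_from_free(text: str) -> int:
    """Return the 'used' column (in MB) of the 'Mem:' row of 'free -m' output."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return int(fields[2])  # columns: label, total, used, ...
    raise ValueError("no 'Mem:' row found in free output")


def crashkernel_mb(reserved_kb: int, used_mb: int, round_to_mb: int = 100) -> int:
    """Sum reserved and used memory in MB, rounded up to a multiple of round_to_mb.

    The 100 MB rounding step is an assumption of this sketch.
    """
    total_mb = reserved_kb / 1024 + used_mb
    return math.ceil(total_mb / round_to_mb) * round_to_mb


reserved_kb = reserved_kb_from_dmesg(DMESG_SAMPLE)
used_mb = used_mb_from_free(FREE_SAMPLE)
size_mb = crashkernel_mb(reserved_kb, used_mb)
print(f"fadump=on crashkernel={size_mb}M")  # prints: fadump=on crashkernel=5200M
```

On a live system, the two sample strings would be replaced by the actual output of dmesg and free -m. The resulting value is only a starting point, and a larger reservation may still be needed if the capture kernel continues to run out of memory.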