Kernel 5.4.10 is in updates-testing and should go stable soon (maybe Monday). If you want to try it sooner, you can upgrade just the kernel with the updates-testing repo enabled.
I cannot say exactly when the system froze, but both the “nvme timeout” and the “USB power management unreliable” messages look suspicious to me. Does anybody know more about these logs and could enlighten me?
@vits95 I followed your instructions, hope this is better now:
Summary
[ 0.469905] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 0.469905] TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
[ 0.470244] #3
[ 0.475823] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[ 2.178713] usb: port power management may be unreliable
[ 2.630185] usb 1-7: config 1 interface 1 altsetting 0 endpoint 0x3 has wMaxPacketSize 0, skipping
[ 2.630190] usb 1-7: config 1 interface 1 altsetting 0 endpoint 0x83 has wMaxPacketSize 0, skipping
[ 3.889116] acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
[ 3.889553] acpi PNP0C14:03: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
[ 11.314725] printk: systemd: 23 output lines suppressed due to ratelimiting
[ 42.396646] nvme nvme0: I/O 832 QID 2 timeout, aborting
[ 42.528797] nvme nvme0: Abort status: 0x0
[ 42.542773] kauditd_printk_skb: 19 callbacks suppressed
[ 42.915296] systemd-journald[621]: File /run/log/journal/24a9fb3a012f46d894763e6ec4bfa78c/system.journal corrupted or uncleanly shut down, renaming and replacing.
[ 43.594218] resource sanity check: requesting [mem 0xfed10000-0xfed15fff], which spans more than pnp 00:08 [mem 0xfed10000-0xfed13fff]
[ 43.594227] caller snb_uncore_imc_init_box+0x6c/0xb0 [intel_uncore] mapping multiple BARs
[ 43.675869] uvcvideo 1-8:1.0: Entity type for entity Extension 4 was not initialized!
[ 43.675872] uvcvideo 1-8:1.0: Entity type for entity Extension 3 was not initialized!
[ 43.675874] uvcvideo 1-8:1.0: Entity type for entity Processing 2 was not initialized!
[ 43.675876] uvcvideo 1-8:1.0: Entity type for entity Camera 1 was not initialized!
[ 44.128425] thermal thermal_zone3: failed to read out thermal zone (-61)
[ 45.738681] iwlwifi 0000:03:00.0: FW already configured (0) - re-configuring
[ 46.007657] iwlwifi 0000:03:00.0: FW already configured (0) - re-configuring
[ 48.298823] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
[ 66.936633] systemd-journald[621]: File /var/log/journal/24a9fb3a012f46d894763e6ec4bfa78c/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
But I also compared these journal entries with those from a boot where I had no freeze, and there is no difference. So I guess the cause cannot be found here…
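A comparison like this is easier to trust when the timestamps are stripped first, since they differ on every boot. Below is a minimal sketch of that idea, assuming both logs were saved as plain text (for example with `journalctl -k -b -1` for the previous boot). The function names `normalize` and `only_in_first` are made up for this illustration.

```python
import re

# Leading dmesg-style timestamp, e.g. "[ 42.396646] " or "[    0.469905] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(line):
    """Strip the timestamp and surrounding whitespace from one log line."""
    return TIMESTAMP.sub("", line.strip())


def only_in_first(first_log, second_log):
    """Return messages that occur in first_log but not in second_log.

    Both arguments are whole logs as strings. Messages are compared
    without their timestamps, and duplicates are reported once.
    """
    seen = {normalize(l) for l in second_log.splitlines() if l.strip()}
    result = []
    for line in first_log.splitlines():
        msg = normalize(line)
        if msg and msg not in seen and msg not in result:
            result.append(msg)
    return result


if __name__ == "__main__":
    freeze_boot = (
        "[ 2.178713] usb: port power management may be unreliable\n"
        "[ 42.396646] nvme nvme0: I/O 832 QID 2 timeout, aborting\n"
    )
    clean_boot = "[ 2.101234] usb: port power management may be unreliable\n"
    for message in only_in_first(freeze_boot, clean_boot):
        print(message)
```

Lines that contain counters that change on every boot (such as the I/O number in the nvme message) will always show up as different, so the result still needs a manual look.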
Also, my Windows 10, which is installed alongside Fedora, does not show this problem, so sending the laptop to a service center would probably not help much. I am going to run the SMART test next, but first I need a Fedora stick, it seems.
Arch Wiki, Talk: Lenovo ThinkPad X1 Carbon (Gen 6)
“NVMe unsafe shutdowns
Does anyone else get non-zero “Unsafe Shutdowns” on NVMe (Model Number: LENSE30512GMSP34MEAT3TA)? smartctl on mine laptop constantly reports those, also I get frequent FS corruption errors after reboot :/”
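To check the “Unsafe Shutdowns” counter from the quote on your own drive, you could save the output of `sudo smartctl -a /dev/nvme0` and pull the value out of it. This is only a sketch: the device name is taken from the log above, the helper name `unsafe_shutdowns` is made up, and the line format in the sample is my assumption of what smartctl prints for NVMe drives.

```python
import re


def unsafe_shutdowns(smartctl_output):
    """Return the "Unsafe Shutdowns" counter as an int, or None if absent.

    smartctl_output is the text printed by smartctl for an NVMe drive.
    Thousands separators such as "1,234" are tolerated.
    """
    for line in smartctl_output.splitlines():
        match = re.match(r"\s*Unsafe Shutdowns:\s*([\d,.]+)", line)
        if match:
            return int(re.sub(r"\D", "", match.group(1)))
    return None


if __name__ == "__main__":
    # Assumed sample output, not taken from a real drive.
    sample = (
        "Power Cycles:                       1,204\n"
        "Unsafe Shutdowns:                   87\n"
    )
    print(unsafe_shutdowns(sample))
```

A counter that keeps growing after clean shutdowns would support the suspicion from the quote; a counter that only grows after freezes just reflects the hard resets.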
You mentioned “first i need a fedora stick it seems”. I have no idea how this is handled by default, but possibly you need to check the file system?
See, I had freezes (and some other issues) from using the proprietary blob. That was a long time ago.
I also experienced shutdowns because of overheating. Cleaning the laptop and switching from the proprietary blob to the free drivers helped me out.
PS: This year I even used the proprietary drivers without issues (but only for a few weeks) before things started to glitch again for me (all of this on the same laptop).
Which makes it possible to modify the TDP and the processor frequency according to demand. On some CPU types, TDP jumps appear to be poorly managed by some kernel versions and may cause instability. That being said, you can test the following:
If you have other (older) kernel versions installed, try booting them and see whether the problem can be reproduced.
If you are familiar with your BIOS, you may want to temporarily disable options such as turbo boost or idle states, to check whether the instability comes from how a particular Linux kernel manages these features.
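After changing the BIOS setting, it can help to confirm from within Linux that turbo boost is really off. The sketch below reads the `no_turbo` file that the `intel_pstate` driver exposes in sysfs. It assumes the laptop uses that driver; if the file is missing, the function simply returns `None`. The helper name `turbo_enabled` is made up for this example.

```python
from pathlib import Path

# Exposed by the intel_pstate driver: "1" means turbo is disabled.
NO_TURBO = Path("/sys/devices/system/cpu/intel_pstate/no_turbo")


def turbo_enabled(path=NO_TURBO):
    """Return True/False for the turbo state, or None if it cannot be read."""
    try:
        value = path.read_text().strip()
    except OSError:
        # File missing or unreadable, e.g. a different cpufreq driver is in use.
        return None
    if value == "0":
        return True
    if value == "1":
        return False
    return None


if __name__ == "__main__":
    state = turbo_enabled()
    if state is None:
        print("turbo state unknown (intel_pstate not in use?)")
    else:
        print("turbo boost is", "enabled" if state else "disabled")
```

Checking this once with the option on and once with it off makes it clearer whether a freeze-free period really coincides with the changed setting.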
@xtym I turned off Intel SpeedStep Technology in the BIOS and now observe better system responsiveness and no freezes. Your suggestion seems to have solved the problem so far!
I will mark your answer as solution if no problems show up in the next three days.
While reading up, I found older reports of problems with SpeedStep as well:
Apparently the issues described there are fixed by now, but SpeedStep Technology clearly has the potential to cause problems.
One explanation I can think of: degradation of the silicon chip or the power electronics may increase the risk of complications, which would also explain why my laptop originally worked fine.