Screen issue with ThinkPad and dock

Hey all, I’m using Fedora 38 on my ThinkPad T14s Gen 1 (AMD) with a dock. I’ve tried multiple Lenovo docks with the same outcome.

When connected, everything is fine for a while, but after some time during work the screen freezes, and then the laptop enters a loop of the screen turning off and on again while being unresponsive. I haven’t determined the trigger yet; it could be load, since it happened with Chrome and Firefox in the foreground. This happened on various Lenovo docks, including the USB-C and Thunderbolt 4 ones. There is no bug report, since I need to hard restart to get it working again.

Here are some outputs:

Linux t14s 6.3.4-201.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Sat May 27 15:08:36 UTC 2023 x86_64 GNU/Linux

Jun 02 11:19:18 t14s kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=232441, emitted seq=232443
Jun 02 11:19:18 t14s kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2363 thread gnome-shel:cs0 pid 2386
Jun 02 11:19:18 t14s kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
Jun 02 11:19:18 t14s kernel: [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x117)
Jun 02 11:19:18 t14s kernel: amdgpu 0000:06:00.0: amdgpu: MODE2 reset
Jun 02 11:19:20 t14s firefox.desktop[11632]: Crash Annotation GraphicsCriticalError: |[0][GFX1-]: GFX: RenderThread detected a device reset in PostUpdate (t=18958.1) [GFX1->
Jun 02 11:19:20 t14s gnome-shell[2363]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
Jun 02 11:19:20 t14s gnome-shell[2363]: amdgpu: The process will be terminated.
Jun 02 11:19:25 t14s firefox.desktop[11632]: Crash Annotation GraphicsCriticalError: |[0][GFX1-]: GFX: RenderThread detected a device reset in PostUpdate (t=18958.1) |[1][G>
Jun 02 11:19:25 t14s firefox.desktop[11632]: Crash Annotation GraphicsCriticalError: |[0][GFX1-]: GFX: RenderThread detected a device reset in PostUpdate (t=18958.1) |[1][G>
Jun 02 11:19:25 t14s gnome-software[2720]: Error reading events from display: Broken pipe

name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: AMD (0x1002)
    Device: AMD Radeon Graphics (renoir, LLVM 16.0.4, DRM 3.52, 6.3.4-201.fc38.x86_64) (0x1636)
    Version: 23.1.1
    Accelerated: yes
    Video memory: 512MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
    VBO free memory - total: 161 MB, largest block: 161 MB
    VBO free aux. memory - total: 7453 MB, largest block: 7453 MB
    Texture free memory - total: 161 MB, largest block: 161 MB
    Texture free aux. memory - total: 7453 MB, largest block: 7453 MB
    Renderbuffer free memory - total: 161 MB, largest block: 161 MB
    Renderbuffer free aux. memory - total: 7453 MB, largest block: 7453 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 512 MB
    Total available memory: 8108 MB
    Currently available dedicated video memory: 161 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon Graphics (renoir, LLVM 16.0.4, DRM 3.52, 6.3.4-201.fc38.x86_64)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 23.1.1
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6 (Compatibility Profile) Mesa 23.1.1
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 23.1.1
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

This could be overheating causing some component to shut down. With laptops it is often caused by dust accumulating on heat sinks, fans, and other surfaces. There are multiple tools available to monitor temperature sensors. Some tasks exceed the capacity of a laptop.
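
If you want to log temperatures without installing anything, here is a minimal sketch that reads the kernel's hwmon sensors directly. It assumes the standard Linux sysfs layout under /sys/class/hwmon, where each temp*_input file holds millidegrees Celsius. The function name read_hwmon_temps is just something I made up for illustration, and which chips show up depends on your hardware.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return (chip, label, degrees_celsius) tuples for every readable sensor."""
    readings = []
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        # Fall back to the directory name if the driver exposes no "name" file.
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for input_file in sorted(chip_dir.glob("temp*_input")):
            label_file = input_file.with_name(
                input_file.name.replace("_input", "_label")
            )
            if label_file.exists():
                label = label_file.read_text().strip()
            else:
                label = input_file.name
            try:
                # hwmon reports temperatures in millidegrees Celsius.
                millidegrees = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # skip sensors that cannot be read or parsed
            readings.append((chip, label, millidegrees / 1000.0))
    return readings
```

Calling read_hwmon_temps() in a loop every few seconds and printing the result while you work on the dock would show whether temperatures climb right before a freeze.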

Does the freeze happen during use or idle (not moving mouse/keyboard)?

Does it happen without dock?


These errors are somewhat generic, as in they appear for many different types of issues:

Jun 02 11:19:18 t14s kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=232441, emitted seq=232443
Jun 02 11:19:18 t14s kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2363 thread gnome-shel:cs0 pid 2386
Jun 02 11:19:18 t14s kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!

That said, the log does name gnome-shell as the process whose job timed out, which is not necessarily the root of the problem.
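
To see whether it is always the same ring and process across crashes, something like the following could help. It is only a sketch that matches the two message formats quoted above; the function name summarize_gpu_timeouts and the "unsignaled" field are my own naming, and other kernel versions may word these messages differently.

```python
import re

# Patterns taken from the two amdgpu log lines quoted in this thread.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROCESS_RE = re.compile(
    r"Process information: process (?P<process>\S+) pid (?P<pid>\d+)"
)


def summarize_gpu_timeouts(lines):
    """Pair each ring timeout with the process line that follows it."""
    events = []
    for line in lines:
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            events.append({
                "ring": timeout["ring"],
                # Gap between the last emitted and last signaled sequence number.
                "unsignaled": int(timeout["emitted"]) - int(timeout["signaled"]),
                "process": None,
                "pid": None,
            })
            continue
        proc = PROCESS_RE.search(line)
        # Attach the process to the most recent timeout that still lacks one.
        if proc and events and events[-1]["process"] is None:
            events[-1]["process"] = proc["process"]
            events[-1]["pid"] = int(proc["pid"])
    return events
```

After a hard restart, the kernel messages from the previous boot can be saved with journalctl -k -b -1 and the lines of that file passed to the function.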

It doesn’t happen without the dock, and it happens under some load while working, not when idle. I think I was able to replicate it: IIRC, opening Firefox after a crash prompted me to reopen tabs, which caused it to crash again.

I updated this morning, so my kernel is now 6.3.5, and it seems fine so far. It could be that some update in between fixed this; I saw others complaining of similar dock issues with the 6.1 and 6.2 kernels.

I spoke too soon: it seems the crashing still occurs. I opened a bug and am posting it here for visibility:
https://bugzilla.redhat.com/show_bug.cgi?id=2219764