System freeze caused by Intel SpeedStep

Hi, I run Fedora on a Lenovo ThinkPad P15v Gen 2 with an i7-11800H CPU. I started with Fedora 35, but the issues began only a few months ago.

The symptoms usually appeared directly after booting the machine or not at all. There were several different kinds of failure:

  • QEMU/virt-manager hangs and the process ends.
  • Firefox freezes on right-click or whenever a dialog should open (e.g. “Do you want to uninstall this extension?”).
  • Thunderbird freezes completely without any activity.

If any of these events occurred, it was impossible to shut down the machine. The shutdown never finished; only a hard power-off made it possible to try again.

The Firefox freeze was the most reproducible. If I open one or two apps such as the file explorer and a terminal, plus Firefox, directly after logging in and then right-click in Firefox, I hit the freeze most of the time. That is how I have been debugging this over the last few days, with different combinations of devices plugged in.

The error did not occur on battery power, which made me look into the BIOS, where I found the SpeedStep option. Since I disabled it, the error has not occurred again (that is only one day so far, but it looks good).

So why am I here? I want to dig deeper and find out whether there is a better solution for this problem. Maybe there is a kernel parameter that makes it possible to use this feature without the freezes.
Or maybe there is a bug in Fedora or the kernel, since the problems began only a few months ago.
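
One way to start comparing the two BIOS settings is to record which frequency-scaling driver and governor the kernel actually uses in each case. The following is only a minimal Python sketch, assuming the standard Linux cpufreq sysfs layout under /sys/devices/system/cpu; the function names are made up for illustration, and any entry the running kernel does not expose is reported as None. It also lists any intel_pstate= arguments found on the kernel command line, since intel_pstate=passive and intel_pstate=disable are documented kernel parameters that could be tried. Whether either of them avoids the freeze on this machine is untested.

```python
from pathlib import Path
from typing import Optional


def read_text(path: Path) -> Optional[str]:
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def cpufreq_summary(sysfs_root: str = "/sys", proc_root: str = "/proc") -> dict:
    """Collect the cpufreq settings for cpu0 and intel_pstate kernel arguments."""
    cpu = Path(sysfs_root) / "devices/system/cpu"
    cmdline = read_text(Path(proc_root) / "cmdline") or ""
    return {
        "scaling_driver": read_text(cpu / "cpu0/cpufreq/scaling_driver"),
        "scaling_governor": read_text(cpu / "cpu0/cpufreq/scaling_governor"),
        # These two entries exist only when the intel_pstate driver is loaded.
        "intel_pstate_status": read_text(cpu / "intel_pstate/status"),
        "no_turbo": read_text(cpu / "intel_pstate/no_turbo"),
        "pstate_cmdline_args": [
            arg for arg in cmdline.split() if arg.startswith("intel_pstate=")
        ],
    }


if __name__ == "__main__":
    for key, value in cpufreq_summary().items():
        print(f"{key}: {value}")
```

Running it once with SpeedStep enabled and once with it disabled, and saving both outputs, gives a concrete difference to include in a bug report.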

And even if I don’t find a better solution, maybe I can learn some new stuff trying out your ideas!

So, who has ideas? :slight_smile:

You did not tell us which version of Fedora you are using.
SpeedStep works well; personally I have never seen it fail.

Please install inxi and post the output of inxi -Fx; it contains lots of details that will help us help you.

I always use the latest version 1-2 weeks after it becomes available; F39 since yesterday. The problem started during F38, I think.

  Kernel: 6.5.12-300.fc39.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.40-13.fc39 Console: pty pts/1 Distro: Fedora release 39 (Thirty Nine)
  Type: Laptop System: LENOVO product: 21A9003WGE v: ThinkPad P15v Gen 2i
    serial: <filter>
  Mobo: LENOVO model: 21A9003WGE v: SDK0J40697 WIN serial: <filter>
    UEFI: LENOVO v: N38ET43W (1.24 ) date: 11/14/2023
  ID-1: BAT0 charge: 66.6 Wh (100.0%) condition: 66.6/68.0 Wh (97.9%)
    volts: 12.3 min: 11.5 model: Celxpert 5B10W13961 status: full
  Device-1: hidpp_battery_0 model: Logitech Marathon Mouse/Performance Plus
    M705 charge: 10% (should be ignored) status: discharging
  Info: 8-core model: 11th Gen Intel Core i7-11800H bits: 64 type: MT MCP
    arch: Tiger Lake rev: 1 cache: L1: 640 KiB L2: 10 MiB L3: 24 MiB
  Speed (MHz): avg: 919 high: 1100 min/max: 800/2300 cores: 1: 792 2: 1100
    3: 800 4: 1042 5: 800 6: 800 7: 948 8: 1015 9: 800 10: 812 11: 800 12: 800
    13: 1043 14: 1015 15: 1076 16: 1073 bogomips: 73728
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Device-1: Intel TigerLake-H GT1 [UHD Graphics] vendor: Lenovo driver: i915
    v: kernel arch: Gen-12.1 bus-ID: 00:02.0
  Device-2: Bison [] driver: uvcvideo type: USB bus-ID: 3-4:3
  Display: server: X.Org v: 23.2.2 with: Xwayland v: 23.2.2 driver: X:
    loaded: modesetting unloaded: fbdev,vesa dri: iris gpu: i915 resolution:
    1: 3840x2160~60Hz 2: 1920x1080~60Hz
  API: OpenGL v: 4.6 vendor: intel mesa v: 23.2.1 glx-v: 1.4
    direct-render: yes renderer: Mesa Intel UHD Graphics (TGL GT1)
  API: EGL Message: EGL data requires eglinfo. Check --recommends.
  Device-1: Intel Tiger Lake-H HD Audio vendor: Lenovo
    driver: sof-audio-pci-intel-tgl bus-ID: 00:1f.3
  API: ALSA v: k6.5.12-300.fc39.x86_64 status: kernel-api
  Server-1: PipeWire v: 1.0.0 status: n/a (root, process)
  Device-1: Intel Tiger Lake PCH CNVi WiFi driver: iwlwifi v: kernel
    bus-ID: 00:14.3
  IF: wlp0s20f3 state: down mac: <filter>
  Device-2: Intel Ethernet I219-V vendor: Lenovo driver: e1000e v: kernel
    port: N/A bus-ID: 00:1f.6
  IF: enp0s31f6 state: down mac: <filter>
  Device-3: ASIX AX88179 Gigabit Ethernet driver: cdc_ncm type: USB
    bus-ID: 2-1.2:3
  IF: enp0s13f0u1u2c2 state: up speed: 1000 Mbps duplex: half mac: <filter>
  IF-ID-1: docker0 state: down mac: <filter>
  Device-1: Intel AX201 Bluetooth driver: btusb v: 0.8 type: USB
    bus-ID: 3-14:7
  Report: btmgmt ID: hci0 rfk-id: 1 state: up address: <filter> bt-v: 5.2
    lmp-v: 11
  Local Storage: total: 2.75 TiB used: 235.53 GiB (8.4%)
  ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 970 EVO Plus 2TB
    size: 1.82 TiB temp: 30.9 C
  ID-2: /dev/nvme1n1 vendor: Toshiba model: N/A size: 953.87 GiB
    temp: 32.9 C
  ID-1: / size: 952.28 GiB used: 235.06 GiB (24.7%) fs: btrfs
    dev: /dev/nvme1n1p3
  ID-2: /boot size: 973.4 MiB used: 462.5 MiB (47.5%) fs: ext4
    dev: /dev/nvme1n1p2
  ID-3: /boot/efi size: 598.8 MiB used: 17.4 MiB (2.9%) fs: vfat
    dev: /dev/nvme1n1p1
  ID-4: /home size: 952.28 GiB used: 235.06 GiB (24.7%) fs: btrfs
    dev: /dev/nvme1n1p3
  ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) dev: /dev/zram0
  System Temperatures: cpu: 49.0 C mobo: N/A
  Fan Speeds (rpm): fan-1: 0
  Processes: 483 Uptime: 19m Memory: total: 64 GiB available: 62.51 GiB
  used: 7.06 GiB (11.3%) igpu: 64 MiB Init: systemd target: graphical (5)
  Compilers: gcc: 13.2.1 Packages: 88 note: see --rpm Shell: Bash v: 5.2.21
  inxi: 3.3.31

Is the problem still present with f39?
You might want to run a memory test just in case the problem is RAM.
Oh and check if there is newer firmware for your machine.

Yes, also with F39, until I disabled the BIOS switch.

I suspect a BIOS issue. If there is no fixed firmware, your workaround seems to be the best you can do.

Are you overclocking? If so, try using in-spec timings.

Two common sources of errors associated with higher CPU speeds need ruling out: overheating, possibly due to accumulated dust on cooling fins and fans, and memory errors. Some laptop models are better at collecting dust than others, so it is worth searching for reports of dust buildup and cleaning for your model. Use memtest86plus version 6.20 or later, running on all cores, and watch the temperatures.
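
For watching temperatures from the running Fedora system (not during the memory test itself, which boots standalone), the kernel's thermal zones can be polled directly. This is a rough Python sketch, assuming the usual /sys/class/thermal/thermal_zone*/ layout in which temp holds millidegrees Celsius; the function name is hypothetical and the set of zones differs from machine to machine.

```python
from pathlib import Path


def read_thermal_zones(sysfs_root: str = "/sys") -> dict:
    """Map '<zone>:<type>' to the current temperature in degrees Celsius."""
    zones = {}
    base = Path(sysfs_root) / "class/thermal"
    for zone in sorted(base.glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Some zones cannot be read or report no valid value; skip them.
            continue
        zones[f"{zone.name}:{zone_type}"] = millidegrees / 1000.0
    return zones


if __name__ == "__main__":
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} C")
```

Logging this right after boot, when the freezes tend to appear, would show whether temperature is a plausible factor at that moment.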

The SMART disk monitoring (in GNOME Disks, or smartmontools on the command line) may provide historical peak temperatures of your storage devices, which can tell you if the system has overheated in the past.
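
As a sketch of how that could be checked with smartmontools on NVMe drives like the ones listed above: smartctl can emit JSON with --json, and for NVMe devices the health log does not usually carry a peak temperature but does count the minutes spent above the warning and critical temperature thresholds. The Python below assumes the JSON keys temperature.current, nvme_smart_health_information_log.warning_temp_time and critical_comp_time as recent smartctl versions are understood to emit them; treat those key names and the helper names as assumptions to verify against your own output. Running smartctl normally requires root.

```python
import json
import subprocess
import sys


def nvme_thermal_history(report: dict) -> dict:
    """Extract temperature-related fields from a parsed smartctl JSON report."""
    log = report.get("nvme_smart_health_information_log", {})
    return {
        "current_c": report.get("temperature", {}).get("current"),
        "warning_temp_minutes": log.get("warning_temp_time"),
        "critical_temp_minutes": log.get("critical_comp_time"),
    }


def smartctl_report(device: str) -> dict:
    """Run smartctl on a device and return its JSON output as a dict."""
    # smartctl sets bits in its exit status for warnings, so a non-zero
    # exit code is not treated as a failure here.
    proc = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    # Example: sudo python3 this_script.py /dev/nvme0n1 /dev/nvme1n1
    for dev in sys.argv[1:]:
        print(dev, nvme_thermal_history(smartctl_report(dev)))
```

Non-zero minute counters would point to past overheating; zeros would make a thermal cause less likely, at least for the drives.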

The problem primarily occurs after (re)booting, without unusually high power usage. Once everything is fine, the system runs and runs, regardless of what I do. There is no correlation with possible overheating.

No, not overclocking. For now it works perfectly with this feature disabled.