How to analyze a system freeze

Hello, I would like to know the proper technical way to start analyzing a system freeze: how to determine what causes it and what a possible solution is.

It’s important to me because I want to learn.

My system specs:

inxi -Fzxx

System:
  Kernel: 5.17.9-300.fc36.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.37-27.fc36 Desktop: GNOME v: 42.1 tk: GTK v: 3.24.34 wm: gnome-shell
    dm: GDM Distro: Fedora release 36 (Thirty Six)
Machine:
  Type: Laptop System: ASUSTeK product: ROG Strix G731GU_G731GU v: 1.0
    serial: <superuser required>
  Mobo: ASUSTeK model: G731GU v: 1.0 serial: <superuser required>
    UEFI: American Megatrends v: G731GU.312 date: 02/19/2021
Battery:
  ID-1: BAT0 charge: 29.4 Wh (60.6%) condition: 48.5/66.0 Wh (73.5%)
    volts: 15.7 min: 15.7 model: ASUSTeK ASUS Battery serial: N/A
    status: not charging
  Device-1: hidpp_battery_0 model: Logitech Wireless Mouse B330/M330/M331
    serial: <filter> charge: 55% (should be ignored) status: discharging
CPU:
  Info: 6-core model: Intel Core i7-9750H bits: 64 type: MT MCP
    arch: Coffee Lake rev: A cache: L1: 384 KiB L2: 1.5 MiB L3: 12 MiB
  Speed (MHz): avg: 868 high: 900 min/max: 800/4500 cores: 1: 800 2: 800
    3: 809 4: 900 5: 846 6: 900 7: 900 8: 900 9: 900 10: 900 11: 882 12: 886
    bogomips: 62399
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel CoffeeLake-H GT2 [UHD Graphics 630] vendor: ASUSTeK
    driver: i915 v: kernel ports: active: eDP-1 empty: none bus-ID: 00:02.0
    chip-ID: 8086:3e9b
  Device-2: NVIDIA TU116M [GeForce GTX 1660 Ti Mobile] vendor: ASUSTeK
    driver: nvidia v: 510.68.02 arch: Turing pcie: speed: 2.5 GT/s lanes: 8
    ports: active: none empty: DP-1,HDMI-A-1 bus-ID: 01:00.0
    chip-ID: 10de:2191
  Display: wayland server: X.org v: 1.20.14 with: Xwayland v: 22.1.2
    compositor: gnome-shell driver: gpu: i915 display-ID: 0
  Monitor-1: eDP-1 model: AU Optronics 0x409d res: 1920x1080 dpi: 128
    diag: 438mm (17.3")
  OpenGL: renderer: Mesa Intel UHD Graphics 630 (CFL GT2)
    v: 4.6 Mesa 22.0.3 direct render: Yes
Audio:
  Device-1: Intel Cannon Lake PCH cAVS vendor: ASUSTeK driver: snd_hda_intel
    v: kernel bus-ID: 00:1f.3 chip-ID: 8086:a348
  Device-2: NVIDIA TU116 High Definition Audio vendor: ASUSTeK
    driver: snd_hda_intel v: kernel pcie: speed: 2.5 GT/s lanes: 8
    bus-ID: 01:00.1 chip-ID: 10de:1aeb
  Sound Server-1: ALSA v: k5.17.9-300.fc36.x86_64 running: yes
  Sound Server-2: PulseAudio v: 15.0 running: no
  Sound Server-3: PipeWire v: 0.3.51 running: yes
Network:
  Device-1: Intel Cannon Lake PCH CNVi WiFi driver: iwlwifi v: kernel
    bus-ID: 00:14.3 chip-ID: 8086:a370
  IF: wlo1 state: up mac: <filter>
  Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    vendor: ASUSTeK driver: r8169 v: kernel pcie: speed: 2.5 GT/s lanes: 1
    port: 3000 bus-ID: 03:00.0 chip-ID: 10ec:8168
  IF: eno2 state: down mac: <filter>
Bluetooth:
  Device-1: Intel Bluetooth 9460/9560 Jefferson Peak (JfP) type: USB
    driver: btusb v: 0.8 bus-ID: 1-14:4 chip-ID: 8087:0aaa
  Report: rfkill ID: hci0 rfk-id: 1 state: up address: see --recommends
Drives:
  Local Storage: total: 1.14 TiB used: 690.1 GiB (59.0%)
  ID-1: /dev/nvme0n1 vendor: Western Digital
    model: PC SN520 SDAPNUW-256G-1002 size: 238.47 GiB speed: 15.8 Gb/s
    lanes: 2 serial: <filter> temp: 36.9 C
  ID-2: /dev/sda vendor: Seagate model: ST1000LX015-1U7172 size: 931.51 GiB
    speed: 6.0 Gb/s serial: <filter> temp: 40 C
Partition:
  ID-1: / size: 71.65 GiB used: 10.99 GiB (15.3%) fs: btrfs
    dev: /dev/nvme0n1p6
  ID-2: /boot/efi size: 96 MiB used: 39.6 MiB (41.3%) fs: vfat
    dev: /dev/nvme0n1p1
Swap:
  ID-1: swap-1 type: partition size: 5 GiB used: 0 KiB (0.0%) priority: -2
    dev: /dev/nvme0n1p5
  ID-2: swap-2 type: zram size: 8 GiB used: 0 KiB (0.0%) priority: 100
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 47.0 C pch: 50.0 C mobo: N/A
  Fan Speeds (RPM): cpu: 2400
Info:
  Processes: 366 Uptime: 21m Memory: 15.47 GiB used: 3.08 GiB (19.9%)
  Init: systemd v: 250 runlevel: 5 target: graphical.target Compilers:
  gcc: 12.1.1 Packages: note: see --pkg flatpak: 3 Shell: Bash v: 5.1.16
  running-in: gnome-terminal inxi: 3.3.16

System Freeze Context:
It happens right when I press Ctrl+Alt+F9.

The first entry below was logged just after I pressed Ctrl+Alt+F9, at the moment the system froze:

May 27 11:34:13 rog kernel: rfkill: input handler enabled
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSource/ldac
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSink/aptx_hd
May 27 11:34:13 rog gsd-media-keys[2830]: Unable to get default source
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSource/aptx_hd
May 27 11:34:13 rog gsd-media-keys[2830]: Unable to get default sink
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSink/aptx
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSource/aptx
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSource/aac
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSink/sbc
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSource/sbc
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSink/sbc_xq
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSource/sbc_xq
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSource/aptx_ll_1
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSource/aptx_ll_0
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSource/aptx_ll_duplex_1
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSource/aptx_ll_duplex_0
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSource/faststream
May 27 11:34:13 rog bluetoothd[1292]: Endpoint unregistered: sender=:1.86 path=/MediaEndpoint/A2DPSource/faststream_duplex
May 27 11:34:13 rog bluetoothd[1292]: Player unregistered: sender=:1.86 path=/media_player0
May 27 11:34:13 rog uresourced[1607]: Setting resources on user-1000.slice (MemoryMin: 0, MemoryLow: 0, CPUWeight: 100, IOWeight: 100)
May 27 11:34:13 rog uresourced[1607]: Setting resources on user@1000.service (MemoryMin: 0, MemoryLow: 0, CPUWeight: 100, IOWeight: 100)
May 27 11:34:13 rog uresourced[1607]: Setting resources on user.slice (MemoryMin: 0, MemoryLow: 0, CPUWeight: -, IOWeight: -)
May 27 11:34:14 rog systemd[1]: virtqemud.service: Deactivated successfully.
May 27 11:34:14 rog audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=virtqemud comm="systemd" exe="/usr/lib/systemd/systemd" host>
-- Boot a070b3b42319430ea0f60011840ef22f --
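
An excerpt like this can be retrieved with journalctl after the forced reboot. A minimal sketch, assuming the journal keeps earlier boots (the boot marker on the last line suggests it does) and that the freeze happened in the boot right before the current one; the function names are made up for illustration:

```shell
# Hypothetical helper: show the tail of the journal from the boot that froze.
# Assumes the frozen boot is the one immediately before the current boot (-b -1).
freeze_log() {
    lines="${1:-50}"                       # how many trailing lines to show
    journalctl --no-pager -b -1 -n "$lines"
}

# Hypothetical helper: same boot, but only kernel messages of priority
# "warning" or more severe.
freeze_kernel_warnings() {
    journalctl --no-pager -b -1 -k -p warning
}
```

If the frozen boot is further back, `journalctl --list-boots` shows which offsets are available. The last lines before the freeze are usually the most interesting, but a hard freeze can also stop the journal before the real culprit is written to disk.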

It depends on the freeze. If the whole system is freezing and you are unable to get to a tty to access a terminal to look at logs and so on, I expect you’ll have to use the kernel netconsole to debug it:

https://www.kernel.org/doc/html/latest/networking/netconsole.html
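
As a rough idea of what that involves, here is a sketch based on the parameter format described on that page (src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac). The helper names are hypothetical, and every address, port, and interface you pass in is a placeholder for your own network:

```shell
# Hypothetical helper: print the netconsole= parameter in the documented form
#   src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Hypothetical helper: load netconsole on the machine that freezes.
# Arguments are the same six values, in the same order, as above.
start_netconsole() {
    sudo modprobe netconsole "netconsole=$(netconsole_param "$@")"
}
```

On the laptop this could be called as `start_netconsole 6665 192.168.1.10 eno2 6666 192.168.1.20 aa:bb:cc:dd:ee:ff`, with a second machine listening on UDP port 6666 (the kernel documentation suggests netcat, for example `nc -u -l 6666`, with the exact flags depending on the netcat variant). The wired interface eno2 from the inxi output is assumed here because netconsole typically does not work over wireless drivers.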

PS: What is Ctrl+Alt+F9 supposed to do? (Just a tty? I usually use Ctrl+Alt+F3, so I haven't got to F9 yet :) )


Hello! Yes, the whole system freezes. The key combination is meant to switch to TTY9.


Ah, cool. No, I don’t know about the debug-shell bits, sorry. (I expect that it may not work if the system is frozen, so the kernel netconsole is the way to go.)


Just to ensure that I got you right: The freeze only happens immediately after you press CTRL+ALT+F9. And when you press CTRL+ALT+F9, the switch to TTY9 works, but then you get the output you mentioned, and with this output on screen, the system is then frozen?

Besides the points @ankursinha mentioned, I could offer you an indirect way that might identify the problem. The logs you have shown contain two services that could be related, and that have already been linked to such issues on other users’ systems: bluetoothd and virtqemud.

You may try to deactivate them, one by one, and see if it makes a difference. Use systemctl stop <service name> to stop a service immediately, and systemctl disable <service name> to keep it from starting again after a reboot. Note that “disable” does not imply “stop”: a disabled service keeps running until it is stopped or the system reboots. Also, you can use acpitool (it can be installed using dnf) to find out whether the cause is Bluetooth in general or one specific Bluetooth device. Some elaboration about how to use acpitool for that can be found in this thread.
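
A minimal sketch of that one-by-one test, using systemctl's --now option so that a single command both stops the unit and changes its boot-time state. The function names are made up; the unit names are assumptions to check with `systemctl list-units` (on Fedora the bluetoothd process normally runs as bluetooth.service):

```shell
# Hypothetical helper: stop a unit now AND keep it from starting at boot.
# "disable --now" combines "disable" with "stop".
isolate_unit() {
    sudo systemctl disable --now "$1"
}

# Hypothetical helper: undo the above once the test is finished.
restore_unit() {
    sudo systemctl enable --now "$1"
}
```

For example, run `isolate_unit bluetooth.service`, try to reproduce the freeze, then `restore_unit bluetooth.service` before moving on to `virtqemud.service`. A socket-activated service may be started again by its matching .socket unit, so it is worth confirming with `systemctl status` that the service really stayed down during the test.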

Further, do you also have the problem with other kernels? When booting your system, you can choose among three kernels in GRUB. Test with the other kernels and let us know if the problem persists with them as well.

Sometimes a specific kernel and a piece of hardware just don’t like each other :) In such a case, the easiest workaround is to skip the affected kernel (but make sure you do not run a kernel with known security vulnerabilities) and file a bug. In any case, testing the issue with the different kernels is an important initial step to identify whether the issue is related to a specific kernel or not.
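
To keep track of which kernel each test actually ran on, something like the following could help. It is only a sketch, assuming an RPM-based Fedora install where each kernel ships in a kernel-core package; kernel_overview is a made-up name:

```shell
# Hypothetical helper: print the running kernel next to the installed ones,
# so each freeze test can be noted against the kernel it was run on.
kernel_overview() {
    echo "running: $(uname -r)"
    echo "installed:"
    rpm -q kernel-core
}
```

Running it after each boot from a different GRUB entry confirms that the intended kernel was really the one in use when the freeze was reproduced.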


Yes, immediately.

No, because it freezes immediately.

I disabled the services one by one and it keeps freezing.

Is the idea with the acpitool output to deactivate, one by one, the devices that are enabled?

acpitool -w
   Device	S-state	  Status   Sysfs node
  ---------------------------------------
  1. PEG0	  S4	*enabled   pci:0000:00:01.0
  2. PEGP	  S4	*disabled  pci:0000:01:00.0
  3. PEG1	  S4	*disabled
  4. PEGP	  S4	*disabled
  5. PEG2	  S4	*disabled
  6. PEGP	  S4	*disabled
  7. RP15	  S4	*enabled   pci:0000:00:1d.6
  8. PXSX	  S4	*disabled  pci:0000:03:00.0
  9. XHC	  S3	*enabled   pci:0000:00:14.0
  10. XDCI	  S4	*disabled
  11. HDAS	  S4	*disabled  pci:0000:00:1f.3
  12. AWAC	  S4	*enabled   platform:ACPI000E:00


Yes. But that would only have made sense if disabling bluetoothd had solved the issue. In that case, it would have been worth starting it again and checking each Bluetooth device individually. However, if disabling bluetoothd does not change the situation, the individual Bluetooth devices can be excluded as a cause. Because you say that the issue remains when the bluetoothd service is disabled, we can forget about acpitool and the Bluetooth devices.

How about the different kernels? Does the problem persist with the two older kernels?


This is not urgent for me, in fact; it just caught my attention, and I wanted to see whether I could work out how to fix it and learn something.

I tried all 3 kernels and the problem persists.

What I did find interesting is that I tried it on a guest OS running F36 and it worked seamlessly, and my host runs F36 too.

Just to clarify:
By guest OS, are you referring to a running VM? Or are you referring to booting bare metal with a live image? There is a distinct difference in the way the hardware is seen in those two situations.

A VM sees virtual hardware and is not acting directly on the physical hardware. A boot to live image is using the actual hardware.

There is also a distinct difference between the software versions on a live image and that of a running and probably updated OS.

All these are factors that must be accounted for when trying to track down the cause of a freeze (or any other problem).
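
One quick way to tell the two situations apart from inside the running system is systemd-detect-virt, which prints the detected virtualization technology and exits non-zero when it finds none. A small sketch; where_am_i is a hypothetical name:

```shell
# Hypothetical helper: report whether this system runs on virtual hardware.
# systemd-detect-virt exits 0 and prints the technology (e.g. "kvm") when it
# detects a VM or container; otherwise it prints "none" and exits non-zero.
where_am_i() {
    if virt="$(systemd-detect-virt)"; then
        echo "virtualized: $virt"
    else
        echo "bare metal"
    fi
}
```

A result of "bare metal" from a live image means the test exercised the same physical hardware as the installed system, so a difference in behavior then points more toward software versions or configuration.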