Hard Freezes after memory swap

I have a System76 Lemur Pro, a couple of years old. I started getting really weird behavior and crashes, and when I ran memtester I got plenty of errors. It has 16 GB: 8 on the motherboard and 8 on a DIMM. I pulled the 8 GB DIMM out and ran memtester again: no problems, and the system performed fine.

So I looked online and couldn’t find a direct replacement, so I ended up buying a 16 GB DIMM. Everything seemed to go well for a bit. I ran memtester with no errors. The computer boots and runs. Then (not sure if this started after an update or right away) I started getting much more severe lockups. I ran memtester again without errors. I checked the NVMe drive, no errors. I didn’t seem to get lockups in a tty, but mostly in the DE (Plasma). The keyboard might stop working while I can see widgets still active, or the screen freezes but the mouse moves, and I can’t get to a tty.

I was checking journalctl and not getting any new errors or anything odd. No kernel panics. I once got a "cpu:7 is soft locked" notification in KDE and the journal, but that happened just once.

I was going to reinstall to see if that had an effect, but before I did that I pulled the new DIMM, and now the system is stable again. It leaves me scratching my head. Would going from 16 to 8 to 24 GB of RAM cause this, or is there some odd incompatibility with the RAM? I’m going to cross-post this to the System76 forum. If there was a problem, would memtester throw an error?

I’ve tried different kernels, and tried plugged in and not plugged in. I opened the computer and reseated the memory. I’ve tried with and without external monitors, keyboards, and mice, Wi-Fi vs. Ethernet, and suspending or not. I’m kind of at a loss, as I can’t think of why the memory would be causing this problem if it checks out and the system boots. I also can’t find anything that consistently recreates the crash…except that so far removing the DIMM has prevented it.

Here’s my inxi output. This is without the added memory, obviously. I’m also attaching photos of the original and purchased DIMMs.


System:
  Kernel: 6.5.6-200.fc38.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.39-9.fc38 Desktop: KDE Plasma v: 5.27.8 Distro: Fedora release 38
    (Thirty Eight)
Machine:
  Type: Laptop System: System76 product: Lemur Pro v: lemp9
    serial: <superuser required>
  Mobo: System76 model: Lemur Pro v: lemp9 serial: <superuser required>
    UEFI: coreboot v: 2023-08-18_a8dd6c2 date: 08/18/2023
Battery:
  ID-1: BAT0 charge: 47.7 Wh (92.8%) condition: 51.4/73.9 Wh (69.6%)
    volts: 8.3 min: 7.7 model: Notebook BAT status: discharging
CPU:
  Info: quad core model: Intel Core i5-10210U bits: 64 type: MT MCP
    arch: Comet/Whiskey Lake note: check rev: C cache: L1: 256 KiB L2: 1024 KiB
    L3: 6 MiB
  Speed (MHz): avg: 475 high: 1000 min/max: 400/4200 cores: 1: 400 2: 400
    3: 400 4: 400 5: 400 6: 1000 7: 400 8: 400 bogomips: 33599
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel CometLake-U GT2 [UHD Graphics] vendor: CLEVO/KAPOK
    driver: i915 v: kernel arch: Gen-9.5 bus-ID: 00:02.0
  Device-2: Chicony USB2.0 Camera driver: uvcvideo type: USB bus-ID: 1-7:2
  Display: wayland server: X.org v: 1.20.14 with: Xwayland v: 22.1.9
    compositor: kwin_wayland driver: X: loaded: modesetting unloaded: fbdev,vesa
    dri: iris gpu: i915 resolution: 1920x1080
  API: EGL v: 1.5 drivers: iris,swrast platforms:
    active: wayland,x11,surfaceless,device inactive: gbm
  API: OpenGL v: 4.6 vendor: intel mesa v: 23.1.8 glx-v: 1.4
    direct-render: yes renderer: Mesa Intel UHD Graphics (CML GT2)
  API: Vulkan v: 1.3.243 drivers: intel,llvmpipe surfaces: xcb,xlib,wayland
    devices: 2
Audio:
  Device-1: Intel Comet Lake PCH-LP cAVS vendor: CLEVO/KAPOK
    driver: snd_hda_intel v: kernel bus-ID: 00:1f.3
  API: ALSA v: k6.5.6-200.fc38.x86_64 status: kernel-api
  Server-1: PipeWire v: 0.3.82 status: active
Network:
  Device-1: Intel Comet Lake PCH-LP CNVi WiFi driver: iwlwifi v: kernel
    bus-ID: 00:14.3
  IF: wlp0s20f3 state: up mac: <filter>
  IF-ID-1: wg-mullvad state: unknown speed: N/A duplex: N/A mac: N/A
Bluetooth:
  Device-1: Intel Bluetooth 9460/9560 Jefferson Peak (JfP) driver: btusb
    v: 0.8 type: USB bus-ID: 1-10:3
  Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 5.1
    lmp-v: 10
Drives:
  Local Storage: total: 465.76 GiB used: 150.87 GiB (32.4%)
  ID-1: /dev/nvme0n1 vendor: Western Digital model: WDS500G2B0C-00PXH0
    size: 465.76 GiB temp: 31.9 C
Partition:
  ID-1: / size: 455.74 GiB used: 147.77 GiB (32.4%) fs: btrfs dev: /dev/dm-0
    mapped: luks-aa73aa2a-6886-4979-8ef9-d08c0c408190
  ID-2: /boot size: 973.4 MiB used: 303.9 MiB (31.2%) fs: ext4
    dev: /dev/nvme0n1p3
  ID-3: /boot/efi size: 3.99 GiB used: 2.81 GiB (70.4%) fs: vfat
    dev: /dev/nvme0n1p2
  ID-4: /home size: 455.74 GiB used: 147.77 GiB (32.4%) fs: btrfs
    dev: /dev/dm-0 mapped: luks-aa73aa2a-6886-4979-8ef9-d08c0c408190
Swap:
  ID-1: swap-1 type: partition size: 4 GiB used: 0 KiB (0.0%)
    dev: /dev/nvme0n1p4
  ID-2: swap-2 type: zram size: 7.62 GiB used: 0 KiB (0.0%) dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 51.0 C pch: 39.0 C mobo: N/A
  Fan Speeds (rpm): cpu: 0
Info:
  Processes: 477 Uptime: 4m Memory: total: 8 GiB available: 7.62 GiB
  used: 2.45 GiB (32.2%) Init: systemd target: graphical (5) Compilers:
  gcc: 13.2.1 Packages: 17 note: see --rpm Shell: Bash v: 5.2.15 inxi: 3.3.30

The user must ensure the DIMM installed is compatible with the motherboard and BIOS.
The System76 tech specs say this for the lemp9 system:

[image: System76 tech specs for the lemp9]

Thus it is clear that the system supports an 8, 16, or 32 GB SO-DIMM in the extra slot. It also seems it must be compatible with the 8 GB that is on the motherboard (Samsung K4AAG165WA-BCTD). Samsung shows that memory rated at 2666 Mbps.

There are really 2 possibilities for the extra memory issue. 1. The memory installed is incompatible; or 2. the connections to the DIMM are not making good contact.

Since this worked well with the original module from the beginning and only recently began having issues, I would suspect a problem with the contacts to the DIMM, especially since a new DIMM is also having issues.

You can remove the module, then carefully blow out the slot and clean the contacts before reseating it. The contacts on the DIMM itself can be cleaned with an eraser, but the contacts in the slot are fragile and easily damaged. You should never touch the contacts with bare fingers, since that leaves a residue that causes corrosion.

Thanks for the reply. When I purchased the memory I did my best to match speeds. The photos I posted both show a speed of 2666 with CL19 (not sure what the CL19 means). Only one shows voltage.

When I first had the problem with the 8 gig dimm I opened the case, reseated memory and was still getting errors. I blew out the memory slot (nothing was very dusty), but got the same errors. BTW, as I mentioned I narrowed down the problem by testing the memory with memtester. I also removed the dimm and ran memtester with no errors with the onboard memory.

If the problem was the DIMM or the connection in the slot, would one typically get memory errors with memtester after I installed the memory? I’ve had memory compatibility problems before in other computers, and usually what happened was that it either refused to boot or the memory didn’t show up, so I’m really scratching my head. I opened a ticket with System76, so it might turn out there is a weird compatibility issue.
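On the CL19 question above: CL is the CAS latency, the number of memory clock cycles between a read command and the first data coming back. A rough sketch of the arithmetic for the 2666/CL19 modules in the photos follows; the function name `cas_latency_ns` is made up for illustration, and it assumes the usual DDR convention that the clock runs at half the rated transfer rate.

```python
def cas_latency_ns(cl_cycles, data_rate_mts):
    """Convert a CAS latency in clock cycles to nanoseconds.

    Assumes DDR memory: two transfers per clock, so the clock
    frequency in MHz is half the rated transfer rate in MT/s.
    """
    clock_mhz = data_rate_mts / 2
    cycle_ns = 1000 / clock_mhz  # duration of one clock cycle in ns
    return cl_cycles * cycle_ns


# DDR4-2666 at CL19, as printed on both modules
print(round(cas_latency_ns(19, 2666), 2))  # prints 14.25
```

Since both modules show the same speed and the same CL, this particular timing is matched between them; it says nothing about the other timings or the voltage.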

A bad contact (which may be as small as a single contact pin) may only show up when that particular address is used, and otherwise be totally invisible to normal operations. Since memory testers try to address the entire space, the fault would show up in a test, but in normal operation it could easily be very intermittent.
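To illustrate the point above, here is a toy Python model. It is only a sketch and not how memtester is actually implemented; `FlakyMemory`, `full_sweep`, and `random_workload` are hypothetical names, and the fault is simplified to one bit at one address that always reads back as 0.

```python
import random


class FlakyMemory:
    """Toy RAM model: one bit at one address always reads back as 0."""

    def __init__(self, size, bad_address, stuck_bit):
        self.cells = [0] * size
        self.bad_address = bad_address
        self.stuck_bit = stuck_bit

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if addr == self.bad_address:
            # Simulate the bad contact: clear the stuck bit on read.
            value &= ~(1 << self.stuck_bit) & 0xFF
        return value


def full_sweep(mem, pattern=0xFF):
    """Write a pattern to every address and read it back, as a tester would."""
    size = len(mem.cells)
    for addr in range(size):
        mem.write(addr, pattern)
    return [addr for addr in range(size) if mem.read(addr) != pattern]


def random_workload(mem, accesses, rng):
    """Touch a handful of random addresses, as ordinary programs might."""
    errors = 0
    size = len(mem.cells)
    for _ in range(accesses):
        addr = rng.randrange(size)
        value = rng.randrange(256)
        mem.write(addr, value)
        if mem.read(addr) != value:
            errors += 1
    return errors


mem = FlakyMemory(size=1 << 16, bad_address=12345, stuck_bit=3)
print("sweep with 0xFF finds:", full_sweep(mem, 0xFF))  # [12345]
print("sweep with 0x00 finds:", full_sweep(mem, 0x00))  # []
print("errors in 1000 random accesses:", random_workload(mem, 1000, random.Random()))
```

The full sweep with an all-ones pattern always finds the bad address, while a short random workload usually reports nothing because it rarely lands on that one cell with the affected bit set. The all-zeros sweep also misses it, which is why a tester needs more than one pattern to catch a fault like this.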

Errors with already installed memory, or with newly installed memory could both be related to a bad contact point in the slot.

Since the permanently installed memory is showing no errors, the problem seems directly connected to the slot, and since it is seen with both the original SO-DIMM and the replacement, a bad contact looks even more likely.