Problems and crashes after changing video card

dontech · September 22, 2024, 8:47pm

systemctl status gdm.service
Output:

gdm.service - GNOME Display Manager
Loaded: loaded (/usr/lib/systemd/system/gdm.service; enabled; preset: enabled)
Drop-In: /usr/lib/systemd/system/service.d
10-timeout-abort.conf
Active: active (running) since Sun 2024-09-22 22:19:57 CEST; 10min ago
Main PID: 4352 (gdm)
Tasks: 4 (limit: 19013)
Memory: 3.6M
CPU: 67ms
CGroup: /system.slice/gdm.service
4352 /usr/sbin/gdm

set 22 22:19:57 fedora systemd[1]: Starting gdm.service - GNOME Display Manager...
set 22 22:19:57 fedora systemd[1]: Started gdm.service - GNOME Display Manager.
set 22 22:19:58 fedora gdm[4352]: Gdm: GdmDisplay: Session never registered, failing
set 22 22:19:58 fedora gdm[4352]: Gdm: Child process -4403 was already dead.
set 22 22:19:58 fedora gdm[4352]: Gdm: GdmDisplay: Session never registered, failing
set 22 22:19:58 fedora gdm[4352]: Gdm: Child process -4403 was already dead.

journalctl -b -u gdm
Output:

set 22 22:19:57 fedora systemd[1]: Starting gdm.service - GNOME Display Manager...
set 22 22:19:57 fedora systemd[1]: Started gdm.service - GNOME Display Manager.
set 22 22:19:58 fedora gdm[4352]: Gdm: GdmDisplay: Session never registered, failing
set 22 22:19:58 fedora gdm[4352]: Gdm: Child process -4403 was already dead.
set 22 22:19:58 fedora gdm[4352]: Gdm: GdmDisplay: Session never registered, failing
set 22 22:19:58 fedora gdm[4352]: Gdm: Child process -4403 was already dead.

I also tried restarting the gdm service, but it crashes after every restart.
What can I do?

dontech · September 24, 2024, 2:43pm

@barryascott Here is the SMART output for the SSD, as you requested:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.11-200.fc39.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       14
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       10901
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       4433
160 Uncorrectable_Error_Cnt 0x0032   100   100   050    Old_age   Always       -       17338
161 Valid_Spare_Block_Cnt   0x0033   100   100   050    Pre-fail  Always       -       68
163 Initial_Bad_Block_Count 0x0032   100   100   050    Old_age   Always       -       18
164 Total_Erase_Count       0x0032   100   100   050    Old_age   Always       -       123047
165 Max_Erase_Count         0x0032   100   100   050    Old_age   Always       -       502
166 Min_Erase_Count         0x0032   100   100   050    Old_age   Always       -       18
167 Average_Erase_Count     0x0032   100   100   050    Old_age   Always       -       261
168 Max_Erase_Count_of_Spec 0x0032   100   100   050    Old_age   Always       -       7000
169 Remaining_Lifetime_Perc 0x0032   100   100   050    Old_age   Always       -       97
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Runtime_Invalid_Blk_Cnt 0x0032   100   100   050    Old_age   Always       -       14
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       475
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       40
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       139964426
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       17338
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       14
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       17338
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       68
241 Host_Writes_32MiB       0x0030   100   100   050    Old_age   Offline      -       327207
242 Host_Reads_32MiB        0x0030   100   100   050    Old_age   Offline      -       551970
245 TLC_Writes_32MiB        0x0032   100   100   050    Old_age   Always       -       706086

What can you tell me about the status of the SSD?

barryascott · September 24, 2024, 3:53pm

It looks like you are exceeding the flash endurance.
It's had a lot of data written to it.

You have written 706,086 × 32MiB (attribute 245, TLC_Writes_32MiB), which is about 21.5TiB.
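The conversion can be checked with a short Python sketch. The raw values are copied from the SMART table above; treating "TiB" as 2^40 bytes is an assumption about the intended unit. It also converts Host_Writes_32MiB (attribute 241), which presumably counts what the computer sent to the drive rather than what the controller wrote to flash.

```python
# Sketch: convert SMART counters reported in 32 MiB units into TiB.
# Raw values are copied from the smartctl output posted in this thread.

MIB = 1024 ** 2
TIB = 1024 ** 4


def units_32mib_to_tib(raw_value: int) -> float:
    """Convert a raw counter measured in 32 MiB units to TiB (2**40 bytes)."""
    return raw_value * 32 * MIB / TIB


tlc_writes = units_32mib_to_tib(706_086)   # attribute 245, TLC_Writes_32MiB
host_writes = units_32mib_to_tib(327_207)  # attribute 241, Host_Writes_32MiB

print(f"TLC writes:  {tlc_writes:.1f} TiB")
print(f"Host writes: {host_writes:.1f} TiB")
```

This prints roughly 21.5 TiB for the TLC counter and 10.0 TiB for the host counter.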

The 17338 reported by Uncorrectable_Error_Cnt (and Offline_Uncorrectable) is the count of written data blocks that cannot be read; that data is lost.
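The attributes that count unreadable or reallocated data can be picked out of the smartctl attribute table mechanically. This Python sketch is an illustration only: the set of watched IDs and the helper name `flag_attributes` are hypothetical choices based on the attribute names in this thread (vendors do not all use the same IDs), and the sample rows are copied from the output above.

```python
# Sketch: flag SMART attributes whose non-zero raw value suggests unreadable data.
# The watched IDs are an assumption based on this drive's attribute names.

WATCHED = {
    5: "Reallocated_Sector_Ct",
    160: "Uncorrectable_Error_Cnt",
    196: "Reallocated_Event_Count",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def flag_attributes(smartctl_text: str) -> dict[int, int]:
    """Return {attribute_id: raw_value} for watched attributes with raw_value > 0."""
    flagged = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID;
        # this skips the "ID# ATTRIBUTE_NAME ..." header and other text.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]  # first token of the RAW_VALUE column
        if attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[attr_id] = int(raw)
    return flagged


sample = """\
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       14
160 Uncorrectable_Error_Cnt 0x0032   100   100   050    Old_age   Always       -       17338
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       14
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       17338
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
"""

for attr_id, raw in flag_attributes(sample).items():
    print(f"{attr_id:>3} {WATCHED[attr_id]}: {raw}")
```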

Bottom line: the SSD is failing. Replace it. Judging by this failure, I would guess it is not a quality brand?

I only use SSDs from brands that document the “endurance” of the drive, usually Intel or Samsung.

barryascott · September 24, 2024, 3:58pm

For comparison, my old Samsung SSD 860 EVO has a rated endurance of 2,400TB, of which I’ve used about 9.3TiB in five times the power-on hours.
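The Samsung figure comes from attribute 241 (Total_LBAs_Written) in the table below. A short Python sketch of the conversion, assuming each counted LBA is 512 bytes (the usual convention for this Samsung attribute, but an assumption here), also compares the two drives' power-on hours:

```python
# Sketch: convert Samsung's Total_LBAs_Written (attribute 241) to TiB and
# compare power-on hours between the two drives in this thread.
# Assumption: each LBA counted by this attribute is 512 bytes.

SECTOR_BYTES = 512
TIB = 1024 ** 4

lbas_written = 19_906_432_233
written_tib = lbas_written * SECTOR_BYTES / TIB

samsung_hours = 55_462  # Power_On_Hours of the Samsung 860 EVO
other_hours = 10_901    # Power_On_Hours of the failing SSD

print(f"Written: {written_tib:.1f} TiB")
print(f"Power-on hours ratio: {samsung_hours / other_hours:.1f}x")
```

This gives about 9.3 TiB written and a ratio of about 5.1.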

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       55462
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       294
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       9
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   065   045   000    Old_age   Always       -       35
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       280
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       19906432233

gnwiii · September 24, 2024, 5:40pm

Dom D:

set 20 16:50:24 smartd[913]: Device: /dev/sda [SAT], 14 Currently unreadable (pending)  sectors
set 20 16:50:24 smartd[913]: Device: /dev/sda [SAT], 17338 Offline uncorrectable sectors
set 20 16:50:24 smartd[913]: Device: /dev/sdb [SAT], 1 Offline uncorrectable sectors
set 20 16:50:31 gnome-session-binary[4483]: Unrecoverable failure in required component org.gnome.Shell.desktop

There were errors for two drives. If a drive is listed in /etc/fstab without noauto, it might be causing the “Unrecoverable failure in required component” error, even if the drive is only used for data.
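Which devices smartd complained about can be worked out mechanically. This Python sketch groups the quoted smartd lines by device; the regular expression is an assumption fitted only to these lines (not a full grammar of smartd messages), and `warnings_by_device` is a hypothetical helper name. The gnome-session line does not match and is ignored.

```python
import re
from collections import defaultdict

# Regular expression fitted (by assumption) to the smartd lines quoted above.
SMARTD_LINE = re.compile(r"smartd\[\d+\]: Device: (\S+) \[\w+\], (\d+) (.+)$")


def warnings_by_device(log_text: str) -> dict[str, list[str]]:
    """Group smartd sector warnings by device path."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        match = SMARTD_LINE.search(line)
        if match:
            device, count, message = match.groups()
            # Collapse repeated whitespace inside the message text.
            found[device].append(f"{count} {' '.join(message.split())}")
    return dict(found)


journal_excerpt = """\
set 20 16:50:24 smartd[913]: Device: /dev/sda [SAT], 14 Currently unreadable (pending)  sectors
set 20 16:50:24 smartd[913]: Device: /dev/sda [SAT], 17338 Offline uncorrectable sectors
set 20 16:50:24 smartd[913]: Device: /dev/sdb [SAT], 1 Offline uncorrectable sectors
set 20 16:50:31 gnome-session-binary[4483]: Unrecoverable failure in required component org.gnome.Shell.desktop
"""

for device, warnings in warnings_by_device(journal_excerpt).items():
    print(device)
    for warning in warnings:
        print("   ", warning)
```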

dontech · September 29, 2024, 3:41pm

All of those figures appeared after I restored a partition on the disk with Rescuezilla; before that operation, the disk had no negative values.
Is it possible that restoring the partition caused this high number of writes?

Why do you say that the SSD is failing even though its remaining life percentage is 97%?

At the moment I can read all the data on the disk, including the data restored by Rescuezilla.

dontech · September 29, 2024, 3:48pm

The disk listed in /etc/fstab without noauto is /dev/sda, the one with the other errors.
But what does that have to do with the GNOME Shell problem? GNOME Shell is installed on another disk.

dontech · September 29, 2024, 3:49pm

So far, I have been unable to find solutions to the problem described in the initial post.
Does anyone have any suggestions?
Otherwise I’m stuck having to reinstall the system and all applications and reapply my configuration, just because I changed the video card.
Thanks!

barryascott · September 29, 2024, 4:36pm

Because of the uncorrectable errors.
That has always been the prelude to disk failure.

Good wear levelling should mean that you do not see any failures until the remaining life is much lower. Given that you have uncorrectable errors, either wear levelling is not working or some other failure mode is at work.
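The 97% figure can be cross-checked against the erase counters in the failing drive's SMART output. This Python sketch assumes that Average_Erase_Count divided by Max_Erase_Count_of_Spec approximates the fraction of rated life used; vendors do not document exactly how Remaining_Lifetime_Perc is derived. The gap between the most- and least-erased blocks is a rough indicator of how evenly wear is spread, though blocks holding static data can legitimately stay at low counts.

```python
# Sketch: estimate consumed flash life from the failing SSD's erase counters.
# Assumption: average erase count / rated erase count ~ fraction of life used.

average_erase = 261    # attribute 167, Average_Erase_Count
max_erase = 502        # attribute 165, Max_Erase_Count
min_erase = 18         # attribute 166, Min_Erase_Count
rated_erase = 7000     # attribute 168, Max_Erase_Count_of_Spec

used_pct = 100 * average_erase / rated_erase
print(f"Estimated life used: {used_pct:.1f}% (remaining about {100 - used_pct:.0f}%)")
print(f"Most-worn vs. least-worn block: {max_erase} vs {min_erase} erases")
```

The estimate (about 3.7% used) is roughly consistent with the reported 97% remaining, which is why the uncorrectable errors are hard to explain by normal wear alone.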

gnwiii · September 29, 2024, 6:30pm

If you don’t have noauto, GNOME may attempt to mount /dev/sda at startup, which would explain the “Unrecoverable failure in required component org.gnome.Shell.desktop” error. Have you tried adding noauto to see whether the system boots? Do you still have the original video card, so you can check whether the system boots with it?
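To see which drives fall into this category, a small Python sketch can read fstab-formatted text and list the entries whose options column lacks noauto. The sample entries and UUIDs are invented for illustration, and `auto_mounted_entries` is a hypothetical helper; on a real system you would pass in the contents of /etc/fstab and then add noauto to the options of the /dev/sda entry.

```python
# Sketch: list fstab entries that are mounted automatically at boot,
# i.e. whose options field (4th column) does not contain "noauto".
# The sample entries and UUIDs below are hypothetical.


def auto_mounted_entries(fstab_text: str) -> list[tuple[str, str]]:
    """Return (device, mount_point) for entries without the noauto option."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # expect device, mount point, type, options
        device, mount_point, _fstype, options = fields[:4]
        if "noauto" not in options.split(","):
            entries.append((device, mount_point))
    return entries


sample_fstab = """\
# hypothetical example
UUID=1111-aaaa  /          btrfs  subvol=root,compress=zstd:1  0 0
UUID=2222-bbbb  /mnt/data  ext4   defaults                     0 2
UUID=3333-cccc  /mnt/old   ext4   defaults,noauto              0 2
"""

for device, mount_point in auto_mounted_entries(sample_fstab):
    print(f"{device} -> {mount_point}")
```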
