How to diagnose spontaneous reboots?

Hello all,

After eleven years, it was time for me to invest in a new desktop computer. I chose an AM5 platform with more RAM and more CPU cores than before. I decided to install Fedora 40 with LUKS2 disk encryption (both on system SSD and data HDD). The computer has been running for seven weeks now, about 15 hours a day. It has rebooted spontaneously about 5-6 times during this period.

Because of the disk encryption, I have to physically type in the password at my keyboard on every (re)boot. That is very annoying when I’m away from home and need the services/data on a computer that is stuck at the LUKS2 prompt.

I contacted my local dealer who put the system together, but they can’t easily help me. They don’t know Linux in general, or Fedora in particular. As of yet, I cannot actively trigger the reboot. Hence this question.

My first thought was to look at the system logs of the previous boot (journalctl -b -1). There is nothing weird or remarkable at the end of that log; it just stops.

My second thought was to remove a 2x USB bracket slot-in extension that connected to the motherboard. That did not prevent the spontaneous reboot.

Yesterday, during normal desktop usage, the PC suddenly rebooted; in the middle of me doing nothing special. I had Nautilus, Thunderbird, VSCode and a Duplicati backup running.

How should I go about finding the cause of these reboots? Especially when I cannot see a pattern, or trigger them manually?

Thanks for your thoughts, kind regards,
FWieP

Hi FWieP,

If the reboots are this sudden and leave no trace in the logs, this could point to a problem with the hardware.

Things to look for could include the power supply; does it have sufficient headroom to power all the components comfortably? It would also be good to check if all the internal parts and cables are attached firmly into their sockets.

Are the fans running and are the temperatures okay? This pertains mainly to the CPU and GPU.

You could perhaps do a full RAM check as well to rule out any faulty memory modules…

Without any indications in the logs it will be very difficult to diagnose this as a software / linux problem. Any issue in that area would at least leave some traces in the form of a segfault or kernel panic.
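
A minimal sketch of that kind of log check, in Python: it builds (and optionally runs) a journalctl call for the tail of an earlier boot, restricted to kernel messages. The function names and the default of 40 lines are illustrative assumptions; the flags (-b, -n, -k, --no-pager) are standard journalctl options.

```python
import subprocess


def previous_boot_cmd(boot_offset=-1, lines=40, kernel_only=True):
    """Build a journalctl command that prints the tail of an earlier boot's log."""
    cmd = ["journalctl", "-b", str(boot_offset), "-n", str(lines), "--no-pager"]
    if kernel_only:
        cmd.append("-k")  # only kernel messages (segfaults are logged here, panics only if they reach disk)
    return cmd


def tail_of_previous_boot(**kwargs):
    """Run the command and return its text output.

    Reading other boots' logs may require root or journal-group membership.
    """
    result = subprocess.run(
        previous_boot_cmd(**kwargs), capture_output=True, text=True, check=False
    )
    return result.stdout
```

For example, previous_boot_cmd(boot_offset=-7, lines=20) targets the boot seven boots back. If the output just stops without a shutdown sequence, that matches the "sudden, no trace" pattern described above.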

Another suggestion: this may be related to automatic updates. What does dnf history show? Some updates (such as kernel upgrades) require a reboot, and that could conceivably trigger one, although it should then also show up in the logs as a regular shutdown/restart.

Thanks for all your suggestions. I will address them one by one.

  • the power supply (be quiet!, 600 W) is sufficient. I have no dedicated GPU or extra peripherals. Only the motherboard and one HDD are connected. No optical drive.
  • have just checked all internal cabling, both power and signal tracks. Where applicable, I disconnected and reconnected the connectors firmly.
  • all three fans are running (2x case, 1x CPU), there is no temperature issue.
  • ran a full memtest86+ default test (4x), that took 3 hours and 16 minutes: 0 errors.
  • dnf is probably not responsible for the reboots; the logs show nothing like that.

For now, I don’t see anything wrong and perhaps the cable checking has (hopefully) fixed the spontaneous reboots. Only time will tell; I will report back here when either the system reboots once again, or after 4 weeks of normal day-to-day use. I sincerely hope the latter :slight_smile: .

Thanks for your thoughts, kind regards,
FWieP

Just a wild guess. Some recent kernels are troublesome with some AMD integrated GPU. See for instance

Hi all,

I’m sorry to report back so soon (but glad at the same time). A hunch I previously had, has been sort of confirmed: the reboot happens while running a Duplicati backup job from a local LUKS2 encrypted disk to a remote server over SSH.

After this reboot, journalctl -b -1 finally shows something to work with:

nov 16 17:06:12 WaanzinsPC4 nautilus[7850]: Attempted to add a non-existent file to the view.
nov 16 17:06:12 WaanzinsPC4 nautilus[7850]: Attempted to add a non-existent file to the view.
nov 16 17:06:12 WaanzinsPC4 nautilus[7850]: Attempted to add a non-existent file to the view.
nov 16 17:06:12 WaanzinsPC4 nautilus[7850]: Attempted to add a non-existent file to the view.
nov 16 17:06:12 WaanzinsPC4 nautilus[7850]: Attempted to add a non-existent file to the view.
nov 16 17:06:12 WaanzinsPC4 nautilus[7850]: Attempted to add a non-existent file to the view.
nov 16 17:06:12 WaanzinsPC4 nautilus[7850]: Attempted to add a non-existent file to the view.
nov 16 17:06:18 WaanzinsPC4 kernel: sched: RT throttling activated
nov 16 17:06:19 WaanzinsPC4 kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
nov 16 17:06:21 WaanzinsPC4 kernel: pcieport 0000:02:07.0: Unable to change power state from D3hot to D0, device inaccessible
nov 16 17:06:22 WaanzinsPC4 kernel: r8169 0000:06:00.0 enp6s0: NETDEV WATCHDOG: CPU: 14: transmit queue 0 timed out 5535 ms

What do these log lines mean?
Especially the last two (they were printed in red on screen)?

Another log from 7 boots ago shows the same line before the reboot:

nov 13 17:06:58 WaanzinsPC4 kernel: pcieport 0000:02:07.0: Unable to change power state from D3hot to D0, device inaccessible
nov 13 17:07:00 WaanzinsPC4 kernel: r8169 0000:06:00.0 enp6s0: NETDEV WATCHDOG: CPU: 14: transmit queue 0 timed out 5308 ms

Another log from 21 boots ago shows nothing network related, but this as the last line:

nov 03 20:40:44 WaanzinsPC4 kernel: clocksource: Long readout interval, skipping watchdog check: cs_nsec: 1147011041 wd_nsec: 1147004716

Then, from 29 boots ago, these are the last lines in the log:

okt 28 17:10:38 WaanzinsPC4 nautilus[8861]: Attempted to add a non-existent file to the view.
okt 28 17:10:41 WaanzinsPC4 nautilus[8861]: nautilus_file_info_add_emblem: assertion 'emblem_name != NULL && emblem_name[0] != '\0'' failed
okt 28 17:10:42 WaanzinsPC4 nautilus[8861]: Attempted to add a non-existent file to the view.
okt 28 17:10:44 WaanzinsPC4 nautilus[8861]: nautilus_file_info_add_emblem: assertion 'emblem_name != NULL && emblem_name[0] != '\0'' failed

Edit: to clarify, the Duplicati backup job starts at 17:05 each day. There must be some kind of correlation to the reboots happening within those few minutes…? After the reboot, the system runs fsck on the HDD, then completes/reruns the backup job without any issues.

Edit: the motherboard is Asrock A620M-HDV/M.2+. The BIOS has been updated to the latest version available (3.10). The CPU is AMD Ryzen 7 8700G.

Thanks again,
FWieP
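
One way to put a number on that correlation, as a rough sketch: take the time of the last journal line before each reboot quoted above and compute how long after the 17:05 backup start it was logged. The ten-minute window and the function names are arbitrary assumptions for illustration.

```python
from datetime import datetime

BACKUP_START = "17:05:00"  # daily Duplicati start time, as stated in the post

# Time of the last journal line before each reboot, copied from the logs above.
LAST_LINES = ["17:06:22", "17:07:00", "20:40:44", "17:10:44"]


def seconds_after_backup(last_line, backup_start=BACKUP_START):
    """Seconds between the backup start and the last logged line (same day)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(last_line, fmt) - datetime.strptime(backup_start, fmt)
    return int(delta.total_seconds())


def within_window(last_line, window_seconds=600):
    """True if the log ended within window_seconds after the backup started."""
    return 0 <= seconds_after_backup(last_line) <= window_seconds


for t in LAST_LINES:
    print(t, seconds_after_backup(t), within_window(t))
```

Three of the four quoted log endings fall within ten minutes of the backup start; only the one at 20:40:44 does not.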

Hello all,

Finally, I have been able to reproduce the crash/reboot. This time, I ran a Duplicati backup job manually (from internal HDD to external HDD). The journalctl -b -1 shows nothing special:

nov 17 08:13:03 WaanzinsPC4 gnome-shell[3786]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed

I’m now looking into the Duplicati logs, but have found nothing special as of yet.

Yesterday, the system installed several updates through dnf, among them amd-gpu-firmware and amd-ucode-firmware. I thought, maybe that would fix this issue. But it did not.

How can a backup job hang and reboot the system?

Edit: now having a way to reproduce the reboot, I decided to view the system logs as they happen. In dmesg, I see the following error messages (apparently related to the internal HDD):

[ 1036.510345] ata1.00: sense data available but port frozen
[ 1036.510351] ata1.00: exception Emask 0x11 SAct 0x2000000 SErr 0x6c0100 action 0x6 frozen
[ 1036.510355] ata1.00: irq_stat 0x48000008, interface fatal error
[ 1036.510358] ata1: SError: { UnrecovData CommWake 10B8B BadCRC Handshk }
[ 1036.510362] ata1.00: failed command: READ FPDMA QUEUED
[ 1036.510364] ata1.00: cmd 60/00:c8:a0:d3:e9/01:00:2e:00:00/40 tag 25 ncq dma 131072 in
                        res 43/84:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[ 1036.510370] ata1.00: status: { DRDY SENSE ERR }
[ 1036.510373] ata1.00: error: { ICRC ABRT }
[ 1036.510376] ata1: hard resetting link
[ 1036.983643] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1036.987787] ata1.00: configured for UDMA/133
[ 1036.998273] sd 0:0:0:0: [sda] tag#25 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 1036.998282] sd 0:0:0:0: [sda] tag#25 Sense Key : Aborted Command [current] 
[ 1036.998286] sd 0:0:0:0: [sda] tag#25 Add. Sense: Scsi parity error
[ 1036.998289] sd 0:0:0:0: [sda] tag#25 CDB: Read(16) 88 00 00 00 00 00 2e e9 d3 a0 00 00 01 00 00 00
[ 1036.998292] I/O error, dev sda, sector 787076000 op 0x0:(READ) flags 0x80700 phys_seg 31 prio class 2
[ 1036.998314] ata1: EH complete

[ 1119.110601] ata1.00: sense data available but port frozen
[ 1119.110611] ata1.00: exception Emask 0x11 SAct 0x4000 SErr 0x46d0100 action 0x6 frozen
[ 1119.110615] ata1.00: irq_stat 0x48000008, interface fatal error
[ 1119.110618] ata1: SError: { UnrecovData PHYRdyChg CommWake 10B8B BadCRC Handshk DevExch }
[ 1119.110623] ata1.00: failed command: READ FPDMA QUEUED
[ 1119.110625] ata1.00: cmd 60/00:70:00:a2:99/01:00:4a:01:00/40 tag 14 ncq dma 131072 in
                        res 43/84:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[ 1119.110633] ata1.00: status: { DRDY SENSE ERR }
[ 1119.110635] ata1.00: error: { ICRC ABRT }
[ 1119.110645] ata1: hard resetting link
[ 1119.575913] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1119.579832] ata1.00: configured for UDMA/133
[ 1119.590001] sd 0:0:0:0: [sda] tag#14 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 1119.590009] sd 0:0:0:0: [sda] tag#14 Sense Key : Aborted Command [current] 
[ 1119.590013] sd 0:0:0:0: [sda] tag#14 Add. Sense: Scsi parity error
[ 1119.590017] sd 0:0:0:0: [sda] tag#14 CDB: Read(16) 88 00 00 00 00 01 4a 99 a2 00 00 00 01 00 00 00
[ 1119.590020] I/O error, dev sda, sector 5546549760 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 2
[ 1119.590039] ata1: EH complete

What do these errors mean? Is the HDD physically damaged, or should I reformat it?

Thanks and kind regards,
FWieP
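
As a cross-check on how to read these lines: the sector number in each "I/O error" line can be recomputed from the CDB bytes printed just above it. The sketch below assumes the standard READ(16) layout (opcode 0x88, bytes 2 to 9 the logical block address, bytes 10 to 13 the transfer length); the function name is made up for illustration.

```python
def read16_lba(cdb_hex):
    """Return (lba, block_count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")     # bytes 2..9: logical block address
    count = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: number of blocks
    return lba, count


# CDB strings copied from the two dmesg excerpts above.
first = read16_lba("88 00 00 00 00 00 2e e9 d3 a0 00 00 01 00 00 00")
second = read16_lba("88 00 00 00 00 01 4a 99 a2 00 00 00 01 00 00 00")
print(first, second)
```

Both decoded addresses match the sectors the kernel reports (787076000 and 5546549760), and 256 blocks of 512 bytes is the 131072-byte transfer shown in the "ncq dma" lines. The two failures are at widely separated locations on the disk.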

If you have a spare SATA (or IDE as I see UDMA 133 listed in the error?) cable or port on the motherboard to connect the drive to, I would see if changing either of those could be the issue. Also, installing and running smartctl from the smartmontools package on the drive should provide you with the health and stats reported from the drive itself.

It suggests there is a hardware issue with the harddrive, formatting will likely not fix it. It could be something as simple as a faulty cable, but there could also be a problem with the I/O controller or the cache memory on the drive.

If you have a spare cable that would be a simple thing to try first. If the problem persists it would be best to put the drive in another PC and do a full S.M.A.R.T. test as @theqlp suggested to check for any errors.

It would be advisable to not rely on this drive for any important data until you have tracked down the cause.

Since you have bought the system new from a store it might be good to talk with them first to avoid voiding any warranty.
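
For anyone wondering how the words in braces relate to the hex values in the dmesg output: the SErr value is a bitmask of the SATA SError register, and SAct has one bit per outstanding NCQ tag. A small decoding sketch follows; the bit names are the ones the kernel prints, while the table and function names here are illustrative and the bit positions are my reading of the SATA SError layout, checked against the log lines in this thread.

```python
# SError bit positions and the short names libata prints for them.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm",
    8: "UnrecovData", 9: "Persist", 10: "Proto", 11: "HostInt",
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake", 19: "10B8B",
    20: "Dispar", 21: "BadCRC", 22: "Handshk", 23: "LinkSeq",
    24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}


def decode_serror(value):
    """List the SError flag names set in a raw SErr value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


def active_tags(sact):
    """List the NCQ tag numbers whose bits are set in a raw SAct value."""
    return [tag for tag in range(32) if sact & (1 << tag)]


print(decode_serror(0x6C0100), active_tags(0x2000000))
```

Decoding 0x6c0100 this way gives the same flags the kernel printed, including BadCRC and 10B8B, which describe errors on the link between drive and controller rather than on the platters.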

Issues with a failure rate around once a week may not show up in a short test. A couple of overnight runs are often recommended. Heat can be a factor in memory glitches. Systems with high-end GPUs may run hotter in actual use than while memory tests are running.

For systems with ample memory you can try removing some memory. If you have multiple add-on cards or drives you can try stripping down to the bare minimum and then adding hardware components one-by-one.

Bad solder joints and connectors can be sensitive to vibration. You can try tapping the case or picking up the system and dropping it a couple of centimeters.

Have just swapped the SATA cable with a brand new one, and used a different port on the motherboard side. A few minutes into the backup job, dmesg shows the same error as before:

[  311.481330] ata3.00: sense data available but port frozen
[  311.481340] ata3.00: exception Emask 0x11 SAct 0x10000 SErr 0x6c0100 action 0x6 frozen
[  311.481344] ata3.00: irq_stat 0x48000008, interface fatal error
[  311.481347] ata3: SError: { UnrecovData CommWake 10B8B BadCRC Handshk }
[  311.481351] ata3.00: failed command: READ FPDMA QUEUED
[  311.481353] ata3.00: cmd 60/00:80:e8:c5:30/01:00:30:00:00/40 tag 16 ncq dma 131072 in
                        res 43/84:01:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[  311.481360] ata3.00: status: { DRDY SENSE ERR }
[  311.481362] ata3.00: error: { ICRC ABRT }
[  311.481371] ata3: hard resetting link
[  311.955593] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  311.959495] ata3.00: configured for UDMA/133
[  311.969672] sd 2:0:0:0: [sda] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[  311.969681] sd 2:0:0:0: [sda] tag#16 Sense Key : Aborted Command [current] 
[  311.969687] sd 2:0:0:0: [sda] tag#16 Add. Sense: Scsi parity error
[  311.969692] sd 2:0:0:0: [sda] tag#16 CDB: Read(16) 88 00 00 00 00 00 30 30 c5 e8 00 00 01 00 00 00
[  311.969695] I/O error, dev sda, sector 808502760 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 2
[  311.969717] ata3: EH complete

This is the output of sudo smartctl -x /dev/sda:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.11.7-200.fc40.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000VX016-3CV104
Serial Number:    WW625N7N
LU WWN Device Id: 5 000c50 0f2872f34
Firmware Version: CV10
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Nov 17 16:59:46 2024 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x73) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 457) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x70bd)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   082   064   006    -    141584184
  3 Spin_Up_Time            PO----   096   095   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    74
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   070   060   045    -    11015804
  9 Power_On_Hours          -O--CK   100   100   000    -    724
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    67
183 Runtime_Bad_Block       -O--CK   096   096   000    -    4
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   098   000    -    42950393879
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   068   061   040    -    32 (Min/Max 18/32)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    8
193 Load_Cycle_Count        -O--CK   100   100   000    -    186
194 Temperature_Celsius     -O---K   032   040   000    -    32 (0 18 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   082   064   000    -    141584184
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    68
240 Head_Flying_Hours       ------   100   253   000    -    662 (209 111 0)
241 Total_LBAs_Written      ------   100   253   000    -    19250142579
242 Total_LBAs_Read         ------   100   253   000    -    91161825597
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O   2048  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    512  Current Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      24  Device vendor specific log
0xa2       GPL     VS    8160  Device vendor specific log
0xa6       GPL     VS     192  Device vendor specific log
0xa8-0xa9  GPL,SL  VS     136  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    9048  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL,SL  VS      16  Device vendor specific log
0xc3       GPL,SL  VS       8  Device vendor specific log
0xc4       GPL,SL  VS      24  Device vendor specific log
0xd1       GPL     VS     264  Device vendor specific log
0xd3       GPL     VS    1920  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
Device State:                        Active (0)
Current Temperature:                    32 Celsius
Power Cycle Min/Max Temperature:     18/32 Celsius
Lifetime    Min/Max Temperature:     18/39 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         3 minutes
Temperature Logging Interval:        94 minutes
Min/Max recommended Temperature:      1/61 Celsius
Min/Max Temperature Limit:            2/60 Celsius
Temperature History Size (Index):    128 (70)

Index    Estimated Time   Temperature Celsius
  71    2024-11-09 09:40    33  **************
  72    2024-11-09 11:14     ?  -
  73    2024-11-09 12:48    20  *
  74    2024-11-09 14:22     ?  -
  75    2024-11-09 15:56    28  *********
  76    2024-11-09 17:30     ?  -
  77    2024-11-09 19:04    20  *
  78    2024-11-09 20:38     ?  -
  79    2024-11-09 22:12    19  -
  80    2024-11-09 23:46    34  ***************
  81    2024-11-10 01:20    35  ****************
 ...    ..(  2 skipped).    ..  ****************
  84    2024-11-10 06:02    35  ****************
  85    2024-11-10 07:36    34  ***************
  86    2024-11-10 09:10    34  ***************
  87    2024-11-10 10:44     ?  -
  88    2024-11-10 12:18    19  -
  89    2024-11-10 13:52    33  **************
  90    2024-11-10 15:26    34  ***************
  91    2024-11-10 17:00    33  **************
  92    2024-11-10 18:34    33  **************
  93    2024-11-10 20:08    34  ***************
  94    2024-11-10 21:42    33  **************
  95    2024-11-10 23:16    33  **************
  96    2024-11-11 00:50    33  **************
  97    2024-11-11 02:24     ?  -
  98    2024-11-11 03:58    20  *
  99    2024-11-11 05:32    33  **************
 100    2024-11-11 07:06    34  ***************
 ...    ..(  3 skipped).    ..  ***************
 104    2024-11-11 13:22    34  ***************
 105    2024-11-11 14:56    36  *****************
 106    2024-11-11 16:30    34  ***************
 107    2024-11-11 18:04    33  **************
 108    2024-11-11 19:38     ?  -
 109    2024-11-11 21:12    20  *
 110    2024-11-11 22:46    34  ***************
 111    2024-11-12 00:20    34  ***************
 112    2024-11-12 01:54    34  ***************
 113    2024-11-12 03:28    33  **************
 114    2024-11-12 05:02    33  **************
 115    2024-11-12 06:36     ?  -
 116    2024-11-12 08:10    20  *
 117    2024-11-12 09:44     ?  -
 118    2024-11-12 11:18    19  -
 119    2024-11-12 12:52     ?  -
 120    2024-11-12 14:26    20  *
 121    2024-11-12 16:00     ?  -
 122    2024-11-12 17:34    19  -
 123    2024-11-12 19:08    32  *************
 124    2024-11-12 20:42    32  *************
 125    2024-11-12 22:16    33  **************
 126    2024-11-12 23:50    32  *************
 127    2024-11-13 01:24    32  *************
   0    2024-11-13 02:58    32  *************
   1    2024-11-13 04:32    31  ************
   2    2024-11-13 06:06    31  ************
   3    2024-11-13 07:40     ?  -
   4    2024-11-13 09:14    19  -
   5    2024-11-13 10:48     ?  -
   6    2024-11-13 12:22    19  -
   7    2024-11-13 13:56     ?  -
   8    2024-11-13 15:30    19  -
   9    2024-11-13 17:04     ?  -
  10    2024-11-13 18:38    19  -
  11    2024-11-13 20:12     ?  -
  12    2024-11-13 21:46    19  -
  13    2024-11-13 23:20     ?  -
  14    2024-11-14 00:54    20  *
  15    2024-11-14 02:28     ?  -
  16    2024-11-14 04:02    19  -
  17    2024-11-14 05:36     ?  -
  18    2024-11-14 07:10    19  -
  19    2024-11-14 08:44     ?  -
  20    2024-11-14 10:18    20  *
  21    2024-11-14 11:52     ?  -
  22    2024-11-14 13:26    19  -
  23    2024-11-14 15:00     ?  -
  24    2024-11-14 16:34    19  -
  25    2024-11-14 18:08     ?  -
  26    2024-11-14 19:42    18  -
  27    2024-11-14 21:16     ?  -
  28    2024-11-14 22:50    18  -
  29    2024-11-15 00:24     ?  -
  30    2024-11-15 01:58    18  -
  31    2024-11-15 03:32     ?  -
  32    2024-11-15 05:06    26  *******
  33    2024-11-15 06:40     ?  -
  34    2024-11-15 08:14    27  ********
  35    2024-11-15 09:48     ?  -
  36    2024-11-15 11:22    30  ***********
  37    2024-11-15 12:56     ?  -
  38    2024-11-15 14:30    19  -
  39    2024-11-15 16:04     ?  -
  40    2024-11-15 17:38    19  -
  41    2024-11-15 19:12     ?  -
  42    2024-11-15 20:46    18  -
  43    2024-11-15 22:20     ?  -
  44    2024-11-15 23:54    18  -
  45    2024-11-16 01:28     ?  -
  46    2024-11-16 03:02    19  -
  47    2024-11-16 04:36     ?  -
  48    2024-11-16 06:10    18  -
  49    2024-11-16 07:44     ?  -
  50    2024-11-16 09:18    18  -
  51    2024-11-16 10:52     ?  -
  52    2024-11-16 12:26    18  -
  53    2024-11-16 14:00     ?  -
  54    2024-11-16 15:34    19  -
  55    2024-11-16 17:08     ?  -
  56    2024-11-16 18:42    32  *************
  57    2024-11-16 20:16     ?  -
  58    2024-11-16 21:50    30  ***********
  59    2024-11-16 23:24     ?  -
  60    2024-11-17 00:58    19  -
  61    2024-11-17 02:32     ?  -
  62    2024-11-17 04:06    18  -
  63    2024-11-17 05:40     ?  -
  64    2024-11-17 07:14    18  -
  65    2024-11-17 08:48    32  *************
  66    2024-11-17 10:22    32  *************
  67    2024-11-17 11:56    31  ************
  68    2024-11-17 13:30    31  ************
  69    2024-11-17 15:04    32  *************
  70    2024-11-17 16:38    32  *************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4              67  ---  Lifetime Power-On Resets
0x01  0x010  4             724  ---  Power-on Hours
0x01  0x018  6     19250454123  ---  Logical Sectors Written
0x01  0x020  6        69159153  ---  Number of Write Commands
0x01  0x028  6     91161852212  ---  Logical Sectors Read
0x01  0x030  6        46542447  ---  Number of Read Commands
0x01  0x038  6               -  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4             714  ---  Spindle Motor Power-on Hours
0x03  0x010  4             662  ---  Head Flying Hours
0x03  0x018  4             186  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4               0  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x03  0x038  4               0  ---  Number of Realloc. Candidate Logical Sectors
0x03  0x040  4               8  ---  Number of High Priority Unload Events
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4              23  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              32  ---  Current Temperature
0x05  0x010  1              30  ---  Average Short Term Temperature
0x05  0x018  1              31  ---  Average Long Term Temperature
0x05  0x020  1              39  ---  Highest Temperature
0x05  0x028  1              19  ---  Lowest Temperature
0x05  0x030  1              34  ---  Highest Average Short Term Temperature
0x05  0x038  1              30  ---  Lowest Average Short Term Temperature
0x05  0x040  1              31  ---  Highest Average Long Term Temperature
0x05  0x048  1              30  ---  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              70  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4             216  ---  Number of Hardware Resets
0x06  0x010  4             120  ---  Number of ASR Events
0x06  0x018  4              68  ---  Number of Interface CRC Errors
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x0001  2            3  Command failed due to ICRC error
0x0003  2            3  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            1  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

Seagate FARM log (GP Log 0xa6) supported [try: -l farm]

Update: Having read up on the use of smartctl, I ran several short tests; no issues. Running a conveyance SMART test gets me a “Connection timed out” every time. See this output after three tries:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.11.7-200.fc40.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance captive  Interrupted (host reset)      50%       725         -
# 2  Conveyance captive  Interrupted (host reset)      50%       725         -
# 3  Conveyance captive  Interrupted (host reset)      50%       725         -
# 4  Short captive       Completed without error       00%       725         -
# 5  Short offline       Completed without error       00%       725         -

Meanwhile in dmesg and journalctl, I get these two lines:

[  185.762588] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  185.800229] ata2.00: configured for UDMA/133

Yes, I’ve switched SATA ports again, but that should not be a problem.

Can the issue be seen in this output? Can I use this in my communication with the computer store’s people? What should I tell or show them?

Thanks,
FWieP

Yes, you can show the SMART output to the computer store. It shows a couple of things that indicate a potential hardware problem concerning the communication with the drive. The most important one is probably:

UDMA_CRC_Error_Count 68

These are probably the errors you saw in your dmesg output when the last few reboots happened:

[ 1119.110615] ata1.00: irq_stat 0x48000008, interface fatal error
[ 1119.110618] ata1: SError: { UnrecovData PHYRdyChg CommWake 10B8B BadCRC Handshk DevExch }
[ 1119.110623] ata1.00: failed command: READ FPDMA QUEUED

The disk seems fine in terms of data integrity, but the physical interface to the disk appears to have a problem. The store could perhaps look into whether the problem lies with the drive or with the motherboard (BIOS / SATA controller).
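
To make that reading of the report concrete, here is a rough Python sketch that parses a few attribute rows copied from the smartctl output earlier in the thread and separates media-related counters from the link CRC counter. Which attributes count as "media" is my own grouping for illustration, and the function names are hypothetical.

```python
SMART_LINES = """\
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    68
"""


def parse_attributes(text):
    """Parse 'smartctl -x' style attribute rows into a dict keyed by name."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split(None, 7)  # 8 columns: ID NAME FLAGS VALUE WORST THRESH FAIL RAW
        if len(parts) < 8 or not parts[0].isdigit():
            continue  # skip headers, legends and blank lines
        attr_id, name, _flags, value, worst, thresh, _fail, raw = parts
        attrs[name] = {
            "id": int(attr_id),
            "value": int(value),
            "worst": int(worst),
            "thresh": int(thresh),
            "raw": raw.strip(),
        }
    return attrs


def summarize(attrs):
    """Return (media error total, link CRC error count) from raw values."""
    media = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")
    media_errors = sum(int(attrs[name]["raw"]) for name in media)
    link_errors = int(attrs["UDMA_CRC_Error_Count"]["raw"])
    return media_errors, link_errors


print(summarize(parse_attributes(SMART_LINES)))
```

With the values from this drive, the media counters add up to 0 while the CRC counter is 68, the same figure as "Number of Interface CRC Errors" in the Device Statistics section of the report.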

Start with a simple solution. Replace sata cable and see if the problem persists.

@caesar : that was one of the first things I tried. I used a brand-new SATA cable and tried several other SATA ports on the motherboard. The issue persists.

@litemotiv : I have contacted the store and they are willing to “look for a solution” as soon as I bring the PC to them. I will make a paper hard-copy of the SMART-report and deliver that, too.

For now, I’m reinstalling my old desktop PC with Fedora 41 and LUKS. I’m glad I still have it available as a backup system.

To be continued; I’ll report back when I have more news.

Kind regards,
FWieP

2 Likes

I am afraid the disk has problems. I have rarely seen SMART report anything bad, but every time it did, the disk was gone for good.
However, I also recall some issues with NCQ, see this for instance:

1 Like

Hello all,

The store’s people have swapped the HDD for a brand new one, under warranty, free of charge. The defective drive will be sent to their supplier, along with my hard-copy SMART-report.

They were surprised that such a drive could even be defective; they have sold about a thousand of them without any troubles. Now comes this Linux-user with a supposedly defective drive… Oh well, I am convinced we all did our best to analyze this issue and provide clear evidence pointing to a very probable cause.

I will report back after four weeks of day-to-day usage without any unexpected restarts.

Thank you all, kind regards,
FWieP

1 Like

Good to hear, hopefully your new drive will work without problems!

Wait… what? You have a Seagate disk for video surveillance, not for desktop usage: good writes, slow reads. Seagate Surveillance ST4000VX000 4TB HDD Review
It is not even a NAS unit. Look how old the review is. I very much doubt they sold 1k drives without returns; statistically, failures are on the order of a few percent.

Hello all,

My desktop-PC has not rebooted unexpectedly in the past two weeks. But… today I heard an unusual sound when the internal HDD was put to work. I immediately checked the system logs and this is what I found:

[ 9816.939758] ata1.00: exception Emask 0x0 SAct 0x4008000 SErr 0x8d0000 action 0x6 frozen
[ 9816.939767] ata1: SError: { PHYRdyChg CommWake 10B8B LinkSeq }
[ 9816.939771] ata1.00: failed command: READ FPDMA QUEUED
[ 9816.939774] ata1.00: cmd 60/00:78:00:89:81/01:00:07:00:00/40 tag 15 ncq dma 131072 in
                        res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 9816.939781] ata1.00: status: { DRDY }
[ 9816.939783] ata1.00: failed command: READ FPDMA QUEUED
[ 9816.939785] ata1.00: cmd 60/00:d0:00:88:81/01:00:07:00:00/40 tag 26 ncq dma 131072 in
                        res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 9816.939790] ata1.00: status: { DRDY }
[ 9816.939799] ata1: hard resetting link

[ 9822.293677] ata1: link is slow to respond, please be patient (ready=0)

[ 9826.974518] ata1: found unknown device (class 0)
[ 9827.134732] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9827.138944] ata1.00: configured for UDMA/133
[ 9827.149642] sd 0:0:0:0: [sda] tag#15 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=40s
[ 9827.149651] sd 0:0:0:0: [sda] tag#15 CDB: Read(16) 88 00 00 00 00 00 07 81 89 00 00 00 01 00 00 00
[ 9827.149654] I/O error, dev sda, sector 125929728 op 0x0:(READ) flags 0x80700 phys_seg 17 prio class 2
[ 9827.149680] sd 0:0:0:0: [sda] tag#26 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=40s
[ 9827.149683] sd 0:0:0:0: [sda] tag#26 CDB: Read(16) 88 00 00 00 00 00 07 81 88 00 00 00 01 00 00 00
[ 9827.149685] I/O error, dev sda, sector 125929472 op 0x0:(READ) flags 0x80700 phys_seg 16 prio class 2
[ 9827.149697] ata1: EH complete

[ 9931.122508] ata1.00: exception Emask 0x11 SAct 0x2fff0 SErr 0xec0101 action 0x6 frozen
[ 9931.122517] ata1.00: irq_stat 0x4c000001, interface fatal error
[ 9931.122520] ata1: SError: { RecovData UnrecovData CommWake 10B8B BadCRC Handshk LinkSeq }
[ 9931.122525] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9931.122527] ata1.00: cmd 61/08:20:20:83:c0/00:00:d2:00:00/40 tag 4 ncq dma 4096 out
                        res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 9931.122533] ata1.00: status: { DRDY }
[ 9931.122536] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9931.122538] ata1.00: cmd 61/18:28:90:8a:c0/00:00:d2:00:00/40 tag 5 ncq dma 12288 out
                        res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[ 9931.122543] ata1.00: status: { DRDY }
[ 9931.122545] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9931.122547] ata1.00: cmd 61/38:30:10:9c:c0/00:00:d2:00:00/40 tag 6 ncq dma 28672 out
                        res 40/00:ff:81:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 9931.122552] ata1.00: status: { DRDY }
[ 9931.122554] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9931.122556] ata1.00: cmd 61/10:38:38:9f:c0/00:00:d2:00:00/40 tag 7 ncq dma 8192 out
                        res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[ 9931.122561] ata1.00: status: { DRDY }
[ 9931.122563] ata1.00: failed command: READ FPDMA QUEUED
[ 9931.122564] ata1.00: cmd 60/68:40:c8:6c:58/00:00:02:00:00/40 tag 8 ncq dma 53248 in
                        res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[ 9931.122569] ata1.00: status: { DRDY }
[ 9931.122571] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9931.122572] ata1.00: cmd 61/08:48:a8:81:00/00:00:d3:00:00/40 tag 9 ncq dma 4096 out
                        res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 9931.122577] ata1.00: status: { DRDY }
[ 9931.122579] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9931.122581] ata1.00: cmd 61/08:50:f8:81:00/00:00:d3:00:00/40 tag 10 ncq dma 4096 out
                        res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 9931.122586] ata1.00: status: { DRDY }
[ 9931.122588] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9931.122590] ata1.00: cmd 61/08:58:f0:83:00/00:00:d3:00:00/40 tag 11 ncq dma 4096 out
                        res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[ 9931.122594] ata1.00: status: { DRDY }
[ 9931.122596] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9931.122598] ata1.00: cmd 61/08:60:f8:84:00/00:00:d3:00:00/40 tag 12 ncq dma 4096 out
                        res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[ 9931.122603] ata1.00: status: { DRDY }
[ 9931.122605] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9931.122606] ata1.00: cmd 61/08:68:b8:84:00/00:00:d3:00:00/40 tag 13 ncq dma 4096 out
                        res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 9931.122611] ata1.00: status: { DRDY }
[ 9931.122613] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9931.122615] ata1.00: cmd 61/08:70:a8:85:00/00:00:d3:00:00/40 tag 14 ncq dma 4096 out
                        res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 9931.122619] ata1.00: status: { DRDY }
[ 9931.122621] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9931.122623] ata1.00: cmd 61/08:78:28:87:00/00:00:d3:00:00/40 tag 15 ncq dma 4096 out
                        res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 9931.122628] ata1.00: status: { DRDY }
[ 9931.122630] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9931.122631] ata1.00: cmd 61/08:88:d0:87:00/00:00:d3:00:00/40 tag 17 ncq dma 4096 out
                        res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[ 9931.122636] ata1.00: status: { DRDY }
[ 9931.122649] ata1: hard resetting link
[ 9931.590747] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9931.594990] ata1.00: configured for UDMA/133
[ 9931.605239] sd 0:0:0:0: [sda] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=5s
[ 9931.605249] sd 0:0:0:0: [sda] tag#8 Sense Key : Illegal Request [current] 
[ 9931.605252] sd 0:0:0:0: [sda] tag#8 Add. Sense: Unaligned write command
[ 9931.605256] sd 0:0:0:0: [sda] tag#8 CDB: Read(16) 88 00 00 00 00 00 02 58 6c c8 00 00 00 68 00 00
[ 9931.605258] I/O error, dev sda, sector 39349448 op 0x0:(READ) flags 0x80700 phys_seg 7 prio class 2
[ 9931.605284] ata1: EH complete

[10081.528793] ata1.00: sense data available but port frozen
[10081.528800] ata1.00: exception Emask 0x11 SAct 0x40 SErr 0xec0101 action 0x6 frozen
[10081.528804] ata1.00: irq_stat 0x4c000008, interface fatal error
[10081.528806] ata1: SError: { RecovData UnrecovData CommWake 10B8B BadCRC Handshk LinkSeq }
[10081.528810] ata1.00: failed command: READ FPDMA QUEUED
[10081.528812] ata1.00: cmd 60/00:30:f0:23:84/01:00:02:00:00/40 tag 6 ncq dma 131072 in
                        res 43/84:ff:81:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[10081.528819] ata1.00: status: { DRDY SENSE ERR }
[10081.528821] ata1.00: error: { ICRC ABRT }
[10081.528825] ata1: hard resetting link
[10081.999172] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[10082.004093] ata1.00: configured for UDMA/133
[10082.014393] sd 0:0:0:0: [sda] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[10082.014402] sd 0:0:0:0: [sda] tag#6 Sense Key : Aborted Command [current] 
[10082.014406] sd 0:0:0:0: [sda] tag#6 Add. Sense: Scsi parity error
[10082.014410] sd 0:0:0:0: [sda] tag#6 CDB: Read(16) 88 00 00 00 00 00 02 84 23 f0 00 00 01 00 00 00
[10082.014412] I/O error, dev sda, sector 42214384 op 0x0:(READ) flags 0x80700 phys_seg 17 prio class 2
[10082.014436] ata1: EH complete

[10318.625675] ata1.00: sense data available but port frozen
[10318.625683] ata1: limiting SATA link speed to 3.0 Gbps
[10318.625686] ata1.00: exception Emask 0x11 SAct 0x8000 SErr 0x6c0100 action 0x6 frozen
[10318.625689] ata1.00: irq_stat 0x48000008, interface fatal error
[10318.625691] ata1: SError: { UnrecovData CommWake 10B8B BadCRC Handshk }
[10318.625695] ata1.00: failed command: READ FPDMA QUEUED
[10318.625697] ata1.00: cmd 60/00:78:48:07:ea/01:00:01:00:00/40 tag 15 ncq dma 131072 in
                        res 43/84:ff:ff:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[10318.625704] ata1.00: status: { DRDY SENSE ERR }
[10318.625706] ata1.00: error: { ICRC ABRT }
[10318.625710] ata1: hard resetting link
[10319.086713] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[10319.090693] ata1.00: configured for UDMA/133
[10319.100896] sd 0:0:0:0: [sda] tag#15 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[10319.100903] sd 0:0:0:0: [sda] tag#15 Sense Key : Aborted Command [current] 
[10319.100907] sd 0:0:0:0: [sda] tag#15 Add. Sense: Scsi parity error
[10319.100910] sd 0:0:0:0: [sda] tag#15 CDB: Read(16) 88 00 00 00 00 00 01 ea 07 48 00 00 01 00 00 00
[10319.100912] I/O error, dev sda, sector 32114504 op 0x0:(READ) flags 0x80700 phys_seg 11 prio class 2
[10319.100930] ata1: EH complete
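
As a sanity check on how to read these lines: the `CDB: Read(16)` bytes and the `I/O error ... sector` number describe the same request. A minimal sketch, assuming the standard READ(16) layout (opcode 0x88, bytes 2-9 = LBA, bytes 10-13 = transfer length in blocks) and 512-byte logical sectors as shown in the SMART report; `decode_read16` is just a name picked for this example.

```python
def decode_read16(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, block_count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")     # bytes 2-9: logical block address
    count = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: number of blocks
    return lba, count

# CDB copied from the tag#15 line of the first failure above.
lba, count = decode_read16("88 00 00 00 00 00 07 81 89 00 00 00 01 00 00 00")
print(lba, count, count * 512)
```

The decoded LBA is the sector the kernel reports for tag 15, and 256 blocks of 512 bytes is the `dma 131072` in the matching `cmd` line. The failing sectors are scattered across the disk rather than clustered, which fits a transport problem better than a bad spot on the platters.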

This is the SMART-report of the brand new HDD (from the same product line, once again a disk optimized for storing surveillance footage).

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.11.8-200.fc40.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000VX016-3CV104
Serial Number:    WW626R9H
LU WWN Device Id: 5 000c50 0f289fee5
Firmware Version: CV10
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Dec  1 17:14:51 2024 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x73) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 463) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x70bd)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   073   065   006    -    21666080
  3 Spin_Up_Time            PO----   096   096   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    7
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   063   060   045    -    1952263
  9 Power_On_Hours          -O--CK   100   100   000    -    182
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    7
183 Runtime_Bad_Block       -O--CK   099   099   000    -    1
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   099   000    -    8590196740
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   069   064   040    -    31 (Min/Max 17/36)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    1
193 Load_Cycle_Count        -O--CK   100   100   000    -    81
194 Temperature_Celsius     -O---K   031   040   000    -    31 (0 17 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   073   065   000    -    21666080
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    7
240 Head_Flying_Hours       ------   100   253   000    -    171 (27 9 0)
241 Total_LBAs_Written      ------   100   253   000    -    5512657984
242 Total_LBAs_Read         ------   100   253   000    -    9019584243
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O   2048  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    512  Current Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      24  Device vendor specific log
0xa2       GPL     VS    8160  Device vendor specific log
0xa6       GPL     VS     192  Device vendor specific log
0xa8-0xa9  GPL,SL  VS     136  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    9048  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL,SL  VS      16  Device vendor specific log
0xc3       GPL,SL  VS       8  Device vendor specific log
0xc4       GPL,SL  VS      24  Device vendor specific log
0xd1       GPL     VS     264  Device vendor specific log
0xd3       GPL     VS    1920  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
Device State:                        Active (0)
Current Temperature:                    31 Celsius
Power Cycle Min/Max Temperature:     17/36 Celsius
Lifetime    Min/Max Temperature:     17/36 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         3 minutes
Temperature Logging Interval:        94 minutes
Min/Max recommended Temperature:      1/61 Celsius
Min/Max Temperature Limit:            2/60 Celsius
Temperature History Size (Index):    128 (26)

Index    Estimated Time   Temperature Celsius
  27    2024-11-23 08:56     ?  -
 ...    ..( 99 skipped).    ..  -
 127    2024-11-29 21:36     ?  -
   0    2024-11-29 23:10    27  ********
   1    2024-11-30 00:44     ?  -
   2    2024-11-30 02:18    17  -
   3    2024-11-30 03:52    34  ***************
   4    2024-11-30 05:26    35  ****************
   5    2024-11-30 07:00    35  ****************
   6    2024-11-30 08:34    35  ****************
   7    2024-11-30 10:08    34  ***************
   8    2024-11-30 11:42    34  ***************
   9    2024-11-30 13:16     ?  -
  10    2024-11-30 14:50    28  *********
  11    2024-11-30 16:24     ?  -
  12    2024-11-30 17:58    17  -
  13    2024-11-30 19:32     ?  -
  14    2024-11-30 21:06    18  -
  15    2024-11-30 22:40     ?  -
  16    2024-12-01 00:14    17  -
  17    2024-12-01 01:48     ?  -
  18    2024-12-01 03:22    17  -
  19    2024-12-01 04:56     ?  -
  20    2024-12-01 06:30    17  -
  21    2024-12-01 08:04    31  ************
  22    2024-12-01 09:38    34  ***************
  23    2024-12-01 11:12    31  ************
  24    2024-12-01 12:46    30  ***********
  25    2024-12-01 14:20    31  ************
  26    2024-12-01 15:54    31  ************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               7  ---  Lifetime Power-On Resets
0x01  0x010  4             182  ---  Power-on Hours
0x01  0x018  6      5512657984  ---  Logical Sectors Written
0x01  0x020  6         2922013  ---  Number of Write Commands
0x01  0x028  6      9019584243  ---  Logical Sectors Read
0x01  0x030  6         1890387  ---  Number of Read Commands
0x01  0x038  6               -  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4             182  ---  Spindle Motor Power-on Hours
0x03  0x010  4             171  ---  Head Flying Hours
0x03  0x018  4              81  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4               0  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x03  0x038  4               0  ---  Number of Realloc. Candidate Logical Sectors
0x03  0x040  4               1  ---  Number of High Priority Unload Events
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               4  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              31  ---  Current Temperature
0x05  0x010  1              30  ---  Average Short Term Temperature
0x05  0x018  1               -  ---  Average Long Term Temperature
0x05  0x020  1              36  ---  Highest Temperature
0x05  0x028  1              22  ---  Lowest Temperature
0x05  0x030  1              31  ---  Highest Average Short Term Temperature
0x05  0x038  1              29  ---  Lowest Average Short Term Temperature
0x05  0x040  1               -  ---  Highest Average Long Term Temperature
0x05  0x048  1               -  ---  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              70  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4              20  ---  Number of Hardware Resets
0x06  0x010  4              10  ---  Number of ASR Events
0x06  0x018  4               7  ---  Number of Interface CRC Errors
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x0001  2            3  Command failed due to ICRC error
0x0003  2            3  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            2  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

Seagate FARM log (GP Log 0xa6) supported [try: -l farm]
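
A side note on the huge `Command_Timeout` raw value (8590196740): it looks less alarming when split into 16-bit words. This is only a sketch under an assumption: Seagate drives are often said to pack several small counters into one 48-bit raw field, but what each word means is not documented anywhere in this thread, so treat the split as a reading aid and not as an official decoding. `split_words` is a made-up helper name.

```python
def split_words(raw: int) -> tuple[int, int, int]:
    """Split a 48-bit raw value into three 16-bit words, high word first."""
    return (raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF

# Raw value of attribute 188 from the report above.
words = split_words(8590196740)
print(words)
```

So the field holds a few single-digit numbers, not billions of timeouts.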

Does this mean my system 'eats' surveillance HDDs for breakfast until they are defective and hang the system while dying? Should I ask the store for a regular desktop HDD with CMR and 4TB capacity?

I feel bad about bothering them again; having nothing but the SMART-reports and the kind of HDD to back my claims. But I paid for a complete and stable running system, so…

Thanks for your thoughts, kind regards,
FWieP

Oof, that’s not great to hear that the drive isn’t happy. For general purpose storage, I would definitely recommend sticking with standard CMR drives (anything SMR should be avoided except for warm backups that aren’t online most of the time).

1 Like