Initially I thought adding a pair of Samsung 860 EVO SSDs as a software RAID to my small home server could not be too hard...
... but they seem to require special kernel settings to work.
The initial steps were as easy as expected:
- Create RAID partitions on both disks using fdisk with type 'fd'.
- Set up the RAID using:
mdadm --create /dev/md/1 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdd1
But as soon as I tried to format the new array using mkfs.ext4 /dev/md1,
errors appeared in the syslog:
Dec 29 19:11:56 tititea kernel: [ 145.887763] ata3.00: qc timeout (cmd 0x47)
Dec 29 19:11:56 tititea kernel: [ 145.888851] ata3: failed to read log page 10h (errno=-5)
Dec 29 19:11:56 tititea kernel: [ 145.888879] ata3.00: failed command: SEND FPDMA QUEUED
Dec 29 19:11:56 tititea kernel: [ 146.203715] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 19:11:56 tititea kernel: [ 146.204258] ata3.00: supports DRM functions and may not be fully accessible
Dec 29 19:11:56 tititea kernel: [ 146.206678] ata3.00: supports DRM functions and may not be fully accessible
Dec 29 19:11:56 tititea kernel: [ 146.209155] ata3.00: configured for UDMA/133
Dec 29 19:11:56 tititea kernel: [ 146.209166] ata3.00: device reported invalid CHS sector 0
Dec 29 19:11:56 tititea kernel: [ 146.209184] sd 2:0:0:0: [sdb] tag#28 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 29 19:11:56 tititea kernel: [ 146.209188] sd 2:0:0:0: [sdb] tag#28 Sense Key : Illegal Request [current]
Dec 29 19:11:56 tititea kernel: [ 146.209191] sd 2:0:0:0: [sdb] tag#28 Add. Sense: Unaligned write command
Dec 29 19:11:56 tititea kernel: [ 146.209194] sd 2:0:0:0: [sdb] tag#28 CDB: Write same(16) 93 08 00 00 00 00 00 04 10 00 00 00 80 00 00 00
Dec 29 19:11:56 tititea kernel: [ 146.209196] print_req_error: I/O error, dev sdb, sector 266240
Dec 29 19:11:56 tititea kernel: [ 146.209269] ata3: EH complete
Dec 29 19:11:56 tititea kernel: [ 146.209554] ata3.00: Enabling discard_zeroes_data
Dec 29 19:11:56 tititea kernel: [ 146.320563] ata3.00: irq_stat 0x40000008
Dec 29 19:11:56 tititea kernel: [ 146.320592] ata3.00: error: { IDNF }
Dec 29 19:11:56 tititea kernel: [ 146.325264] ata3.00: configured for UDMA/133
Dec 29 19:11:56 tititea kernel: [ 146.325287] sd 2:0:0:0: [sdb] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 29 19:11:56 tititea kernel: [ 146.325291] sd 2:0:0:0: [sdb] tag#22 Sense Key : Illegal Request [current]
Dec 29 19:11:56 tititea kernel: [ 146.325294] sd 2:0:0:0: [sdb] tag#22 Add. Sense: Logical block address out of range
Dec 29 19:11:56 tititea kernel: [ 146.325299] sd 2:0:0:0: [sdb] tag#22 CDB: Write same(16) 93 08 00 00 00 00 00 04 90 00 00 01 80 00 00 00
Dec 29 19:11:56 tititea kernel: [ 146.325301] print_req_error: I/O error, dev sdb, sector 299008
Dec 29 19:11:56 tititea kernel: [ 146.325348] ata3: EH complete
Dec 29 19:11:56 tititea kernel: [ 146.325540] ata3.00: Enabling discard_zeroes_data
Dec 29 19:11:56 tititea kernel: [ 146.424228] ata3.00: exception Emask 0x0 SAct 0x18000000 SErr 0x0 action 0x0
Dec 29 19:11:56 tititea kernel: [ 146.424241] ata3.00: failed command: SEND FPDMA QUEUED
Dec 29 19:11:56 tititea kernel: [ 146.424253] ata3.00: status: { DRDY ERR }
Dec 29 19:11:56 tititea kernel: [ 146.424705] ata3.00: supports DRM functions and may not be fully accessible
Dec 29 19:11:56 tititea kernel: [ 146.428992] ata3.00: configured for UDMA/133
Dec 29 19:11:56 tititea kernel: [ 146.429014] sd 2:0:0:0: [sdb] tag#27 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 29 19:11:56 tititea kernel: [ 146.429017] sd 2:0:0:0: [sdb] tag#27 Sense Key : Illegal Request [current]
Dec 29 19:11:56 tititea kernel: [ 146.429019] sd 2:0:0:0: [sdb] tag#27 Add. Sense: Logical block address out of range
Dec 29 19:11:56 tititea kernel: [ 146.429023] sd 2:0:0:0: [sdb] tag#27 CDB: Write same(16) 93 08 00 00 00 00 00 44 8f c0 00 00 00 40 00 00
Dec 29 19:11:56 tititea kernel: [ 146.429024] print_req_error: I/O error, dev sdb, sector 4493248
Dec 29 19:11:56 tititea kernel: [ 146.429067] ata3: EH complete
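Every failing request in this log is a Write same(16) command. To make the CDB bytes readable, here is a small sketch that decodes them, assuming the standard SCSI layout (byte 0 opcode 0x93, byte 1 flags with the UNMAP bit at 0x08, bytes 2-9 the starting LBA, bytes 10-13 the block count). The function name decode_write_same16 is my own and not part of any tool.

```shell
#!/bin/sh
# Decode a WRITE SAME(16) CDB given as 16 space-separated hex bytes,
# exactly as the kernel prints them after "CDB: Write same(16)".
decode_write_same16() {
    # Intentionally unquoted: split the string into $1..$16.
    set -- $1
    [ $# -eq 16 ] || { echo "expected 16 bytes" >&2; return 1; }
    [ "$1" = "93" ] || { echo "not a WRITE SAME(16) CDB" >&2; return 1; }

    flags=$((0x$2))
    # Bytes 2-9: starting logical block address, big-endian.
    lba=$((0x$3$4$5$6$7$8$9${10}))
    # Bytes 10-13: number of logical blocks, big-endian.
    blocks=$((0x${11}${12}${13}${14}))
    # Bit 3 of the flags byte is UNMAP, i.e. the request is a discard.
    unmap=$(( (flags >> 3) & 1 ))

    echo "lba=$lba blocks=$blocks unmap=$unmap"
}

decode_write_same16 "93 08 00 00 00 00 00 04 10 00 00 00 80 00 00 00"
# prints: lba=266240 blocks=32768 unmap=1
```

The decoded LBA matches the "sector 266240" reported by print_req_error, and the UNMAP bit is set, so these appear to be discard requests issued while mkfs.ext4 was running rather than ordinary data writes.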
So I started investigating:
- The issue occurred on both disks when I created a RAID 1 with one disk present and one missing.
- The errors did not occur when I formatted the disks directly (no RAID).
- According to the SMART status and self-test, there are no errors on the disks.
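For reference, the checks above could be reproduced roughly as follows. This is only a sketch: the exact commands I ran are not recorded here, the run and investigate helpers are made up for illustration, and the device names are taken from the mdadm call earlier. Commands are only printed unless RUN=1 is set, because mdadm --create overwrites RAID metadata on the target partition.

```shell
#!/bin/sh
# Print each command instead of executing it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# $1 = whole disk (e.g. /dev/sdb), $2 = its RAID partition (e.g. /dev/sdb1)
investigate() {
    disk=$1
    part=$2
    # Degraded RAID 1: one real member, the second slot given as "missing".
    run mdadm --create /dev/md/1 --level=1 --raid-devices=2 "$part" missing
    # Overall SMART health verdict.
    run smartctl -H "$disk"
    # Start a short self-test, then show the self-test log.
    run smartctl -t short "$disk"
    run smartctl -l selftest "$disk"
}

investigate /dev/sdb /dev/sdb1
```

Running the same sequence with /dev/sdd and /dev/sdd1 covers the second disk.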
Later on I found some useful hints:
- Samsung firmware lies about supporting queued TRIM, and the drives stop responding while being trimmed. As a workaround, boot with libata.force=noncq, trim, then remove the kernel option.
- Kernel Bug 201693 - Samsung 860 EVO NCQ Issue with AMD SATA Controller
I initially tried the libata.force=noncq boot parameter, which worked as expected,
but it also disables SATA native command queueing (NCQ) for the spinning disks.
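To confirm whether the running kernel was actually booted with that parameter, a quick check of the kernel command line helps. The sketch below assumes the option is passed as a standalone libata.force=noncq word (it does not handle comma-separated libata.force lists); has_noncq and CMDLINE_FILE are hypothetical names used only for this example. On a GRUB-based system (an assumption, since the bootloader is not named here) the parameter would typically be added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub.

```shell
#!/bin/sh
# Succeeds if the given kernel command line contains libata.force=noncq
# as a standalone word.
has_noncq() {
    case " $1 " in
        *" libata.force=noncq "*) return 0 ;;
        *) return 1 ;;
    esac
}

# CMDLINE_FILE is a hypothetical override; by default the command line
# of the running kernel is inspected.
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"
if [ -r "$CMDLINE_FILE" ] && has_noncq "$(cat "$CMDLINE_FILE")"; then
    echo "libata.force=noncq is set: NCQ is disabled for all ATA devices"
else
    echo "libata.force=noncq is not set"
fi
```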
Therefore I switched to the other workaround, setting the queue depth to 1 on the affected SSDs using sysfsutils in /etc/sysfs.d/samsung-ssd-noncq.conf:
block/sdb/device/queue_depth = 1
block/sdd/device/queue_depth = 1
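The same setting can be tried at runtime before making it permanent, by writing to the corresponding sysfs files. The following is a minimal sketch: set_queue_depth and the SYSFS_ROOT override are names invented for this example (the override only exists so the function can be exercised against a scratch directory), and the device names are the ones from the config above. Note that sdX names are not guaranteed to stay the same across reboots.

```shell
#!/bin/sh
# SYSFS_ROOT is a hypothetical override for testing; normally /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# $1 = queue depth, remaining arguments = block device names (e.g. sdb sdd)
set_queue_depth() {
    depth=$1
    shift
    for dev in "$@"; do
        f="$SYSFS_ROOT/block/$dev/device/queue_depth"
        if [ -w "$f" ]; then
            echo "$depth" > "$f"
            # Read the value back to show what is now in effect.
            echo "$dev: queue_depth=$(cat "$f")"
        else
            echo "$dev: $f is not writable" >&2
            return 1
        fi
    done
}

# Usage (as root): set_queue_depth 1 sdb sdd
```

A value written this way is lost on reboot, which is why the sysfsutils entry is still needed.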