HOWTO Repair A Failing ZFS Mirror Disc

One of my systems recently marked a ZFS pool as DEGRADED because one of the mirror discs had too many errors.

I suspect what happened is that the disc failed during a routine scrub over the weekend, and the system then attempted to resilver the mirror after I rebooted for other reasons. It looked like this:

  pool: rpool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
	repaired.
  scan: resilvered 276G in 0 days 15:39:02 with 0 errors on Mon Dec 14 22:55:58 2020
config:

	NAME        STATE     READ WRITE CKSUM
	rpool       DEGRADED     0     0     0
	  mirror-0  DEGRADED     0     0     0
	    sda2    FAULTED      0    12   211  too many errors
	    sdb2    ONLINE       0     0     0
	cache
	  sde       ONLINE       0     0     0
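
That report, by the way, is just the output of the following (add -v to also list any files affected by data errors):

zpool status rpool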

This pool is configured as a simple RAID-1 mirror of two Western Digital RED drives (model WD40EFRX-68WT0N0) with a small 128GB Micron SSD acting as a read cache. This configuration just proved itself by keeping the system running when one of the discs failed.

The disc itself has been powered on for over 42300 hours, which at roughly 4.8 years is close to its expected useful life of 5 years powered on. (These drives carry a 5 year warranty, so I'll have to take that up with whoever I bought them from 5 years ago.) I've ordered some replacement drives, but while I wait for them to arrive, I thought I'd try to repair the errors, if possible.
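
If you want to check the hours on your own drive, the power-on count lives in the SMART attribute table; on these WD Reds it's the Power_On_Hours attribute:

smartctl -A /dev/sda | grep -i power_on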

These discs are S.M.A.R.T.-enabled, so they can run self-tests that will help us figure out where the problems are.

smartctl -t short /dev/sda

will run a short self-test, which stops at the first error. After a couple of minutes, we can run

smartctl --log=selftest /dev/sda

to see the results. They look like this:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       10%     42389         116522400
# 2  Short offline       Completed: read failure       10%     42390         116522400
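
Incidentally, "a couple of minutes" isn't a guess: the drive advertises recommended polling times for its self-tests, which show up in the capabilities listing from

smartctl -c /dev/sda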

I found this page about handling bad blocks detected by smartmontools quite helpful in explaining the steps you can take, but it turned out I could skip most of the maths for calculating filesystem block offsets and just use hdparm to talk to the disc directly.
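
For the curious, the maths being skipped maps the failing LBA to a filesystem block number. A rough sketch, where the partition start and block size are illustrative values rather than from my system:

# block = (LBA - partition_start) * 512 / fs_block_size
LBA=116522400     # failing sector from the SMART log
PART_START=2048   # the partition's first sector, from e.g. fdisk -l
FS_BLOCK=4096     # filesystem block size in bytes
echo $(( (LBA - PART_START) * 512 / FS_BLOCK ))

You would then chase that block down with filesystem tools instead of talking to the disc directly.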

Dangerous Repair Instructions

DANGER! Before we go any further, be aware that the hints here include some very dangerous and destructive commands. Do not try this on a system you care about, or on one you don't have known-good, working backups for.

If you’re feeling brave and/or foolish, read on!

In the smartctl self-test results above, we found the Logical Block Address (LBA) of the first error was 116522400. In my case this also corresponds to the sector number on the disc, which means we can verify it using hdparm like this:

hdparm --read-sector 116522400 /dev/sda

Which gives us output like this:

/dev/sda:
reading sector 116522400: SG_IO: bad/missing sense data, sb[]:  70 00 03 00 00 00 00 0a 40 51 e6 01 11 04 00 00 00 a1 00 00 00 00 00 00 00 00 00 00 00 00 00 00
succeeded
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
...

This confirms a bad read. A good read looks like this:

/dev/sda:
reading sector 3302449: succeeded
7433 3264 3574 664e 6c47 4d73 584c 4c46
364e 7666 5453 5365 6e50 6f63 7078 6363
7357 0a36 3733 6370 316e 6d54 6c5a 6e46
3372 7572 4766 346b 3538 7245 784c 6f39
3631 7264 3763 3958 6b78 7938 4764 5a6e
5565 354e 3838 5257 6954 6336 747a 4675
3253 4d62 3672 706a 6752 6c30 7062 740a
...

Forcing Sector Reallocation

We can try to force the disc to reallocate this sector by writing to it with hdparm. Hard disc controllers have all sorts of fancy computers and software in them to make the physical hardware pretend it is vastly more stable and functional than it really is. They hide all kinds of errors from us most of the time, so we can try to take advantage of their helpful nature.

The command we want is

hdparm --write-sector 116522400 /dev/sda

which gives us this:

/dev/sda:
Use of --write-sector is VERY DANGEROUS.
You are trying to deliberately overwrite a low-level sector on the media.
This is a BAD idea, and can easily result in total data loss.
Please supply the --yes-i-know-what-i-am-doing flag if you really want this.
Program aborted.

I told you this was dangerous.

Because we are totally YOLO-ing this, we will lie to the computer and tell it we know what we're doing.

hdparm --yes-i-know-what-i-am-doing --write-sector 116522400 /dev/sda

Which gives us this result:

/dev/sda:
re-writing sector 116522400: succeeded

Hooray!

Let’s check this worked correctly by reading the sector again:

hdparm --read-sector 116522400 /dev/sda
/dev/sda:
reading sector 116522400: succeeded
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
...

Success!
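
If you want a second opinion from the drive's own book-keeping, the SMART attribute table tracks pending and reallocated sectors. After a forced write, a bad sector should either drop off the pending count (if it was rewritten in place) or bump the reallocated count (if it was remapped to a spare):

smartctl -A /dev/sda | grep -Ei "reallocated|pending"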

We can now repeat the self-test check to see if there are any other bad sectors. Turns out, yes. #sadface

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       10%     42390         116522401
# 2  Short offline       Completed: read failure       10%     42390         116522400
# 3  Short offline       Completed: read failure       10%     42389         116522400

Breaking Things At Scale

Rather than waiting two minutes for each self-test, I figured I'd just check with hdparm that each nearby sector could be read, and fix the ones with errors.

And because I like to let computers help me make mistakes at scale, I wrote a short bash script to automate the process:

#!/bin/bash
#
# Read a range of sectors with hdparm and force-rewrite any that fail.
# DANGER: the rewrite zeroes the sector. Adjust DISC and START_SECTOR first.

DISC="/dev/sda"
START_SECTOR=116522400
MAX_SECTORS=10
END_SECTOR=$((START_SECTOR + MAX_SECTORS))

echo "Checking sectors ${START_SECTOR} to ${END_SECTOR}"

for i in $(seq "${START_SECTOR}" "${END_SECTOR}"); do
  echo "Checking sector ${i}"
  # A bad read shows up as "reading sector N: SG_IO: bad/missing sense data"
  result=$(hdparm --read-sector "${i}" "${DISC}" 2>&1 | grep "reading sector")
  echo "Got result: ${result}"
  if echo "${result}" | grep -q "bad/missing sense data"; then
    echo "Bad sector found. Attempting to correct..."
    hdparm --yes-i-know-what-i-am-doing --write-sector "${i}" "${DISC}"
  else
    echo "Sector seems okay."
  fi
done
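
Note that hdparm needs raw access to the device, so this has to run as root, and you obviously want to point DISC and START_SECTOR at your own disc and failing LBA before letting it loose.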

There were about six bad sectors in this region to reallocate, plus a couple more in another region, but after a few iterations of this procedure I was able to get the self-test to pass.

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     42391         -
# 2  Short offline       Completed: read failure       10%     42390         116526024
# 3  Short offline       Completed: read failure       10%     42390         116522401
# 4  Short offline       Completed: read failure       10%     42390         116522400
# 5  Short offline       Completed: read failure       10%     42389         116522400
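
For extra confidence once the short tests pass, a long self-test reads the entire disc surface rather than the small sample the short test covers, though on a 4TB drive that takes many hours:

smartctl -t long /dev/sda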

Now we can mark the disc as repaired and the pool can resilver:

zpool clear rpool

Then I just need to wait for the pool to resilver over the next 18 hours or so:

  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Dec 15 08:12:08 2020
	856G scanned at 296M/s, 148G issued at 51.1M/s, 3.27T total
	118G resilvered, 4.40% done, 0 days 17:49:53 to go
config:

	NAME        STATE     READ WRITE CKSUM
	rpool       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    sda2    ONLINE       0     0     0  (resilvering)
	    sdb2    ONLINE       0     0     0
	cache
	  sde       ONLINE       0     0     0

errors: No known data errors
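
While it churns away, something like this will keep an eye on progress:

watch -n 60 zpool status rpool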

Update: Success!

  pool: rpool
 state: ONLINE
  scan: resilvered 3.21T in 0 days 14:25:27 with 0 errors on Tue Dec 15 22:37:35 2020
config:

	NAME        STATE     READ WRITE CKSUM
	rpool       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    sda2    ONLINE       0     0     0
	    sdb2    ONLINE       0     0     0
	cache
	  sde       ONLINE       0     0     0

errors: No known data errors

Happy disc repairing!
