ZFS Degraded-Pool Incident

Thu Mar 26 2026

Notes on a ZFS pool going DEGRADED.

I was in the middle of backing up datasets when I noticed one disk reporting DEGRADED. My heart sank: this is it.

root in 🌐 homelab in ~
❯ zpool status -x
  pool: bulk
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 06:05:15 with 0 errors on Sun Mar  8 06:29:16 2026
config:

        NAME                                   STATE     READ WRITE CKSUM
        bulk                                   DEGRADED     0     0     0
          raidz1-0                             DEGRADED     0     0     0
            10331498304071247840               FAULTED      0     0     0  was /dev/sda1
            ata-ST12000NM005G-2MT133_ZL28N553  ONLINE       0     0     0
            ata-ST12000NM005G-2MT133_ZLW1X3BT  ONLINE       0     0     0
        cache
          c1256568-01                          ONLINE       0     0     0

errors: No known data errors

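Identifying the disk

zpool printed a long, opaque number where a device name should be, and at this point I did not know why. (It matches the guid field in the zdb label dump further down, so it is the vdev's GUID, shown because the device could not be opened by path.)

It also says the failed device was /dev/sda1, so the next step is to find out which physical disk that is: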

root in 🌐 homelab in ~
❯ lsblk -o NAME,SIZE,MODEL,SERIAL
NAME          SIZE MODEL                         SERIAL
sda          10.9T ST12000NM005G-2MT133          ZLW1X3BT
├─sda1       10.9T
└─sda9          8M
sdb          10.9T ST12000NM005G-2MT133          ZL28N553
├─sdb1       10.9T
└─sdb9          8M
sdc          10.9T ST12000NM005G-2MT133          ZL28ED42
├─sdc1       10.9T
└─sdc9          8M
sdd           1.7T INTEL SSDSC2KB019T8           PHYF102500VU1P9DGN
├─sdd1        1.7T
└─sdd9          8M
sde           1.7T INTEL SSDSC2KB019T8           PHYF102500NN1P9DGN
├─sde1        1.7T
└─sde9          8M
nvme0n1     119.2G AirDisk 128GB SSD             QG8656B006640P110N
├─nvme0n1p1   3.7G
├─nvme0n1p2  14.9G
└─nvme0n1p3 100.6G
nvme1n1     953.9G WD PC SN560 SDDPNQE-1T00-1002 233506402946
├─nvme1n1p1 715.3G
└─nvme1n1p2 238.6G

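At first glance /dev/sda1 maps to the disk with serial ZLW1X3BT. That cannot be the faulted one, though: zpool status lists both ZLW1X3BT and ZL28N553 as ONLINE, so the missing member has to be the third disk, ZL28ED42 (currently sdc). The "was /dev/sda1" path is evidently stale, recorded when the kernel names were assigned differently.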
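Kernel names such as sda are handed out in probe order and can land on a different disk after a reboot or bus reset, so the links under /dev/disk/by-id are a steadier way to tie a name to a serial number. A minimal sketch, assuming a Linux system with GNU readlink; the helper name byid_for is made up for illustration:

```shell
# byid_for DEVICE [DIR]
# Print every symlink in DIR (default: /dev/disk/by-id) that resolves to DEVICE.
# Hypothetical helper, not part of ZFS or util-linux.
byid_for() {
    want=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$want" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Example (on the machine itself): byid_for /dev/sda
```

The output includes partition-less links such as ata-<model>_<serial>, which can be compared directly against the names in zpool status.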

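Digging deeper

It is still unclear why a disk would suddenly drop out and degrade the RAIDZ1, so more digging is needed.

The URL that zpool printed gives the following: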

Message ID: ZFS-8000-4J

Corrupted device label in a replicated configuration

Type: Error
Severity: Major
Description: A device could not be opened due to a missing or invalid device label.
Automated Response: A hot spare will be activated if available.
Impact: The pool is no longer providing the configured level of replication.

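In other words, the ZFS label is missing or damaged, which does not necessarily mean the disk itself is bad. Time to bring out the SMART tools and look for a cause.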

S.M.A.R.T.

root in 🌐 homelab in ~ took 8s
❯ smartctl -a /dev/disk/by-id/ata-ST12000NM005G-2MT133_ZL28ED42
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.19.6+deb14-amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST12000NM005G-2MT133
Serial Number:    ZL28ED42
LU WWN Device Id: 5 000c50 0c707bbf3
Add. Product Id:  DELL(tm)
Firmware Version: EAL6
User Capacity:    12,000,138,625,024 bytes [12.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.5/5706
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Mar 26 21:20:08 2026 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   90) seconds.
Offline data collection
capabilities:                    (0x71) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1089) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x010f   100   064   044    Pre-fail  Always       -       824864
  3 Spin_Up_Time            0x0103   091   087   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       57
  5 Reallocated_Sector_Ct   0x0133   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   080   060   045    Pre-fail  Always       -       102215035
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always       -       34919
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       53
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       8590065668
190 Airflow_Temperature_Cel 0x0022   071   050   040    Old_age   Always       -       29 (Min/Max 23/40)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       42
193 Load_Cycle_Count        0x0032   092   092   000    Old_age   Always       -       17526
194 Temperature_Celsius     0x0022   029   050   000    Old_age   Always       -       29 (0 15 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       24519 (206 253 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       235469632312
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       4267211186416

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     31056         -
# 2  Vendor (0xdf)       Completed without error       00%         3         -
# 3  Short offline       Completed without error       00%         1         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

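Judging by the SMART data, this disk is reasonably healthy. GPT's reading is that the disk may have gone through one of the following:

a brief loss of connection to the disk
an unstable HBA, SATA backplane, or power supply
a system reboot, hot-plug, or bus reset
the device being busy enough to time out, rather than a damaged platter surface

None of these changes the overall health verdict.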
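The "healthy" verdict rests mostly on a handful of counters being zero, and the timeout theory on attribute 188. A rough sketch of both checks follows. Assumptions: the attribute table has the ten-column layout shown above; the short list of IDs (5, 197, 198, 199) is a common rule of thumb, not an official threshold; and splitting Command_Timeout into three 16-bit words is a widely reported reading of Seagate raw values, not something smartctl documents. Both function names are made up.

```shell
# smart_red_flags: read `smartctl -A` output on stdin and print NAME=RAW for
# reallocated, pending, uncorrectable and CRC-error counters that are non-zero.
smart_red_flags() {
    awk '$1 ~ /^(5|197|198|199)$/ && $10 + 0 != 0 { print $2 "=" $10 }'
}

# split_raw48 VALUE: print the three low 16-bit words of a raw value, high first.
# How to interpret the words is an assumption (see the note above).
split_raw48() {
    v=$1
    printf '%d %d %d\n' $(( (v >> 32) & 0xFFFF )) $(( (v >> 16) & 0xFFFF )) $(( v & 0xFFFF ))
}

# Example: smartctl -A /dev/sdX | smart_red_flags
# Example: split_raw48 8590065668
```

For the table above, smart_red_flags prints nothing, and split_raw48 8590065668 prints 2 2 4, so under that reading only a few timeouts were recorded over the drive's lifetime.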

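Reassessing

The picture is clearer now: the most likely explanation is that ZFS failed to re-associate the disk with its entry in the pool. Let's use zdb to check the label on this disk: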

root in 🌐 homelab in ~
❯ zdb -l /dev/disk/by-id/ata-ST12000NM005G-2MT133_ZL28ED42-part1
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'bulk'
    state: 0
    txg: 2425395
    pool_guid: 1560922927800853045
    errata: 0
    min_alloc: 4096
    max_alloc: 4096
    hostid: 367074959
    hostname: 'homelab'
    top_guid: 16729480470073721455
    guid: 10331498304071247840
    hole_array[0]: 1
    vdev_children: 2
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 16729480470073721455
        nparity: 1
        metaslab_array: 512
        metaslab_shift: 34
        ashift: 12
        asize: 36000370262016
        min_alloc: 4096
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 10331498304071247840
            path: '/dev/sda1'
            devid: 'ata-ST12000NM005G-2MT133_ZL28ED42-part1'
            phys_path: 'pci-0000:c1:00.0-ata-1.0'
            whole_disk: 1
            DTL: 17352
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 16221263122293960021
            path: '/dev/disk/by-id/ata-ST12000NM005G-2MT133_ZL28N553-part1'
            devid: 'ata-ST12000NM005G-2MT133_ZL28N553-part1'
            phys_path: 'pci-0000:c1:00.0-ata-2.0'
            whole_disk: 1
            DTL: 17351
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 17765960766568586256
            path: '/dev/disk/by-id/ata-ST12000NM005G-2MT133_ZLW1X3BT-part1'
            devid: 'ata-ST12000NM005G-2MT133_ZLW1X3BT-part1'
            phys_path: 'pci-0000:c1:00.0-ata-3.0'
            whole_disk: 1
            DTL: 17350
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 0 1 2 3

The label is readable after all. According to the manual, in this situation you can try bringing the device back online:

root in 🌐 homelab in ~
❯ zpool online bulk /dev/disk/by-id/ata-ST12000NM005G-2MT133_ZL28ED42-part1
couldn't find device "/dev/disk/by-id/ata-ST12000NM005G-2MT133_ZL28ED42-part1" in pool "bulk"

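That did not work, so let's try a reboot; the disk itself is fine, after all.

NOTE

Don't copy me: this machine can be rebooted whenever I like. The proper procedure is to start a rebuild with zpool replace.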
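The by-id path was rejected because the pool still records this member under its old path (/dev/sda1 in the label dump above), so zpool has no entry by the new name. zpool also accepts a vdev GUID wherever it takes a device name, which sidesteps the path question. This was not tried during the incident, so treat it as an untested sketch; label_guid is a made-up helper that pulls the disk's own GUID out of zdb -l output, relying on the four-space indentation seen in the dump:

```shell
# label_guid: read `zdb -l` output on stdin and print the GUID of the labelled vdev.
# pool_guid/top_guid and the deeper-indented guid: lines of the vdev tree are skipped.
label_guid() {
    awk '/^    guid: / { print $2; exit }'
}

# Example, assuming the pool and disk names from this post:
#   guid=$(zdb -l /dev/disk/by-id/ata-ST12000NM005G-2MT133_ZL28ED42-part1 | label_guid)
#   zpool online bulk "$guid"
```

To get the pool onto stable names for good, the usual route is zpool export bulk followed by zpool import -d /dev/disk/by-id bulk, which needs the pool to be idle.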

Endgame

Well, and then it started rebuilding on its own…

root in 🌐 homelab in ~
❯ zpool status
  pool: bulk
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Mar 26 22:04:50 2026
        407G / 9.68T scanned, 108G / 9.68T issued at 221M/s
        40.8G resilvered, 1.09% done, 12:36:19 to go
config:

        NAME                                   STATE     READ WRITE CKSUM
        bulk                                   ONLINE       0     0     0
          raidz1-0                             ONLINE       0     0     0
            sda                                ONLINE       0     0     3  (resilvering)
            ata-ST12000NM005G-2MT133_ZL28N553  ONLINE       0     0     0
            ata-ST12000NM005G-2MT133_ZLW1X3BT  ONLINE       0     0     0
        cache
          c1256568-01                          ONLINE       0     0     0

errors: No known data errors

  pool: striping
 state: ONLINE
  scan: scrub repaired 0B in 00:32:14 with 0 errors on Sun Mar  8 00:56:16 2026
config:

        NAME                                          STATE     READ WRITE CKSUM
        striping                                      ONLINE       0     0     0
          ata-INTEL_SSDSC2KB019T8_PHYF102500VU1P9DGN  ONLINE       0     0     0
          ata-INTEL_SSDSC2KB019T8_PHYF102500NN1P9DGN  ONLINE       0     0     0
        cache
          c1256568-02                                 ONLINE       0     0     0

errors: No known data errors

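So the problem got solved, if in a rather inelegant way.

NOTE

ZFS calls a rebuild "resilvering", which sounds rather artistic.

I asked GPT about it and got this answer: the term comes from the metaphor of repairing a mirror, because rebuilding a mirrored pool (mirroring) is like restoring the reflective layer of a mirror.

One last thing

I sent all of the above to GPT, which thinks this may be a disk timing issue, so I plan to keep investigating.

The command below turned up nothing useful; the records are probably gone by now.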

journalctl -k | grep -Ei 'sda|sdc|ZL28ED42|ata|ahci|scsi|reset|I/O error|offline'
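One likely reason the search came up empty: journalctl -k implies -b, so it only covers the current boot, and the fault happened before the reboot. If the journal is persistent, the previous boot can be queried with -b -1 (journalctl --list-boots shows which boots are available). A small sketch; disk_events is a made-up name and the pattern is a narrowed variant of the one above:

```shell
# disk_events [NAMES]: filter kernel log lines on stdin for ATA port messages,
# link resets, I/O errors and mentions of the given device names.
# NAMES is an ERE alternation; the default matches the two names from this post.
disk_events() {
    names=${1:-sda|sdc}
    grep -Ei "($names)|ata[0-9]+|hard resetting link|I/O error|offline"
}

# Example: journalctl -k -b -1 --no-pager | disk_events 'sda|sdc'
```

If --list-boots shows only the current boot, the journal is not being kept across reboots and the earlier kernel messages are indeed lost.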

Luckily the pool that broke was RAIDZ1 and I caught the problem during a routine check; otherwise I would be crying over lost data.