ZFS reports degraded health after physically removing an unrelated hard drive from the server

Problem

I have a home server running Debian 10 (Proxmox) on an NVMe SSD (ext4), plus two ZFS pools. The first pool is an 8x8TB RAID-Z2 array called vault; the other is a 2x1TB RAID 0 array called workspace.

I recently decided to get rid of workspace. I stopped all file operations on that filesystem, unmounted it, then ran zfs destroy on the pool. I physically removed one of the workspace drives and rebooted the machine.

When the machine came back up, I expected to see only the vault ZFS pool, in a healthy state. Instead, when I checked, I found it is now DEGRADED.

root@homenas:~# zpool status
  pool: vault
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub in progress since Tue Apr 21 19:03:01 2020
        2.35T scanned at 6.50G/s, 336G issued at 930M/s, 6.69T total
        0B repaired, 4.91% done, 0 days 01:59:31 to go
config:

        NAME                      STATE     READ WRITE CKSUM
        vault                     DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            sda                   ONLINE       0     0     0
            sdb                   ONLINE       0     0     0
            sdc                   ONLINE       0     0     0
            sdd                   ONLINE       0     0     0
            sde                   ONLINE       0     0     0
            sdf                   ONLINE       0     0     0
            15380939428578218220  FAULTED      0     0     0  was /dev/sdi1
            8563980037536059323   UNAVAIL      0     0     0  was /dev/sdj1

errors: No known data errors

I believe the drives may have been reassigned to different /dev/sdX paths. I'm not sure why one shows FAULTED while the other shows UNAVAIL. The vault pool is still online, and I'm already running a backup to copy all recent data to another storage medium.
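The underlying issue is that /dev/sdX letters are handed out at boot in detection order, while the entries under /dev/disk/by-id are stable symlinks that udev repoints at whichever kernel name the disk received on that boot. A minimal sketch of that indirection, using mock paths in a scratch directory rather than real devices (the WWN value is a made-up placeholder):

```shell
# Mock of udev's by-id scheme: a stable name (here a fake WWN) is a
# symlink to whatever kernel name the disk happened to get this boot.
mkdir -p demo/by-id

# Boot 1: the disk enumerates as "sdi".
touch demo/sdi
ln -sfn ../sdi demo/by-id/wwn-0xdeadbeef

# Boot 2: an unrelated disk was removed, so the same disk is now "sdh".
touch demo/sdh
ln -sfn ../sdh demo/by-id/wwn-0xdeadbeef

# The stable name still resolves; only its target moved.
readlink demo/by-id/wwn-0xdeadbeef   # -> ../sdh
```

On a real system, `ls -l /dev/disk/by-id/` shows these symlinks together with their current /dev/sdX targets, which is why a pool built on by-id names survives reshuffled drive letters while one built on /dev/sdX paths does not.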

That said, is it possible to recover my vault pool and bring it back to a healthy state? If this was caused by drives shifting paths after the workspace pool was destroyed, what is the best way to restore the array?

Here is what fdisk reports:

root@homenas:~# fdisk -l
Disk /dev/nvme0n1: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: WDBRPG5000ANC-WRSN
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start       End   Sectors   Size Type
/dev/nvme0n1p1      34      2047      2014  1007K BIOS boot
/dev/nvme0n1p2    2048   1050623   1048576   512M EFI System
/dev/nvme0n1p3 1050624 976773134 975722511 465.3G Linux LVM


Disk /dev/sda: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: WDC WD80EMAZ-00W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sda1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sda9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdb: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdb1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdb9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdc: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdc1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdc9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdd: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdd1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdd9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sde: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sde1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sde9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdf: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdf1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdf9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdg: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WDC  WDBNCE0010P
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device          Start        End    Sectors   Size Type
/dev/sdg1        2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdg9  1953507328 1953523711      16384     8M Solaris reserved 1


Disk /dev/sdh: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: WDC WD80EMAZ-00W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdh1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdh9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdi: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdi1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdi9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/mapper/pve-swap: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-root: 96 GiB, 103079215104 bytes, 201326592 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-vm--100--disk--0: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disklabel type: dos
Disk identifier: 0x97e9a0ea

Device                                 Boot Start      End  Sectors Size Id Type
/dev/mapper/pve-vm--100--disk--0-part1 *     2048 67106815 67104768  32G 83 Linux

Thanks in advance for any information.

Tags: linux, nas, zfs, raid, san

Solution


I was able to resolve the issue and bring the RAID array back to a healthy state without losing any data. The fix was to import the pool using stable disk identifiers rather than the udev-assigned device paths.

For anyone else who hits a ZFS pool failure after adding or removing unrelated disks, and who determines that they likewise created the pool with device paths instead of stable identifiers, the steps below should help.

  1. Back up your data; make sure you have a complete copy of the pool's data before proceeding.

  2. Stop all writes to the filesystem (in this case the writers were Docker containers):

docker stop $(docker ps -aq)
  3. Unmount the filesystem:
umount /vault
  4. Export the ZFS pool:
zpool export vault
  5. Import the ZFS pool, this time using disk identifiers rather than device paths:
zpool import -d /dev/disk/by-id vault
  6. Check the status of the ZFS pool. It should now report as healthy:
root@homenas:~# zpool status
  pool: vault
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 73.5M in 0 days 00:00:07 with 0 errors on Wed Apr 22 10:38:47 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        vault                       ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0    25
            wwn-0xabcdefghijklmnoq  ONLINE       0     0    16

errors: No known data errors

We can see that a small amount of data was resilvered on re-import, but it completed within a few seconds.

  7. Run a clear to reset the error flags in ZFS:
zpool clear vault
  8. Run a scrub to make sure the pool is in good shape:
zpool scrub vault
  9. Check the status again to confirm the pool is healthy:
root@homenas:~# zpool status
  pool: vault
 state: ONLINE
  scan: scrub repaired 0B in 0 days 01:59:47 with 0 errors on Wed Apr 22 12:46:58 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        vault                       ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000cca252de17d4  ONLINE       0     0     0
            wwn-0x5000c500c4e46bf9  ONLINE       0     0     0
            wwn-0x5000c500c4e65198  ONLINE       0     0     0
            wwn-0x5000c500c4e616a4  ONLINE       0     0     0
            wwn-0x5000c500c4ac129e  ONLINE       0     0     0
            wwn-0x5000c500c4e3f74a  ONLINE       0     0     0
            wwn-0x5000cca257eb9299  ONLINE       0     0     0
            wwn-0x5000c500c4e50efc  ONLINE       0     0     0

errors: No known data errors
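One follow-up worth noting: to keep the pool resolving disks by stable identifiers on future boots as well, ZFS on Linux honors a device search path for imports via the ZPOOL_IMPORT_PATH setting. The file location below is the Debian/Proxmox convention; treat it as an assumption on other distributions:

```shell
# /etc/default/zfs (Debian/Proxmox; location is an assumption elsewhere).
# Restrict where 'zpool import' searches for devices so pools are
# assembled from stable identifiers rather than /dev/sdX letters.
ZPOOL_IMPORT_PATH="/dev/disk/by-id"
```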
