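The machine that failed is a 12-bay Dell EMC server with a 10-disk RAID 10 array plus 1 hot spare. For installing MegaCli64, see this link:
Installing MegaCli64 on Proxmox (Debian) to manage a hardware RAID controller
I also strongly recommend reading this: MegaCli operation manual
Once it is installed, first check the array status: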
root@JS-2002:~/megacli/Linux# MegaCli64 -LDInfo -Lall -aALL
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 9.093 TB
Sector Size : 512
Mirror Data : 9.093 TB
State : Degraded
Strip Size : 64 KB
Number Of Drives per span:2
Span Depth : 5
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: Yes
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No
Exit Code: 0x00
Next, look at the physical drive in slot 3, which reports predictive failures and a S.M.A.R.T alert:
root@JS-2002:~/megacli/Linux# MegaCli64 -pdinfo -physdrv[:3] -a0
Enclosure Device ID: N/A
Slot Number: 3
Drive's position: DiskGroup: 0, Span: 1, Arm: 1
Enclosure position: N/A
Device Id: 3
WWN: 5000C500260EACC4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 5
Predictive Failure Count: 3
Last Predictive Failure Event Seq Number: 30255
PD Type: SAS
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0008
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c500260eacc5
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST32000444SS 00089WM3PSCZ
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature :29C (84.20 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : Yes
Exit Code: 0x00
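Only a handful of fields in that long dump say anything about drive health. As a rough sketch, assuming the `-pdinfo` output keeps the `Key: value` layout shown above, a small awk filter can pull them out. The function name `pd_health` is made up for this post and is not part of MegaCli:

```shell
#!/bin/sh
# pd_health: hypothetical helper, not part of MegaCli.
# Reads `MegaCli64 -pdinfo` output on stdin and prints, on one line,
# the fields that hint at a failing drive.
pd_health() {
    awk -F': *' '
        /^Slot Number/              { slot  = $2 }
        /^Predictive Failure Count/ { pfc   = $2 }
        /^Firmware state/           { state = $2 }
        /S\.M\.A\.R\.T alert/       { smart = $2 }
        END {
            printf "slot=%s predictive_failures=%s smart_alert=%s state=%s\n",
                   slot, pfc, smart, state
        }
    '
}
```

It would be used as `MegaCli64 -pdinfo '-physdrv[:3]' -a0 | pd_health`. The pattern for the failure count is anchored at the start of the line so that `Last Predictive Failure Event Seq Number` is not picked up by mistake.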
Then take this disk offline and mark it as missing:
root@JS-2002:~/megacli/Linux# MegaCli64 -PDOffline -PhysDrv [:3] -a0
Adapter: 0: EnclId-N/A SlotId-3 state changed to OffLine.
Exit Code: 0x00
root@JS-2002:~/megacli/Linux# MegaCli64 -pdmarkmissing -physdrv[:3] -aAll
EnclId-N/A SlotId-3 is marked Missing.
Exit Code: 0x00
Mark the drive as ready for removal:
root@JS-2002:~/megacli/Linux# MegaCli64 -pdprprmv -physdrv[:3] -a0
Prepare for removal Success
Exit Code: 0x00
Check the array state again at this point. It is Degraded:
root@JS-2002:~/megacli/Linux# MegaCli64 -LDInfo -Lall -aALL
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 9.093 TB
Sector Size : 512
Mirror Data : 9.093 TB
State : Degraded
Strip Size : 64 KB
Number Of Drives per span:2
Span Depth : 5
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: Yes
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No
Exit Code: 0x00
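Reading the `State` line by eye gets tedious when you check repeatedly. Here is a minimal sketch of a check that could go into a script or cron job. It assumes a healthy array reports `Optimal` on that line, and the name `ld_all_optimal` is invented for illustration:

```shell
#!/bin/sh
# ld_all_optimal: hypothetical helper, not part of MegaCli.
# Reads `MegaCli64 -LDInfo -Lall -aALL` output on stdin. Succeeds (exit 0)
# only if at least one "State" line is present and every one says Optimal.
ld_all_optimal() {
    awk -F' *: *' '
        { sub(/[ \t\r]+$/, "") }                        # drop trailing blanks / CR
        /^State/ { seen++; if ($2 != "Optimal") bad++ }
        END { exit (seen == 0 || bad > 0) }
    '
}
```

For example: `MegaCli64 -LDInfo -Lall -aALL | ld_all_optimal || echo "array needs attention"`. Treating empty input as a failure is deliberate, so that a missing binary or a controller that returns nothing is not mistaken for a healthy array.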
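Next, bring in the "hot spare". It had never actually been configured as a hot spare; it was only plugged in. The most important thing here is working out what the Array and row parameters should be, which took me a long time to find....
RAID 10 is really several RAID 1 pairs striped together as RAID 0, so our 10-disk RAID 10 is in fact split into 5 RAID 1 groups. The group number is the value that goes after Array, and row is the position inside that RAID 1 pair, 0 or 1. Seen this way it is easy to understand: the drive's Span number gives Array and its Arm number gives row:
Enclosure Device ID: N/A
Slot Number: 3
Drive's position: DiskGroup: 0, Span: 1, Arm: 1
Enclosure position: N/A
Device Id: 3
That is Array 1, row 1, so: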
root@JS-2002:~/megacli/Linux# MegaCli64 -PdReplaceMissing -PhysDrv[:10] -Array1 -row1 -a0
Adapter: 0: Failed to replace Missing PD at Array 1, Row 1.
FW error description:
The specified device is in a state that doesn't support the requested command.
Exit Code: 0x32
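The Span/Arm reading used for `-Array1 -row1` above can be written down as a one-line filter. This is only a sketch of the mapping as I understood it on this one machine, which has a single disk group; I have not verified it on controllers with several disk groups. The function name `pd_replace_args` is hypothetical:

```shell
#!/bin/sh
# pd_replace_args: hypothetical helper, not part of MegaCli.
# Reads `MegaCli64 -pdinfo` output for the failing drive on stdin and turns
# its "Drive's position" line into -ArrayN -rowN, taking Span as Array and
# Arm as row (the mapping observed in this post, single disk group only).
pd_replace_args() {
    sed -n 's/.*DiskGroup: *[0-9][0-9]*, *Span: *\([0-9][0-9]*\), *Arm: *\([0-9][0-9]*\).*/-Array\1 -row\2/p'
}
```

The position has to be read from the failing drive's `-pdinfo` output while it is still part of the array, so it is worth saving that output before taking the drive offline.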
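The replacement failed because this disk was present as an ordinary non-RAID disk. So we simply pulled it out and inserted it into slot 3, and, as if by magic, the rebuild started:
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size: 0
Firmware state: Rebuild
Device Firmware Level: HPD7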
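A rebuild of a disk this size takes a while, so rather than re-running `-pdinfo` by hand, something like the following could poll the `Firmware state` line until it is no longer `Rebuild`. This is a sketch under the same assumptions as the commands above (adapter 0, drives addressed as `[:SLOT]` with no enclosure ID). The names `pd_state`, `wait_rebuild` and the `MEGACLI` variable are my own, not part of MegaCli:

```shell
#!/bin/sh
# Hypothetical helpers, not part of MegaCli.
# MEGACLI may be set to the path of the binary; defaults to MegaCli64 in PATH.
MEGACLI=${MEGACLI:-MegaCli64}

# pd_state SLOT: print the "Firmware state" of the drive in SLOT on adapter 0,
# using the same enclosure-less [:SLOT] addressing as in this post.
pd_state() {
    "$MEGACLI" -pdinfo "-physdrv[:$1]" -a0 |
        awk -F': *' '/^Firmware state/ { sub(/[ \t\r]+$/, "", $2); print $2 }'
}

# wait_rebuild SLOT [INTERVAL]: poll every INTERVAL seconds (default 60)
# until the drive leaves the Rebuild state, then print its final state.
wait_rebuild() {
    while [ "$(pd_state "$1")" = "Rebuild" ]; do
        sleep "${2:-60}"
    done
    pd_state "$1"
}
```

Here it would be called as `wait_rebuild 3`, since the replacement disk now sits in slot 3. Note that the loop also ends if the drive goes into some other state, such as a failure, so the printed final state should be checked rather than assumed to be online.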
Done!
https://paste.ubuntu.com/p/dVXG3qvnGF/