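Preface
After waiting half a month for a cheap dual-port 10GbE NIC shipped from Hong Kong, only 20 euros apiece, I happily swapped it in as soon as it arrived, and nearly broke my home Proxmox cluster....
After installing the card and booting, everything looked normal. Once I switched the network over to it, the link showed up as 10GbE, and pinging the other nodes gave latencies under 0.1 ms, so nothing seemed wrong.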
    root@PVE-BE-105:~# ethtool enp2s0f0
    Settings for enp2s0f0:
            Supported ports: [ FIBRE ]
            Supported link modes:   1000baseT/Full
                                    10000baseT/Full
            Supported pause frame use: Symmetric
            Supports auto-negotiation: No
            Supported FEC modes: Not reported
            Advertised link modes:  Not reported
            Advertised pause frame use: No
            Advertised auto-negotiation: No
            Advertised FEC modes: Not reported
            Speed: 10000Mb/s
            Duplex: Full
            Auto-negotiation: off
            Port: Direct Attach Copper
            PHYAD: 0
            Transceiver: internal
            Supports Wake-on: g
            Wake-on: g
            Current message level: 0x00000000 (0)
            Link detected: yes
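But a few minutes later the node had turned grey in PVE, and the pve-cluster service started misbehaving. A little after that, the PVE web interface became unreachable on all 5 machines in the cluster...
At first I assumed something had gone wrong while the node with the new NIC was rebooting, so I rebooted it again. The moment that node went down, the other 4 recovered immediately!!?
That cost me two hours of digging for the cause, and in the end I found that the network link kept dropping and reconnecting.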
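A rough way to put a number on that flapping is to count link-down events in the kernel log. The sketch below is only an illustration: the helper name `count_link_down` is made up, and it assumes the driver logs lines of the form `<iface>: Link is Down`, which may differ between driver versions.

```shell
# Count link-down events for one interface in kernel log text read from stdin.
# Assumption: the driver prints lines containing "<iface>: Link is Down".
count_link_down() {
    # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
    # keeps the function from failing in that case.
    grep -c -F -- "$1: Link is Down" || true
}

# Example use on the affected node (commented out because it needs the real log):
# dmesg | count_link_down enp2s0f0
# journalctl -k -b | count_link_down enp2s0f0
```

A count that keeps growing while the cable and module are left untouched would point to the same flapping behaviour.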
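So I kept searching. At first I suspected the driver, so I reinstalled the be2net driver, with no change. Later, after trawling through forums and blogs, I found reports that this card drops packets under Debian, so I checked, and sure enough: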
    root@PVE-BE-105:~# ethtool -S enp2s0f0
    NIC statistics:
         .....
         rx_address_filtered: 31657
         .....
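A single reading of a counter says little on its own; what matters is whether it keeps climbing. As a minimal sketch, the hypothetical helper `stat_value` below pulls one counter out of `ethtool -S` output so that two samples can be compared. The 10-second interval in the usage comment is an arbitrary choice.

```shell
# Print the value of one counter from `ethtool -S` output read on stdin.
stat_value() {
    # Each statistics line looks like "     name: value", so the first
    # whitespace-separated field is "name:" and the second is the value.
    awk -v key="$1:" '$1 == key { print $2; exit }'
}

# Example use (commented out because it needs the real NIC):
# before=$(ethtool -S enp2s0f0 | stat_value rx_address_filtered)
# sleep 10
# after=$(ethtool -S enp2s0f0 | stat_value rx_address_filtered)
# echo "rx_address_filtered grew by $((after - before)) in 10 s"
```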
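Solution
Flashing the firmware with ethtool
First, check the current firmware version. It is 10.2.315.26, and versions before 10.3 are reportedly the problematic ones: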
    root@PVE-BE-105:~# ethtool -i enp2s0f0
    driver: be2net
    version: 6.8.12-1-pve
    firmware-version: 10.2.315.26
    expansion-rom-version:
    bus-info: 0000:02:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: no
    supports-priv-flags: yes
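With several nodes to check, the version comparison can be scripted. This is a sketch under a few assumptions: the helper names `fw_version` and `version_lt` are made up, `sort -V` (version sort, as in GNU coreutils) is available, and the 10.3 threshold is only the hearsay figure mentioned above.

```shell
# Print the firmware-version field from `ethtool -i` output read on stdin.
fw_version() {
    awk '$1 == "firmware-version:" { print $2; exit }'
}

# Succeed (exit status 0) if version $1 is strictly older than version $2.
# Relies on `sort -V` to order dotted version strings.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example use (commented out because it needs the real NIC):
# v=$(ethtool -i enp2s0f0 | fw_version)
# if version_lt "$v" 10.3; then echo "firmware $v is older than 10.3"; fi
```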
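Then I downloaded the oc14-11.2.1153.23.ufi firmware from Broadcom's website: https://docs.broadcom.com/docs/12378837
Tracking it down was a real slog QAQ
Copy it to the server: scp oc14-11.2.1153.23.ufi root@<host>:/root/
Then, with some trepidation, I started flashing: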
    root@PVE-BE-105:~# ethtool -f enp2s0f0 oc14-11.2.1153.23.ufi 0
    Flashing failed: Network is down
Of course it was never going to be that easy....... Checking dmesg:
    [14421.792999] be2net 0000:02:00.0: Firmware load not allowed (interface is down)
After a reboot I tried flashing again with the NIC not brought up and no module inserted, and got exactly the same result.
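One reading of that dmesg line is that the driver refuses to load firmware unless the interface is administratively up, which would explain why leaving the NIC disabled did not help. The sketch below is untested on this card and rests on that assumption; the helper names `step` and `flash_with_link_up` are made up. It also copies the image into /lib/firmware, since the ethtool man page says the file passed to -f must be installed where the kernel firmware loader looks. By default it only prints the commands; set RUN=1 to execute them.

```shell
# Print a command, or execute it when RUN=1 is set in the environment.
step() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Flash IMAGE onto IFACE with the interface brought up first.
# Assumption: "interface is down" refers to the administrative state,
# so `ip link set ... up` is enough even without a link partner.
flash_with_link_up() {
    step cp "$2" /lib/firmware/ &&
        step ip link set dev "$1" up &&
        step ethtool -f "$1" "$(basename "$2")" 0
}

# Example dry run:
# flash_with_link_up enp2s0f0 /root/oc14-11.2.1153.23.ufi
```

Keeping the dry run as the default makes it easy to review the exact commands before touching the card's flash.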
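Flashing the firmware with the dedicated tool
After yet another round of searching, I found the flashing tool OneConnect-Flash-12.0.1345.0-x64.iso on HP's firmware support page. Reportedly it can flash the card as long as the machine boots it in UEFI mode.
Link: https://support.hpe.com/connect/s/softwaredetails?language=en_US&softwareId=MTX_a36307b55e86403391d3526afd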
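Boot it from the virtual CD drive! Then comes a long wait. I don't know why it is so slow, but it sat at the boot screen for about ten minutes.
Once inside the flashing environment the interaction is simple: it detects the current NIC, asks whether you want to flash it, and you enter y.
The firmware version it shows matches what ethtool reported earlier: 10.2.315.26
Then comes an even longer wait, at least 20 minutes.
When it finally completes, it reports that the flash succeeded!!!
Verifying the result
Reboot the server again and first check the current firmware version: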
    root@PVE-BE-105:~# ethtool -i enp2s0f0
    driver: be2net
    version: 6.8.12-1-pve
    firmware-version: 12.0.1345.0
    expansion-rom-version:
    bus-info: 0000:02:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: no
    supports-priv-flags: yes
Then check the link status again:
    root@PVE-BE-105:~# ethtool enp2s0f0
    Settings for enp2s0f0:
            Supported ports: [ FIBRE ]
            Supported link modes:   1000baseT/Full
                                    10000baseT/Full
            Supported pause frame use: Symmetric
            Supports auto-negotiation: No
            Supported FEC modes: Not reported
            Advertised link modes:  Not reported
            Advertised pause frame use: No
            Advertised auto-negotiation: No
            Advertised FEC modes: Not reported
            Speed: 10000Mb/s
            Duplex: Full
            Auto-negotiation: off
            Port: Direct Attach Copper
            PHYAD: 0
            Transceiver: internal
            Supports Wake-on: g
            Wake-on: g
            Current message level: 0x00000000 (0)
            Link detected: yes
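After switching the cluster network over to the 10GbE card, the problem was still there. Sigh..... another day of pointless tinkering, all that work for nothing!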