We have set up a 6-node cluster using an iWARP configuration with QL41262 25 GbE network adapters. I have enabled RDMA everywhere I can find a setting for it, following the Dell setup guide.
Running the Test-ClusterHealth script gives the results below.
I'm really struggling with the RDMA failures, and now I'm also getting SMB failures reporting disconnects. I'm receiving multiple reports of lag and performance issues in guest VMs.
PS C:\Scripts> .\Test-Clusterhealth.ps1
Detected RDMA adapters: will require RDMA
******************** Basic Health Checks (3.6s)
All cluster nodes Up
Cluster node uptime:
PSComputerName Uptime
-------------- ------
S2D-NODE01 30d:00h:48m.50s
S2D-NODE02 0d:19h:47m.54s
S2D-NODE03 40d:16h:21m.24s
S2D-NODE04 0d:22h:14m.54s
S2D-NODE05 0d:02h:53m.07s
S2D-NODE06 2d:15h:45m.03s
Clustered storage subsystem Healthy
All pools Healthy
******************** Clusport Device Symmetry Check (2.1s)
********** Total
Pass with 72 per node
********** Disk Type
Pass with 60 per node
********** Solid/Non-Rotational Media
Pass with 12 per node
********** Enclosure Type
Pass with 6 per node
********** Virtual
Pass with none on any node
******************** Enclosure View Symmetry Check (4.1s)
********** Total
Pass with 6 per node
******************** Operational Issues and Storage Jobs (116.2s)
No storage rebuild or regeneration jobs are active
******************** Physical Disk Health (2.2s)
All physical disks are in normal auto-select or journal state
******************** Physical Disk View Symmetry Check (4.1s)
********** Total
Pass with 60 per node
******************** RDMA Adapter IP Check (8.9s)
*************** RDMA Adapter IP Check
********** Total
Pass with none on any node
*************** RDMA Adapter (Virtual) IP Check
********** Total
Fail
Count Name
----- ----
6 S2D-NODE01
4 S2D-NODE02
4 S2D-NODE03
6 S2D-NODE04
6 S2D-NODE05
7 S2D-NODE06
*************** RDMA Adapter (Physical) IP Check
********** Total
Pass with none on any node
******************** RDMA Adapters Symmetry Check (3.4s)
********** Total
Fail
Count Name
----- ----
5 S2D-NODE01
4 S2D-NODE02
4 S2D-NODE03
5 S2D-NODE04
5 S2D-NODE05
5 S2D-NODE06
********** Operational
Fail
Count Name
----- ----
5 S2D-NODE01
4 S2D-NODE02
4 S2D-NODE03
5 S2D-NODE04
5 S2D-NODE05
5 S2D-NODE06
********** Up
Fail
Count Name
----- ----
5 S2D-NODE01
4 S2D-NODE02
4 S2D-NODE03
5 S2D-NODE04
5 S2D-NODE05
5 S2D-NODE06
******************** SMB Connectivity Error Check - Connect Failures (2.4s)
PSComputerName  RDMA Last5Min  RDMA LastDay  RDMA LastHour  TCP Last5Min  TCP LastDay  TCP LastHour
--------------  -------------  ------------  -------------  ------------  -----------  ------------
S2D-NODE01                  0             0              0             0           20             0
S2D-NODE02                  0             0              0             0           10             0
S2D-NODE03                  0             0              0             0           13             0
S2D-NODE04                  0             0              0             0           12             0
S2D-NODE05                  0             0              0             0           10             0
S2D-NODE06                  0             0              0             0           14             0
******************** SMB Connectivity Error Check - Disconnect Failures (2.5s)
WARNING: the SMB Client is receiving RDMA disconnects. This is an error whose root
cause may be PFC/CoS misconfiguration (RoCE) on hosts or switches, physical
issues (ex: bad cable), switch or NIC firmware issues, and will lead to severely
degraded performance. Additional triage is included in other tests.
PSComputerName  RDMA Last5Min  RDMA LastDay  RDMA LastHour  TCP Last5Min  TCP LastDay  TCP LastHour
--------------  -------------  ------------  -------------  ------------  -----------  ------------
S2D-NODE01                  0            16              0             0            3             0
S2D-NODE02                  0            11              0             0           11             0
S2D-NODE03                  0            17              0             0            3             0
S2D-NODE04                  0             8              0             0            1             0
S2D-NODE05                  0            12              0             0           12             0
S2D-NODE06                  0            18              0             0            6             0
******************** SMB CSV Multichannel Symmetry Check (2.5s)
********** Total
Fail
Count Name
----- ----
16 S2D-NODE01
10 S2D-NODE02
18 S2D-NODE03
12 S2D-NODE04
8 S2D-NODE05
14 S2D-NODE06
********** RDMA Capable
Fail
Count Name
----- ----
16 S2D-NODE01
10 S2D-NODE02
18 S2D-NODE03
12 S2D-NODE04
8 S2D-NODE05
14 S2D-NODE06
********** Selected & Non-Failed
Fail
Count Name
----- ----
16 S2D-NODE01
10 S2D-NODE02
18 S2D-NODE03
12 S2D-NODE04
8 S2D-NODE05
14 S2D-NODE06
******************** SMB SBL Multichannel Symmetry Check (2.6s)
********** Total
Fail
Count Name
----- ----
10 S2D-NODE01
10 S2D-NODE02
16 S2D-NODE03
10 S2D-NODE04
10 S2D-NODE05
10 S2D-NODE06
********** RDMA Capable
Fail
Count Name
----- ----
10 S2D-NODE01
10 S2D-NODE02
16 S2D-NODE03
10 S2D-NODE04
10 S2D-NODE05
10 S2D-NODE06
********** Selected & Non-Failed
Fail
Count Name
----- ----
10 S2D-NODE01
10 S2D-NODE02
16 S2D-NODE03
10 S2D-NODE04
10 S2D-NODE05
10 S2D-NODE06
******************** Virtual Disk Health (2.1s)
All operational virtual disks Healthy
PS C:\Scripts>
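
As a reading aid for the symmetry failures above, here is a small Python sketch that picks out which nodes disagree with the majority. It is not part of Test-ClusterHealth: the function name `asymmetric_nodes` is made up for illustration, and it assumes a symmetry check fails whenever the per-node counts are not all equal, which is how the output reads but is not confirmed from the script itself. The counts are copied from the "RDMA Adapter (Virtual) IP Check" and "RDMA Adapters Symmetry Check" totals.

```python
from collections import Counter


def asymmetric_nodes(counts):
    """Return the nodes whose count differs from the most common count.

    Hypothetical helper for reading the output above; assumes "symmetry"
    means every node should report the same count.
    """
    if not counts:
        return {}
    # The most common value is treated as the expected per-node count.
    majority, _ = Counter(counts.values()).most_common(1)[0]
    return {node: n for node, n in counts.items() if n != majority}


# Counts copied from "RDMA Adapter (Virtual) IP Check" (Total).
virtual_ip = {
    "S2D-NODE01": 6, "S2D-NODE02": 4, "S2D-NODE03": 4,
    "S2D-NODE04": 6, "S2D-NODE05": 6, "S2D-NODE06": 7,
}

# Counts copied from "RDMA Adapters Symmetry Check" (Total).
rdma_adapters = {
    "S2D-NODE01": 5, "S2D-NODE02": 4, "S2D-NODE03": 4,
    "S2D-NODE04": 5, "S2D-NODE05": 5, "S2D-NODE06": 5,
}

print("Virtual IP outliers:  ", asymmetric_nodes(virtual_ip))
print("RDMA adapter outliers:", asymmetric_nodes(rdma_adapters))
```

With these numbers, S2D-NODE02 and S2D-NODE03 show up as outliers in both checks (4 where most nodes report 5 or 6), and S2D-NODE06 is an outlier only in the virtual IP check (7).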