Cross subnet communication to Windows failover cluster

September 3, 2019, 1:10 pm

≫ Next: Unable to add the node in multi subnet cluster

≪ Previous: Adding node to cluster that is on different vlan

Hello Hopefully this is the right location to post this, Our team is setting up Windows failover cluster to be access from across subnets. initially this is in a test environment. the issue we are seeing is that the clients who are on subnet A cannot speak to the Cluster network object nor the IP for the cluster name, which is on Subnet B. The clients can communicate fine with each of the nodes directly just not the cluster name/IP. Also Clients that are on the Same Subnet B can speak to the Cluster Name/IP fine. just not cross subnet. The switch that is connecting both subnets is a cisco 2960G. windows firewall is disabled as well.

Below is a brief diagram of the setup

Client subnetA => Ciscos 2960 <=> Cluster subnetb

Since the clients can reach the host individually, and the clients that are on the same subnet can access the cluster resources fine. I am leaning towards this may require a feature/capability in the network to handle communicating to a mac/ip address handled by the cluster.

Can anyone here point me in the right direction, any assistance most appreciated.

↧

Unable to add the node in multi subnet cluster

September 4, 2019, 10:54 pm

≫ Next: Microsoft Network Load Balancing not working as expected

≪ Previous: Cross subnet communication to Windows failover cluster

This is 3 node production cluster ( multi subnet cluster). It was working fine. I evicted the DR node and tried to add it back, it throws an error.

I am having issues only in adding the DR node. However I can able to add the DR node to my non-prod or Dev existing cluster.

FailoverCluster EventViewer

Log Name: Microsoft-Windows-FailoverClustering/Operational
Source: Microsoft-Windows-FailoverClustering
Date: 5/09/2019 3:39:06 PM
Event ID: 1281
Task Category: Security Manager
Level: Information
Keywords:
User: SYSTEM
Computer: DRNODE.orionhealth.saas
Description:
Joiner tried to Create Security Context using Package='Kerberos/NTLM' with Context Requirement ='0' and Timeout ='40000' for the target = 'akl-shrd-pdb1'

Log Name: Microsoft-Windows-FailoverClustering/Operational
Source: Microsoft-Windows-FailoverClustering
Date: 5/09/2019 3:38:53 PM
Event ID: 1650
Task Category: Cluster Virtual Adapter
Level: Information
Keywords:
User: SYSTEM
Computer: DRNODE.orionhealth.saas
Description:
Microsoft Failover Cluster Virtual Adapter (NetFT) has missed more than 40 percent of consecutive heartbeats.

Local endpoint: 10.13.6.200:~3343~
Remote endpoint: 10.10.6.190:~3343~

Error in powershell:-

The clustered role was not successfully created. For more information view the report file below.
Report file location: C:\Windows\cluster\Reports\Add Node Wizard 76cd451a-538a-4fbe-9c52-2f9498396d17 on 2019.09.05 At 15.35.42.htm
Add-ClusterNode : An error occurred while performing the operation.
An error occurred while adding nodes to the cluster 'CLUST'.
An error occurred while adding node 'NODE3' to cluster 'CLUST'.
This operation returned because the timeout period expired

DR Cluster log:

00000918.00001528::2019/09/05-00:59:08.498 DBG [NETFTAPI] Signaled NetftLocalConnect event for fe80::14:a91:8f79:5d8f
00000918.00001528::2019/09/05-00:59:08.498 DBG [NETFTEVM] FTI NetFT event handler got event: Local endpoint fe80::14:a91:8f79:5d8f:~0~ connected
00000918.00001230::2019/09/05-00:59:08.498 DBG [NETFTEVM] FTI NetFT event dispatcher pushing event: Local endpoint fe80::14:a91:8f79:5d8f:~0~ connected
00000918.00001230::2019/09/05-00:59:08.498 DBG [FTI][Initiator] Got Netft event Local endpoint fe80::14:a91:8f79:5d8f:~0~ connected
00000918.00001528::2019/09/05-00:59:08.498 DBG [NETFTEVM] TM NetFT event handler got event: Local endpoint fe80::14:a91:8f79:5d8f:~0~ connected
00000918.0000096c::2019/09/05-00:59:08.498 DBG [NETFTEVM] TM NetFT event dispatcher pushing event: Local endpoint fe80::14:a91:8f79:5d8f:~0~ connected
00000918.0000096c::2019/09/05-00:59:08.498 INFO [IM] got event: Local endpoint fe80::14:a91:8f79:5d8f:~0~ connected
00000918.00001528::2019/09/05-00:59:08.498 DBG [WM] Filtering event NETFT_LOCAL_CONNECT? 1
00000918.0000155c::2019/09/05-00:59:08.503 INFO [NODE] Node 1: New join with n4: stage: 'Send Current Membership Status for Join Policy'
00000918.0000155c::2019/09/05-00:59:08.503 INFO [MM] Node 1: Adding a stream to existing node 4
00000918.0000155c::2019/09/05-00:59:08.503 INFO [NODE] Node 1: n4 node object adding stream
00000918.0000155c::2019/09/05-00:59:08.503 DBG [NODE] Node 1: n4 node object got a channel
00000918.0000155c::2019/09/05-00:59:08.503 DBG [NODE] Node 1: Using new stream to n4, setting epoch to 1
00000918.0000155c::2019/09/05-00:59:08.503 DBG [NODE] Node 1: Done closing stream to n4
00000918.0000155c::2019/09/05-00:59:08.503 DBG [NODE] Node 1: My Fault Tolerant Session Id is now 8d14d294-1635-416f-9e7c-44450c2a9cce
00000918.0000155c::2019/09/05-00:59:08.503 INFO [NODE] Node 1: No reconnect in progress to n4, updating send queue based on new stream.
00000918.0000155c::2019/09/05-00:59:08.503 DBG [NODE] Node 1: Treating stream with n4 as new connection because epoch (1) is <= 1.
00000918.0000155c::2019/09/05-00:59:08.503 INFO [MQ-Node1] Clearing 0 unsent and 0 unacknowledged messages.
00000918.0000155c::2019/09/05-00:59:08.503 INFO [NODE] Node 1: Highest version with n4 = Major 9 Minor 1 Upgrade 8 ClusterVersion 0x00090008, lowest = Major 8 Minor 9600 Upgrade 3 ClusterVersion 0x00080003
00000918.0000155c::2019/09/05-00:59:08.503 INFO [NODE] Node 1: Done processing new stream to n4.
00000918.0000155c::2019/09/05-00:59:08.503 DBG [CHANNEL 10.10.6.190:~3343~] Close().
00000918.000012f0::2019/09/05-00:59:08.503 INFO [RGP] node 1: Node Connected 4 00000000000000000000000000000000000000000000000000000000000010010
00000918.000012f0::2019/09/05-00:59:08.503 INFO [RGP] sending to node(4) 1: 001(1) => 001(1) +() -() [()] , ()
00000918.0000155c::2019/09/05-00:59:08.503 INFO [PULLER NODE1] Just about to start reading from <refcounted count='3' typeid='.?AVSimpleSecureStream@mscs_security@@'/>
00000918.0000155c::2019/09/05-00:59:08.503 INFO [RGP] node 1: received new information from 4 starting the timer
00000918.00001528::2019/09/05-00:59:08.798 INFO [RGP] node 1: Tick
00000918.00001528::2019/09/05-00:59:08.798 INFO [RGP] node 1: selected partition 10903(3 4) as node 4 has quorum
00000918.00001528::2019/09/05-00:59:08.798 INFO [RGP] node 1: selected partition 10903(3 4) to join [using info from 4]
00000918.00001528::2019/09/05-00:59:08.798 INFO [RGP] node 1: cannot join yet. no connection to (3)
00000918.00001528::2019/09/05-00:59:08.798 INFO [RGP] sending to all nodes 1: 001(1) => 001(1) +() -() [()] , ()
00000918.00001528::2019/09/05-00:59:08.798 DBG [NODE] Node 1: eating message sent to the dead node 3
00000918.00000dbc::2019/09/05-00:59:08.798 INFO [RGP] node 1: received new information from 1 starting the timer
00000918.00001528::2019/09/05-00:59:09.111 INFO [RGP] node 1: Tick
00000918.00001528::2019/09/05-00:59:09.111 INFO [RGP] node 1: selected partition 10903(3 4) as node 4 has quorum
00000918.00001528::2019/09/05-00:59:09.111 INFO [RGP] node 1: selected partition 10903(3 4) to join [using info from 4]
00000918.00001528::2019/09/05-00:59:09.111 INFO [RGP] node 1: cannot join yet. no connection to (3)
00000918.00001528::2019/09/05-00:59:09.111 INFO [RGP] sending to all nodes 1: 001(1) => 001(1) +() -() [()] , ()
00000918.00001528::2019/09/05-00:59:09.111 DBG [NODE] Node 1: eating message sent to the dead node 3
00000918.00001524::2019/09/05-00:59:10.507 DBG [NETFTAPI] received NsiParameterNotification for 169.254.93.143 (IpDadStateInvalid)
00000918.000015a4::2019/09/05-00:59:10.507 DBG [NETFTAPI] received NsiDeleteInstance for 169.254.93.143
00000918.000015a4::2019/09/05-00:59:10.507 WARN [NETFTAPI] Failed to query parameters for 169.254.93.143 (status 0x80070490)
00000918.000015a4::2019/09/05-00:59:10.507 DBG [NETFTAPI] Signaled NetftLocalAdd event for 169.254.93.143
00000918.000015a4::2019/09/05-00:59:10.507 DBG [NETFTEVM] FTI NetFT event handler ignoring PnP add event for IPv4 LinkLocal address 169.254.93.143:~0~
00000918.000015a4::2019/09/05-00:59:10.507 DBG [NETFTEVM] TM NetFT event handler ignoring PnP add event for IPv4 LinkLocal address 169.254.93.143:~0~
00000918.000015a4::2019/09/05-00:59:10.507 DBG [WM] Filtering event NETFT_LOCAL_ADD? 1
00000918.000015a4::2019/09/05-00:59:10.509 WARN [NETFTAPI] Failed to query parameters for 169.254.93.143 (status 0x80070490)
00000918.000015a4::2019/09/05-00:59:10.509 DBG [NETFTAPI] Signaled NetftLocalRemove event for 169.254.93.143
00000918.000015a4::2019/09/05-00:59:10.509 DBG [NETFTEVM] FTI NetFT event handler ignoring PnP remove event for IPv4 LinkLocal address 169.254.93.143:~0~
00000918.000015a4::2019/09/05-00:59:10.509 DBG [NETFTEVM] TM NetFT event handler ignoring PnP remove event for IPv4 LinkLocal address 169.254.93.143:~0~
00000918.000015a4::2019/09/05-00:59:10.509 DBG [WM] Filtering event NETFT_LOCAL_REMOVE? 1
00000918.000015a4::2019/09/05-00:59:10.509 DBG [NETFTAPI] received NsiParameterNotification for 169.254.1.68 (IpDadStatePreferred)
00000918.000015a4::2019/09/05-00:59:10.509 DBG [NETFTAPI] Signaled NetftLocalAdd event for 169.254.1.68
00000918.000015a4::2019/09/05-00:59:10.509 DBG [NETFTEVM] FTI NetFT event handler ignoring PnP add event for IPv4 LinkLocal address 169.254.1.68:~0~
00000918.000015a4::2019/09/05-00:59:10.509 DBG [NETFTEVM] TM NetFT event handler ignoring PnP add event for IPv4 LinkLocal address 169.254.1.68:~0~
00000918.000015a4::2019/09/05-00:59:10.509 DBG [WM] Filtering event NETFT_LOCAL_ADD? 1
00000918.000015a4::2019/09/05-00:59:10.509 DBG [NETFTAPI] Signaled NetftLocalConnect event for 169.254.1.68
00000918.000015a4::2019/09/05-00:59:10.509 DBG [NETFTEVM] FTI NetFT event handler got event: Local endpoint 169.254.1.68:~0~ connected
00000918.00001230::2019/09/05-00:59:10.509 DBG [NETFTEVM] FTI NetFT event dispatcher pushing event: Local endpoint 169.254.1.68:~0~ connected
00000918.00001230::2019/09/05-00:59:10.509 DBG [FTI][Initiator] Got Netft event Local endpoint 169.254.1.68:~0~ connected
00000918.000015a4::2019/09/05-00:59:10.509 DBG [NETFTEVM] TM NetFT event handler got event: Local endpoint 169.254.1.68:~0~ connected
00000918.0000096c::2019/09/05-00:59:10.509 DBG [NETFTEVM] TM NetFT event dispatcher pushing event: Local endpoint 169.254.1.68:~0~ connected
00000918.0000096c::2019/09/05-00:59:10.509 INFO [IM] got event: Local endpoint 169.254.1.68:~0~ connected
00000918.000015a4::2019/09/05-00:59:10.509 DBG [WM] Filtering event NETFT_LOCAL_CONNECT? 1
00000918.000015a4::2019/09/05-00:59:10.510 DBG [NETFTAPI] received NsiAddInstance for fe80::5efe:169.254.1.68
00000918.000015a4::2019/09/05-00:59:10.510 DBG [NETFTAPI] received NsiParameterNotification for fe80::5efe:169.254.1.68 (IpDadStateDeprecated)
00000918.0000152c::2019/09/05-00:59:17.174 DBG [CORE] WriteVersionFunctor: beginning write attempts
00000918.00001530::2019/09/05-00:59:37.222 DBG [NETFT] FTI NetFT event handler deregistration successful.
00000918.00001530::2019/09/05-00:59:37.222 INFO [NODE] Node 1: New join with n3: stage: 'Wait for Heartbeats on Initial NetFT Route' status (1460) reason: '[FTI][Initiator] Aborting connection because NetFT route to node NODE2 on virtual IP fe80::35c4:f902:cbd4:33ef:~3343~ has failed to come up.'
00000918.00001530::2019/09/05-00:59:37.276 INFO [CORE] Node 1: Clearing cookie e7920b13-f4cf-46bb-84ba-79562d7745d8
00000918.00001530::2019/09/05-00:59:37.276 INFO [CORE] Node 1: Cookie Cache 465e1aa8-175f-4879-a473-0ad991998962 [NODE1]
00000918.00001530::2019/09/05-00:59:37.276 DBG [CHANNEL 10.10.6.191:~3343~] Close().
00000918.00001530::2019/09/05-00:59:37.329 WARN cxl::ConnectWorker::operator (): (1460)' because of '[FTI][Initiator] Aborting connection because NetFT route to node NODE2 on virtual IP fe80::35c4:f902:cbd4:33ef:~3343~ has failed to come up.'

00000918.00001530::2019/09/05-01:00:07.531 DBG [JPM] Node 1: contacts size for node NODE2 is 1, current index 0
00000918.00001530::2019/09/05-01:00:07.531 DBG [JPM] Node 1: Trying to connect to node NODE2 (IP: 10.10.6.191:~0~)
00000918.00001530::2019/09/05-01:00:07.531 DBG [HM] Trying to connect to NODE2 at 10.10.6.191:~3343~
00000918.00001524::2019/09/05-01:00:07.547 INFO [CONNECT] 10.10.6.191:~3343~: Established connection to remote endpoint 10.10.6.191:~3343~.
00000918.00001524::2019/09/05-01:00:07.547 INFO [SV] New real route: local (10.13.6.200:~49794~) to remote NODE2 (10.10.6.191:~3343~).
00000918.00001524::2019/09/05-01:00:07.547 INFO [SV] Got a new outgoing stream to NODE2 at 10.10.6.191:~3343~
00000918.00001524::2019/09/05-01:00:07.547 DBG [SM] Joiner: Initialized with SPN = NODE2, RequiredCtxAttrib = 0, HandShakeTimeout = 40000
00000918.0000154c::2019/09/05-01:00:07.547 DBG [SM] Handling auth handshake posted by thread id 5412
00000918.0000154c::2019/09/05-01:00:07.547 DBG [SM] Joiner: Versions: 1-10
00000918.0000154c::2019/09/05-01:00:07.547 DBG [SM] Joiner: ISC returned status = 590610 output Blob size 1723, service principal name HOST/NODE2, auth type MSG_AUTH_PACKAGE::KerberosAuth, attr: 83998
00000918.0000154c::2019/09/05-01:00:07.547 DBG [SM] Joiner: Sending SSPI blob of size 1723 to Sponsor
00000918.0000154c::2019/09/05-01:00:07.563 DBG [SM] Joiner: Switching to Schannel
00000918.00001524::2019/09/05-01:00:07.578 DBG [Schannel] Client: Chosen Cert's version = 2, serialNo = <vector len='16'>00000918.00001524::2019/09/05-01:00:07.735 INFO [SV] Authentication and authorization were successful
00000918.00001524::2019/09/05-01:00:07.735 INFO [VER] Got new TCP connection. Exchanging version data.
00000918.00001524::2019/09/05-01:00:07.735 DBG [VER] Calculated cluster versions: highest [Major 9 Minor 1 Upgrade 8 ClusterVersion 0x00090008], lowest [Major 8 Minor 9600 Upgrade 3 ClusterVersion 0x00080003] with exclude node list: (3)
00000918.00001524::2019/09/05-01:00:07.735 INFO [VER] Checking version compatibility for node NODE2 id 3 with following versions: highest [Major 9 Minor 1 Upgrade 8 ClusterVersion 0x00090008], lowest [Major 8 Minor 9600 Upgrade 3 ClusterVersion 0x00080003].
00000918.00001524::2019/09/05-01:00:07.735 INFO [VER] Version check passed: node and cluster highest supported versions match. Other node still supports lower level, so joining in downlevel mode.
00000918.00001524::2019/09/05-01:00:07.735 INFO mscs::VersionManagerAgent::IsCompatible: First run: setting CFL to 8.3 manually instead of looking for value in database
00000918.00001524::2019/09/05-01:00:07.735 DBG [CORE-Dbg] IsCompatible: setting operating version to 8.3 on first run
00000918.00001524::2019/09/05-01:00:07.750 INFO [SV] Negotiating message security level.
00000918.00001524::2019/09/05-01:00:07.750 INFO [SV] Already protecting connection with message security level 'Sign'.
00000918.00001524::2019/09/05-01:00:07.750 INFO [FTI] Got new raw TCP/IP connection.
00000918.00001524::2019/09/05-01:00:07.765 INFO [FTI][Initiator] This node (1) is initiator
00000918.00001524::2019/09/05-01:00:07.765 DBG [FTI][Initiator] Cookie for remote node is e7920b13-f4cf-46bb-84ba-79562d7745d8
00000918.00001524::2019/09/05-01:00:07.765 DBG [FTI] Stream already exists to node 3: false
00000918.00001524::2019/09/05-01:00:07.783 INFO [FTI][Initiator] Trying to select best endpoints among 169.254.1.68:~3343~, fe80::14:a91:8f79:5d8f:~3343~ (first pair) and 169.254.3.177:~3343~, fe80::35c4:f902:cbd4:33ef:~3343~ (second pair)
00000918.00001524::2019/09/05-01:00:07.785 INFO [HM] Marking route from realLocal 10.13.6.200:~49794~ -> realRemote 10.10.6.191:~3343~ as a cross-subnet route
00000918.00001524::2019/09/05-01:00:07.785 INFO [RouteDb] Route virtual fe80::14:a91:8f79:5d8f:~0~ to virtual fe80::35c4:f902:cbd4:33ef:~0~ added
00000918.00001524::2019/09/05-01:00:07.785 DBG [NETFT] Removing route <struct mscs::FaultTolerantRoute>
00000918.00001524::2019/09/05-01:00:07.785 DBG <realLocal>10.13.6.200:~3343~</realLocal>
00000918.00001524::2019/09/05-01:00:07.785 DBG <realRemote>10.10.6.191:~3343~</realRemote>
00000918.00001524::2019/09/05-01:00:07.785 DBG <virtualLocal>fe80::14:a91:8f79:5d8f:~0~</virtualLocal>
00000918.00001524::2019/09/05-01:00:07.785 DBG <virtualRemote>fe80::35c4:f902:cbd4:33ef:~0~</virtualRemote>
00000918.00001524::2019/09/05-01:00:07.785 DBG <Delay>1000</Delay>

Charles Peter

↧

Microsoft Network Load Balancing not working as expected

September 18, 2019, 2:45 am

≫ Next: Failover Cluster Manager bug on Server 2019 after .NET 4.8 installed - unable to type more than two characters in to the IP fields

≪ Previous: Unable to add the node in multi subnet cluster

I wish to have a failover cluster for an IIS site in my domain.
I have configured the cluster on port 80, however only once the network of that specific node is down will the cluster detect that node is down.
If I stop the site through IIS manager that node is still considered healthy.
What am I doing wrong? Is this what do the product supposed to do? If not what other product can help me?

↧

Failover Cluster Manager bug on Server 2019 after .NET 4.8 installed - unable to type more than two characters in to the IP fields

September 20, 2019, 8:46 am

≫ Next: s2d down after adding hard drives

≪ Previous: Microsoft Network Load Balancing not working as expected

We ran into a nasty bug on Windows Server 2019 and I can't find any KB articles on it. It's really easy to replicate.

1. Install Windows Server 2019 Standard with Desktop Experience from an ISO.

2. Install Failover Cluster Services.

3. Create new cluster, on the 4th screen, add the current server name. This is what it shows:

cluster services working correctly before .NET 4.8 is installed

4. Install .NET 4.8 from an offline installer. (KB4486153) and reboot.

5. After the reboot, go back to the same screen of the same Create Cluster Wizard and now it looks different:

cluster services broken afte.NET 4.8 is installed - unable to put in a 3-digit IP

Now we are unable to type in a 3 digit IP in any of the octet fields. It accepts a maximum of two characters.

Has anyone else encountered this? It should be really easy to reproduce.

↧

s2d down after adding hard drives

July 16, 2019, 8:51 am

≫ Next: How to expand vhdx disk VM on failover cluster

≪ Previous: Failover Cluster Manager bug on Server 2019 after .NET 4.8 installed - unable to type more than two characters in to the IP fields

s2d newbie here. This is a test environment, so nothing is probably supported hardware.

the setup

Running two optiplex 3050's with windows server 2016. They each have a spinning disk and an ssd via sata. the spinning disk is partitioned for the operating system and the rest i dumped into the s2d pool. With this hardware i setup a fail-over cluster w/ quarm coming from my domain controller, and the clusters' storage coming from the s2d pool. Everything was working well, but terrible slow.

the issue

To cure the slowness i decided to add a pcie, m.2 drive to each. After adding it to one machine the cluster and s2d drive came back w/ an error, i ran a repair inside of server manger. After that completed i added the same drive to the other machine and my s2d drive has been gone since. I've tried removing the last HD I added, rebooted each several times w/ no luck.

error's

when i look at the critical events for the cluster's disk in "fail-over cluster manager" there are a lot of repeating event ID. 5142, 1069, 1793

Any help would be greatly appreciated. I'd like to see if i can fix this in a test environment before i see it in production.

many thanks!

IT guy

↧

How to expand vhdx disk VM on failover cluster

September 24, 2019, 7:56 am

≫ Next: Setting up Storage for Clustering

≪ Previous: s2d down after adding hard drives

Hi,

I want to know the right way in order to expand a VHDX disk of a VM running on a failover cluster of 2 nodes. The nodes and the guest OS are running Windows Server 2012 R2.

I know that is posible to expand it online (with the VM running) but when I open the VM settings configuration page from Hyper-V manager, it says "some settings cannot bemodified because the virtual machine wasrunning".

Thanks in advance.

Cristian L Ruiz

↧

Setting up Storage for Clustering

September 12, 2019, 6:16 am

≫ Next: Cluster Aware Update (CAU) on Storage Spaces Direct (S2D) with Pre-staged Virtual Cluster Object (VCO)

≪ Previous: How to expand vhdx disk VM on failover cluster

We have 3 Dell servers we are trying to put into a cluster, and a 4th machine will be added later (VM). My issue now is I do not know how to set up the storage on the machines PRIOR to creating the Cluster so that the disks will be recognized as usable by the cluster. Currently when I use Cluster Manager and hit storage>right click disks>add disk, I get an error saying "No disks suitable for cluster disks were found. For diagnostic info about disks available to the cluster, use the Validate a Configuration Wizard to run Storage Tests."

So I checked the validation test and this is what I see (copy and pasted from validation test):

No disks were found on which to perform cluster validation tests. To correct this, review the following possible causes:

* The disks are already clustered and currently Online in the cluster. When testing a working cluster, ensure that the disks that you want to test are Offline in the cluster.

* The disks are unsuitable for clustering. Boot volumes, system volumes, disks used for paging or dump files, etc., are examples of disks unsuitable for clustering.

* Review the "List Disks" test. Ensure that the disks you want to test are unmasked, that is, your masking or zoning does not prevent access to the disks. If the disks seem to be unmasked or zoned correctly but could not be tested, try restarting the servers before running the validation tests again.

* The cluster does not use shared storage. A cluster must use a hardware solution based either on shared storage or on replication between nodes. If your solution is based on replication between nodes, you do not need to rerun Storage tests. Instead, work with the provider of your replication solution to ensure that replicated copies of the cluster configuration database can be maintained across the nodes.

* The disks are Online in the cluster and are in maintenance mode.

No disks were found on which to perform cluster validation tests.

↧

Cluster Aware Update (CAU) on Storage Spaces Direct (S2D) with Pre-staged Virtual Cluster Object (VCO)

September 11, 2019, 12:46 pm

≫ Next: Multiple SQL Cluster (2008, 2012, 2016) on a Single Windows Cluster

≪ Previous: Setting up Storage for Clustering

I have been running into a bug with CAU in RS1-14393 where it doesn’t accept the Pre-staged AD Object (fails both as Powershell parameter and GUI config), and instead tries to generate/submit a new randomized AD Object (example: CAU-81ea8e) to the domain controller to run the CAU from. Problem is, it doesn’t have permission to the AD domain controller (this is not my domain controller), and so it fails, but still tries to use the CAU object even though it was not correctly created in AD.

Here’s the part I’m stuck on:

https://docs.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating-requirements#additional-recommendations

“To configure CAU in self-updating mode, a virtual computer object (VCO) for the CAU clustered role must be created in Active Directory. CAU can create this object automatically at the time that the CAU clustered role is added, if the failover cluster has sufficient permissions. However, because of the security policies in certain organizations, it may be necessary to prestage the object in Active Directory. For a procedure to do this, see Steps for prestaging an account for a clustered role.”

The cluster object and cluster group both have Full Control permissions to the VCO, but the cluster still insists on trying to create a new randomized cluster object when I try to setup CAU.

I found the following technet article regarding CAU: https://social.technet.microsoft.com/Forums/windowsserver/en-US/a7a0d434-cd37-4592-a1f5-6d85ae4e1797/storage-spaces-direct-cluster-aware-updating-behaviour?forum=winserverfiles

This is the current procedure we use to run Windows Updates, which is all manual per-node: https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/maintain-servers. This procedure can take up to 2-3 weeks of manual work to patch the full 8 node cluster, waiting for CSV disk regeneration between each node reboot.

I’m still waiting for my IT organization to certify Server 2019 (RS5-17763) for production use, which is why I’m still using Server 2016 (RS1-14393) on all my S2D clusters that I am deploying, or I would upgrade to Server 2019 already.

If you have any additional data points you can share, or if you know of another forum who might use CAU and have some insight, I would be thankful for the assistance.

↧

Multiple SQL Cluster (2008, 2012, 2016) on a Single Windows Cluster

September 12, 2019, 2:19 am

≫ Next: Cluster Disk Driver Letter Fiasco

≪ Previous: Cluster Aware Update (CAU) on Storage Spaces Direct (S2D) with Pre-staged Virtual Cluster Object (VCO)

Hi Experts,

Can I have multiple SQL Cluster configured on a single environment (Windows Cluster?

For example - I have a 5 Node Windows 2012 R2 Windows Failover Cluster on which I want to configure 2 nodes to run as SQL 2008 Cluster, 4 nodes as SQL 2012 Cluster & 2 Nodes with 2016 Cluster.

The combination of nodes can be in any number (2 - SQL 2012, 4 - SQL 2008 or 5 - SQL 2016) but underlying I have only 5 nodes that will form the cluster. So, a single CNO (of 5 nodes) under which there will be multiple SQL Clusters (of 2, 3, 4 or 5 Nodes).

Is that possible? Above all, is that a recommended & supported scenario?

Thanks!

↧

Cluster Disk Driver Letter Fiasco

September 9, 2019, 10:01 am

≫ Next: Drain role Failed

≪ Previous: Multiple SQL Cluster (2008, 2012, 2016) on a Single Windows Cluster

I have a 4 node cluster and my Cluster Disk on node1 is assigned a drive letter D:

If I fail over my node1 to node2, the Drive Letter from Node1 gets assigned to Node2 and the Node2 Drive letter D: disappears

I am bit concerned, if i have something running on Node2 on Drive D: that will get lost?

I know in clustering u have to resreve the Drive Letters so that no other node uses those drive letter; but whats the solution to this in 2019; I dont want to use CSV

Thanks

↧

Drain role Failed

March 10, 2019, 4:01 am

≫ Next: VMs located on one of CSV volumes stopped migrating on one of cluster nodes

≪ Previous: Cluster Disk Driver Letter Fiasco

We have three Node N-1, N-2, N-3. I drain role from N-2 and there10 VM's Moved out of 14. 4 VM are not moving getting error . Tried to move manually but the error same. Please assists All the Node in are WIN-2012 R2

Error Message : "operation did not complete on resource virtual machine live migration"

↧

VMs located on one of CSV volumes stopped migrating on one of cluster nodes

September 24, 2019, 7:53 am

≫ Next: if resources fails, attempt restart on current node

≪ Previous: Drain role Failed

We have a 3 node cluster Windows 2016 with many VMs on 3 CSV volumes. At one moment (I'm not sure when) VMs located on first CSV volume stoped to migrate (live and quick) to fist node (only to first node). 1st volume is still visible from 1st node. Cluster validation didn't show any problem.
In event log Microsoft-Windows-Hyper-V-VMMS/Admin on 1st node:
EventID:16300
Cannot load a virtual machine configuration: The system cannot find file specified. (0x80070002) (Virtual machine ID ....)
EventID:21002
'VM name' Failed to create Planned Virtual Machine at migration destination:The system cannot find file specified. (0x80070002) (Virtual machine ID ....)

Any ideas how to fix this problem?

I would appreciate any help.

Thanks.

↧

if resources fails, attempt restart on current node

October 3, 2019, 5:38 am

≫ Next: WSSD vs. Azure Stack HCI certification

≪ Previous: VMs located on one of CSV volumes stopped migrating on one of cluster nodes

Period For Restarts

Maximum Restarts in the specified period

I am struggling to find anything that explains what this functionality means.

If I set the maximum restarts to 3, then does the cluster try to start the affected service 3 times before failing over? Do these 3 restarts happen immediately after each other, or is there some wait time built in?

How does the Period for restarts impact on the activities?

↧

WSSD vs. Azure Stack HCI certification

October 4, 2019, 11:28 am

≫ Next: CSV Autopause - Single client contification start.

≪ Previous: if resources fails, attempt restart on current node

A team member and I are having a debate. We want to know if it is "safe" to use the very recently released Lenovo SR635 or SR655 EPYC based servers, for building our own Win2019 Storage Spaces Direct cluster (all cluster components will be Windows certified).

The servers are listed in the Windows Server Catalog as Win2019 with Software-Defined Data Center (SDDC) Premium certification (SR635, SR655).

They are not listed in in the Azure Stack HCI Catalog.

He firmly believes that the systems needs to be in Azure Stack HCI catalog, in order to proceed

based on this PDF from Lenovo Certified Config for Microsoft S2D

I believe that we can use the servers

The S2D Hardware Requirements page, used to state of that only Software-Defined Data Center (SDDC) certification is required(this changed in August ;-[).
I look at the Lenovo doc as a list of configs that Lenovo will support (FYI, these servers were they released after the PDF was published)
The PDF is not a list of systems that can used for S2D, if we are the one supporting the cluster/solution.

So, which of us I "right"?

Regardless of "right", would you proceed anyways?

↧

CSV Autopause - Single client contification start.

September 3, 2019, 5:46 am

≫ Next: Firewall ports Failover Clustering in Server 2016

≪ Previous: WSSD vs. Azure Stack HCI certification

HI,

I've just got a warning from my cluster that one of my CSVs was stopped. But I just dont get what was going on.

From the Failoverclustering-CsvFs protocol I get this message:

"Volume {44179469-89e8-4971-b9ff-057c4579c647} is autopaused. Status 0xC00000C4. Source: Single client contification start."

What does that even mean? Single Client contification?

Best Regards

Daniel

↧

Firewall ports Failover Clustering in Server 2016

May 30, 2017, 8:58 am

≫ Next: SMB Signing breaks CSV access cross-node

≪ Previous: CSV Autopause - Single client contification start.

Hello - I'm configuring MS Failover Cluster across two datacenters with different IP Ranges using server 2016. What firewall ports are needed to setup two nodes cluster and witness file share ?

Thanks

↧

SMB Signing breaks CSV access cross-node

September 16, 2019, 7:29 pm

≫ Next: Cluster events not written to system event log

≪ Previous: Firewall ports Failover Clustering in Server 2016

Hey all, couldn't find an article that answers my problem, so starting my own :).
Hopefully I put in enough detail.

Server 2012 R2 Hyper-V Failover Cluster environment.
2 nodes. 1 SAN via SAS.
Disks added as CSV. Hyper-V config and vhds on CSVs.
Each node has 12 NICs.
NIC 1 - Mgmt - Gateway IP, DNS IP - 192.168.0.X/24
NIC 2 - Live Migration - IP only, no Gateway, no DNS - 10.20.30.X/24
NIC 3 to 10 - Windows Teamed Interface - LACP on Switch, added as Virtual Switch, External network, does not share mgmt
NIC 12 - DMZ - added as Virtual Switch, External network, does not share mgmt

Everything is fine. Cluster works, live migration works.

Recently we're going through a security exercise, operating Tenable.io, and remediating results found.
One of them is SMB Signing. I have been enabing the Group Policy "Microsoft network server: Digitally sign communications (always)" across various servers, testing along the way.

Until I apply this to my nodes. My CSVs don't appear to like it. After a few days, when trying to access a CSV in C:\ClusterStorage that is owned by another node, I can't see the Space used, and when trying to access it, I get "you have been denied permission to access this folder".
Removing "Microsoft network server: Digitally sign communications (always)" on both instantly restores this communication.

After googling around, I have been witnessing a few Event Log errors in SMBClient, Event 30803 and 31010, but I'm not yet sure if it's related. I am still trying to monitor it without the policy change. This is an example:

[Event ID 30803]

The network connection failed.

Error: {Device Timeout}
The specified I/O operation on %hs was not completed before the time-out period expired.

Server name: fe80::e0a9:e45:5b2b:f594%25
Server address: 10.20.30.2:445
Connection type: Wsk

Guidance:
This indicates a problem with the underlying network or transport, such as with TCP/IP, and not with SMB. A firewall that blocks port 445 or 5445 can also cause this issue.

[Event ID 31010]

The SMB client failed to connect to the share.

Error: {Access Denied}
A process has requested access to an object, but has not been granted those access rights.

Path: \fe80::e0a9:e45:5b2b:f594%25\454b7f2d-4e6c-4332-ae29-5e4befc5ce5b-135266304$

So what am I missing? Is it something to do with SMB Signing trying to verify an identity, and CSVs are using SMB across the Live Migration network, 10.20.30.2, but these errors are showing IPv6 address as a server name?

↧

Cluster events not written to system event log

May 5, 2011, 11:42 am

≫ Next: Cannot create checkpoint when shared vhdset (.vhds) is used by VM - 'not part of a checkpoint collection' error

≪ Previous: SMB Signing breaks CSV access cross-node

Hi,

I have a Windows Server 2008 R2 cluster that is not writing cluster events to the system event log. When I trigger a failover event, the failover happens successfully but nothing is logged to the system event log.

Is there a way that I can fix this?

Thanks,

Howard

↧

Cannot create checkpoint when shared vhdset (.vhds) is used by VM - 'not part of a checkpoint collection' error

December 9, 2016, 1:27 am

≫ Next: SCOM2016 monitoring S2D General File Server

≪ Previous: Cluster events not written to system event log

We are trying to deploy 'guest cluster' scenario over HyperV with shared disks set over SOFS. By design .vhds format should fully support backup feature.

All machines (HyperV, guest, SOFS) are installed with Windows Server 2016 Datacenter. Two HyperV virtual machines are configured to use shared disk in .vhds format (located on SOFS cluster formed of two nodes). SOFS cluster has a share configured for applications and HyperV uses \\sofs_server\share_name\disk.vhds path to SOFS remote storage). Guest cluster is configured with 'File server' role and 'Failover clustering' feature to form a guest cluster. There are two disks configured on each of guest cluster nodes: 1 - private system disk in .vhdx format (OS) and 2 - shared .vhds disk on SOFS.

While trying to make a checkpoint for guest machine, I get following error:

Cannot take checkpoint for 'guest-cluster-node0' because one or more sharable VHDX are attached and this is not part of a checkpoint collection.

Production checkpoints are enabled for VM + 'Create standard checkpoint if it's not possible to create a production checkpoint' option is set. All integration services (including backup) are enabled for VM.

When I delete .vhds disk of shared drive from SCSI controller of VM, checkpoints are created normally (for private OS disk).

It is not clear what is 'checkpoint collection' and how to add shared .vhds disk to this collection. Please advise.

Thanks.

↧

SCOM2016 monitoring S2D General File Server

October 10, 2019, 8:42 am

≫ Next: Error validating cluster computer resource name (Server 2016 Datacenter Cluster)

≪ Previous: Cannot create checkpoint when shared vhdset (.vhds) is used by VM - 'not part of a checkpoint collection' error

We are running a SOFS and a GFS on a S2D cluster. We have SCOM2016 running with Storage Spaces Direct Management Pack and SCOM sees the SOFS shares with a lot of great information. I am not seeing the GFS in SCOM. Any ideas?

↧