Channel: High Availability (Clustering) forum
Viewing all 2306 articles

Cannot create checkpoint when shared vhdset (.vhds) is used by VM - 'not part of a checkpoint collection' error


We are trying to deploy a guest cluster scenario on Hyper-V with a shared disk set hosted on SOFS. By design, the .vhds format should fully support the backup feature.

All machines (Hyper-V hosts, guests, SOFS) run Windows Server 2016 Datacenter. Two Hyper-V virtual machines are configured to use a shared disk in .vhds format, located on a two-node SOFS cluster. The SOFS cluster has a share configured for applications, and Hyper-V uses the \\sofs_server\share_name\disk.vhds path to the SOFS remote storage. The guest cluster is built with the 'File Server' role and the 'Failover Clustering' feature. Each guest cluster node has two disks configured: 1 - a private system disk in .vhdx format (OS) and 2 - the shared .vhds disk on SOFS.

While trying to take a checkpoint of a guest machine, I get the following error:

Cannot take checkpoint for 'guest-cluster-node0' because one or more sharable VHDX are attached and this is not part of a checkpoint collection.

Production checkpoints are enabled for the VM, and the 'Create standard checkpoint if it's not possible to create a production checkpoint' option is set. All integration services (including backup) are enabled for the VM.

When I remove the shared .vhds disk from the VM's SCSI controller, checkpoints are created normally (for the private OS disk).

It is not clear what a 'checkpoint collection' is or how to add the shared .vhds disk to such a collection. Please advise.

Thanks.
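For reference, the 'checkpoint collection' mentioned in the error appears to refer to the VM group / collection mechanism introduced in Windows Server 2016 for backing up guest clusters that use VHD Sets; host-level backup products normally drive those collection APIs. A minimal hedged sketch of grouping the two guest nodes with the Hyper-V VM group cmdlets follows - the VM names and group name are examples, and grouping by itself is not guaranteed to make manual checkpoints of shared .vhds disks work.

# Hedged sketch: group the two guest-cluster VMs into a VM collection on the
# Hyper-V host. VM names and the group name are examples; whether grouping
# alone is sufficient for .vhds checkpoints is not confirmed here - backup
# products normally drive the collection APIs on top of this.
$group = New-VMGroup -Name 'GuestClusterCollection' -GroupType VMCollectionType

# Add both guest cluster nodes to the collection
Get-VM -Name 'guest-cluster-node0', 'guest-cluster-node1' |
    ForEach-Object { Add-VMGroupMember -VMGroup $group -VM $_ }

# Inspect the result
Get-VMGroup -Name 'GuestClusterCollection'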


Clustering in Server 2008 R2


One of my file servers in a 3-node 2008 R2 cluster went offline. When I brought it back online, it had reverted to data from 11 months ago. How can this happen, and how can I get back 11 months of data?

Can anyone please help?

Chris.

Building a Two-Node Failover Cluster


I have an issue when I try to create a two-node failover cluster:

I get these messages:

    Node SJEDITB41606.corp.sva.com successfully issued call to Persistent Reservation RESERVE for Test Disk 0 which is currently reserved by node SJEDITB41607.corp.sva.com. This call is expected to fail.

    Test Disk 0 does not provide Persistent Reservations support for the mechanisms used by failover clusters. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Please contact your storage administrator or storage vendor to check the configuration of the storage to allow it to function properly with failover clusters.
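If the storage vendor confirms that the LUN should support SCSI-3 persistent reservations, it can help to clear any stale reservation left on the test disk and then re-run just the storage validation. A hedged sketch follows; the disk number and node names are taken from the messages above, and Clear-ClusterDiskReservation is disruptive, so only run it against a disk that is not in production use.

# Hedged sketch: clear a possibly stale reservation on the test disk and
# re-run only the storage validation tests. Substitute your own disk number
# and node names.
Clear-ClusterDiskReservation -Node SJEDITB41607 -Disk 0 -Force

Test-Cluster -Node SJEDITB41606.corp.sva.com, SJEDITB41607.corp.sva.com -Include 'Storage'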



Windows AD user password reset details through PowerShell script


Hello All,

How can I get the password change/reset logs from an AD server through a PowerShell command, running automatically on a daily basis? Currently I am using the command below to get the last reset details.


Get-ADUser -Filter * -Properties PasswordLastSet, PasswordNeverExpires | Format-Table Name, PasswordLastSet, PasswordNeverExpires
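PasswordLastSet only shows the most recent change per account. The actual change/reset events are written to the Security log of the domain controllers (Event ID 4723 for a password change attempt, 4724 for an administrative reset), provided "Audit User Account Management" auditing is enabled. A hedged sketch that could run as a daily scheduled task follows; the DC name and output path are examples.

# Hedged sketch: pull yesterday's password change (4723) and reset (4724)
# events from a domain controller's Security log and export them to CSV.
# Requires user account management auditing to be enabled. DC01 and the
# output folder are example names.
$since = (Get-Date).Date.AddDays(-1)

Get-WinEvent -ComputerName 'DC01' -FilterHashtable @{
    LogName   = 'Security'
    Id        = 4723, 4724
    StartTime = $since
} |
Select-Object TimeCreated, Id, @{n='Summary';e={($_.Message -split "`r`n")[0]}} |
Export-Csv -Path "C:\Reports\PasswordEvents_$($since.ToString('yyyyMMdd')).csv" -NoTypeInformation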



dinesh kumar

Basic disk to CSV import - data loss?


We are sort of in a transition, trying to build a separate Windows Server 2016 infrastructure.

Our SAN vendor does not yet support CSV volumes; they expect to have an update in a few weeks that resolves this. I have the connection from the cluster node to the SAN presented as basic disks, and I would like to get started on VM creation. If I start building the VMs on the basic disks, could I later add these disks to Cluster Shared Volumes without losing data?

Thanks for your time and consideration.
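For what it's worth: adding an already-formatted NTFS disk to the cluster and then converting it to a CSV does not normally reformat the volume, so VMs created on it beforehand should survive - but confirm with the SAN vendor and keep backups before converting. A hedged sketch of the steps follows; the resource name is an example.

# Hedged sketch: add the existing (already-formatted) disk to the cluster,
# then promote it to a Cluster Shared Volume. Verify with your SAN vendor
# and have a backup first. The resource name is an example.
Get-ClusterAvailableDisk | Add-ClusterDisk

# The new resource typically shows up as "Cluster Disk 1", "Cluster Disk 2", ...
Add-ClusterSharedVolume -Name 'Cluster Disk 1'

# CSV content then appears under C:\ClusterStorage\ on every node
Get-ClusterSharedVolume | Select-Object Name, State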

Creating a new Hyper-V Replica Broker on a 2-node cluster crashes the cluster resource host process (RHS.exe) and the role fails to start


Hello ,

We have a domainless (workgroup) 2-node Windows Server 2016 cluster. This was set up for SQL Server availability groups and works fine for that.

I have Hyper-V installed on both nodes and want to replicate the VM guests from one server to the other. I read that I need to add the Hyper-V Replica Broker to the cluster before configuring the VM guests for replication.

Every time I create the new role, RHS.exe (the cluster Resource Hosting Subsystem, I believe) crashes and the role fails to start.

The crashes also affect our SQL Server availability groups.

I have looked at the cluster logs, but there doesn't seem to be an error reason.

Can anyone help? (I don't even know what to post to help find out what the problem is.)

Both servers are up to date with Windows updates. It's Windows Server 2016, with no domain.

Any and all help would be great.

Thanks

Greg 
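To narrow down which resource is making RHS fail, a fresh cluster log dump and the RHS-termination events in the System log are usually the most useful starting points. A hedged sketch follows; the destination folder and 15-minute window are examples.

# Hedged sketch: dump a fresh cluster log from the nodes and look for RHS
# terminations around the time the role is created. Folder and time span
# are examples.
Get-ClusterLog -Destination 'C:\Temp' -TimeSpan 15

# Event 1146 is logged by FailoverClustering when the Resource Hosting
# Subsystem (RHS) process terminates unexpectedly.
Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Id           = 1146
} | Select-Object TimeCreated, Message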

Record Hyper-V guest parent


In order to satisfy server licensing on our 4-node Windows Server 2012 R2 cluster, I need to keep 90 days' worth of logs that show which guest VM is hosted by which host.

Is there any way to accurately record this? I tried Get-ClusterLog, but it doesn't seem to show VM affinity, only CSV affinity.

Thanks in advance,

Matt
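The live VM-to-host mapping is always available from the cluster, so one approach is a small script, run daily by Task Scheduler, that appends the current placement to a dated CSV; 90 days of those files gives the trail. A hedged sketch follows; the cluster name and output folder are examples.

# Hedged sketch: append the current VM-to-host mapping to a daily CSV.
# Schedule with Task Scheduler on one node or a management box.
# Cluster name and output folder are examples.
$stamp = Get-Date
Get-ClusterGroup -Cluster 'HVCLUSTER01' |
    Where-Object { $_.GroupType -eq 'VirtualMachine' } |
    Select-Object @{n='Timestamp';e={$stamp}},
                  @{n='VMName';e={$_.Name}},
                  @{n='OwnerNode';e={$_.OwnerNode.Name}},
                  State |
    Export-Csv -Path "C:\VMPlacementLogs\$($stamp.ToString('yyyy-MM-dd')).csv" -Append -NoTypeInformation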

SMBWitnessClient EventID 8 - Failed to register from Trusted Domain


Hi there!

I am getting errors every 30 seconds on machines that try to connect over SMB to a failover cluster in a trusted domain.

Event ID 8

Error details: Witness Client failed to register with Witness Server TestSRV02 for notification on NetName \\TestSrv with error (The parameter is incorrect.)

I know that to connect to the trusted domain I need to use the full FQDN, but since the server requests the list of witness servers from the failover cluster, it seems the list is returned without FQDNs, so my server cannot connect without them.


MCSE: Server Infrastructure
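To see whether any witness registrations are succeeding and which names and addresses the cluster is publishing for the file server role, a hedged sketch follows (run on a cluster node; the resource type names are the standard ones, everything else is generic).

# Hedged sketch: check witness registrations on the cluster side and the
# network name / IP address resources behind the file server role.

# On a cluster node: list SMB witness client registrations (the client from
# the trusted domain will be missing if registration keeps failing).
Get-SmbWitnessClient

# Check the network name and IP address resources published for the role:
Get-ClusterResource |
    Where-Object { $_.ResourceType -in 'Network Name', 'IP Address' } |
    Select-Object Name, OwnerGroup, State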


Storage Replica - Windows Server 2019 Standard: creating a replication partnership produces this error


Creating the replication partnership produces this error:

New-SRPartnership : Unable to synchronize replication group rgteste2, detailed reason: Cannot update state for replication group rgteste2 in the Storage Replica driver.

At line:1 char:1
+ New-SRPartnership -SourceComputerName SR1 -SourceRGName rgteste1 -Sou ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (MSFT_WvrAdminTasks:root/Microsoft/...T_WvrAdminTasks) [New-SRPartnership], CimException
    + FullyQualifiedErrorId : Windows System Error 1395,New-SRPartnership

Can anyone tell me why this error occurs?


Att. Gabriel Luiz
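Windows system error 1395 is a license-quota error, which may point at the Standard edition limits for Storage Replica (a single replication group with a single volume of up to 2 TB), though that is not confirmed here. Two hedged checks that can help narrow it down are shown below; the computer, volume and log-volume names are examples (SR1/SR2, D:, L:) - substitute the ones used in your New-SRPartnership command.

# Hedged sketch: inspect the groups on both ends and validate the topology.
Get-SRGroup -ComputerName SR1 -Name rgteste1
Get-SRGroup -ComputerName SR2 -Name rgteste2

Test-SRTopology -SourceComputerName SR1 -SourceVolumeName D: -SourceLogVolumeName L: `
                -DestinationComputerName SR2 -DestinationVolumeName D: -DestinationLogVolumeName L: `
                -DurationInMinutes 10 -ResultPath C:\Temp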

Windows 2019 S2D cluster failed to start - Event ID 1809


Hi, I have a lab with an Insider-build Windows 2019 cluster that I in-place upgraded to the RTM version of Server 2019. The cluster shuts down after a while and Event ID 1809 is logged:

This node has been joined to a cluster that has Storage Spaces Direct enabled, which is not validated on the current build. The node will be quarantined.
Microsoft recommends deploying SDDC on WSSD [https://www.microsoft.com/en-us/cloud-platform/software-defined-datacenter] certified hardware offerings for production environments. The WSSD offerings will be pre-validated on Windows Server 2019 in the coming months. In the meantime, we are making the SDDC bits available early to Windows Server 2019 Insiders to allow for testing and evaluation in preparation for WSSD certified hardware becoming available.

Customers interested in upgrading existing WSSD environments to Windows Server 2019 should contact Microsoft for recommendations on how to proceed. Please call Microsoft support [https://support.microsoft.com/en-us/help/4051701/global-customer-service-phone-numbers].

It's kind of weird because my S2D cluster is running in VMs. Is there some registry switch to disable this lock?
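I am not aware of a documented registry override for this particular check - the event text directs you to Microsoft support for the unlock. What can be verified safely is the generic quarantine state of the node; a hedged sketch follows, with the caveat that clearing quarantine is not a documented way to bypass the Server 2019 RTM S2D validation block.

# Hedged sketch: inspect and clear node quarantine. This only addresses the
# generic quarantine mechanism, not the S2D build validation itself.
Get-ClusterNode | Select-Object Name, State, StatusInformation

# Clear quarantine and start the cluster service on the local node
Start-ClusterNode -ClearQuarantine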


Failover

I have never written a PowerShell script. I need a script to push the software to the second server if the first one fails. I have to be making this way too hard.
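A truly minimal sketch, under the assumption that "push the software" means copying an installer to the standby server and running it when the primary stops responding. The server names, share and installer path are made-up placeholders.

# Minimal hedged sketch: if the primary stops answering pings, copy a
# (hypothetical) installer to the secondary and run it there.
$primary   = 'APPSRV01'
$secondary = 'APPSRV02'
$installer = '\\fileshare\deploy\app-setup.exe'

if (-not (Test-Connection -ComputerName $primary -Count 3 -Quiet)) {
    Copy-Item -Path $installer -Destination "\\$secondary\c$\Temp\app-setup.exe"
    Invoke-Command -ComputerName $secondary -ScriptBlock {
        Start-Process -FilePath 'C:\Temp\app-setup.exe' -ArgumentList '/quiet' -Wait
    }
}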

WSFC broken, please help diagnose


I have a 2016 WSFC with the file server role: 2 nodes in the cluster with shared storage. We lost power to Node2, which died; when bringing it back up, it won't join the cluster (it shows 'Down' in Failover Cluster Manager). If I shut down the entire cluster completely and start it on Node2 first, Node2 runs the cluster fine, but Node1 then won't join the cluster (shows 'Down').

As far as I can tell, all connectivity seems fine. I've turned off Windows Firewall, the network between the two servers is working fine, and there are no firewalls between the two nodes. Other clusters are running on the same infrastructure.

The only hint in Failover Cluster Manager is that the network connection for Node2 shows as offline (the network is up and working, has 'allow cluster traffic' and management ticked, and I can ping, RDP, etc.).

When I shut down and then restart the entire cluster with Node2 first, the roles are reversed: Node1 now shows its network as offline. The information details and critical events for the network have no entries.

Critical events for Node2 itself, when it is in the down state, show: Error 1653 'Cluster node Node2 failed to join the cluster because it could not communicate over the network with any other node in the cluster. Verify network connectivity and configuration of any network firewalls.' However, I'm not convinced this is actually the issue, because of the error messages below:

The failover clustering log is as follows:

00000774.00001c4c::2018/05/15-16:48:50.659 INFO  [Schannel] Server: Negotiation is done, protocol: 10, security level: Sign
00000774.00001c4c::2018/05/15-16:48:50.663 DBG   [Schannel] Server: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 161
00000774.00001c4c::2018/05/15-16:48:50.712 DBG   [Schannel] Server: ASC, sec: 90312, buf: 2059
00000774.00001c4c::2018/05/15-16:48:50.728 DBG   [Schannel] Server: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 1992
00000774.00001c4c::2018/05/15-16:48:50.730 DBG   [Schannel] Server: ASC, sec: 0, buf: 51
00000774.00001c4c::2018/05/15-16:48:50.730 DBG   [Schannel] Server: Receive, type: MSG_AUTH_PACKAGE::Synchronize, buf: 0
00000774.00001c4c::2018/05/15-16:48:50.730 INFO  [Schannel] Server: Security context exchanged for cluster
00000774.00001c4c::2018/05/15-16:48:50.735 DBG   [Schannel] Client: ISC, sec: 90312, buf: 178
00000774.00001c4c::2018/05/15-16:48:50.736 DBG   [Schannel] Client: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 60
00000774.00001c4c::2018/05/15-16:48:50.736 DBG   [Schannel] Client: ISC, sec: 90312, buf: 210
00000774.00001c4c::2018/05/15-16:48:50.749 DBG   [Schannel] Client: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 2133
00000774.00001c4c::2018/05/15-16:48:50.752 DBG   [Schannel] Client: ISC, sec: 90364, buf: 58
00000774.00001c4c::2018/05/15-16:48:50.753 DBG   [Schannel] Client: ISC, sec: 90364, buf: 14
00000774.00001c4c::2018/05/15-16:48:50.753 DBG   [Schannel] Client: ISC, sec: 90312, buf: 61
00000774.00001c4c::2018/05/15-16:48:50.754 DBG   [Schannel] Client: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 75
00000774.00001c4c::2018/05/15-16:48:50.754 DBG   [Schannel] Client: ISC, sec: 0, buf: 0
00000774.00001c4c::2018/05/15-16:48:50.754 INFO  [Schannel] Client: Security context exchanged for netft
00000774.00001c4c::2018/05/15-16:48:50.756 WARN  [ClRtl] Cannot open crypto container (error 2148073494). Giving up.
00000774.00001c4c::2018/05/15-16:48:50.756 ERR   mscs_security::SchannelSecurityContext::AuthenticateAndAuthorize: (-2146893802)' because of 'ClRtlRetrieveServiceSecret(&secretBLOB)'
00000774.00001c4c::2018/05/15-16:48:50.756 WARN  mscs::ListenerWorker::operator (): HrError(0x80090016)' because of '[SV] Schannel Authentication or Authorization Failed'
00000774.00001c4c::2018/05/15-16:48:50.756 DBG   [CHANNEL 172.23.1.15:~56287~] Close().

specifically:

Server: Negotiation is done (i.e. they talked to each other?)

[ClRtl] Cannot open crypto container (error 2148073494). Giving up.
mscs_security::SchannelSecurityContext::AuthenticateAndAuthorize: (-2146893802)' because of 'ClRtlRetrieveServiceSecret(&secretBLOB)'
mscs::ListenerWorker::operator (): HrError(0x80090016)' because of '[SV] Schannel Authentication or Authorization Failed'

I can't find many (if any) articles dealing with these messages; the only ones I can find say to make sure permissions are correct on %SystemDrive%\Users\All Users\Microsoft\Crypto\RSA\MachineKeys (i.e. C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys).

I did have to change some of the permissions on these files, but I still couldn't join the cluster. Other than that, I'm struggling to find any actual issues (SMB access from Node1 to Node2 appears fine, SMB access from Node2 to Node1 appears fine, DNS appears to be working fine, and the file share witness seems fine).

Finally, the cluster validation report shows the following as the only errors with the cluster:

Validate disk Arbitration: Failed to release SCSI reservation on Test Disk 0 from node Node2.domain: Element not found.

Validate CSV Settings: Failed to validate Server Message Block (SMB) share access through the IP address of the fault tolerant network driver for failover clustering (NetFT). The connection was attempted with the Cluster Shared Volumes test user account, from node Node1.domain to the share on node Node2.domain. The network path was not found.

Validate CSV Settings: Failed to validate Server Message Block (SMB) share access through the IP address of the fault tolerant network driver for failover clustering (NetFT). The connection was attempted with the Cluster Shared Volumes test user account, from node Node2.domain to the share on node Node1.domain. The network path was not found.

other errors from the event logs

ID5398 Cluster failed to start. The latest copy of cluster configuration data was not available within the set of nodes attempting to start the cluster. Changes to the cluster occurred while the set of nodes were not in membership and as a result were not able to receive configuration data updates.
Votes required to start cluster: 2
Votes available: 1
Nodes with votes: Node1 Node2
Guidance: Attempt to start the cluster service on all nodes in the cluster so that nodes with the latest copy of the cluster configuration data can first form the cluster. The cluster will be able to start and the nodes will automatically obtain the updated cluster configuration data. If there are no nodes available with the latest copy of the cluster configuration data, run the 'Start-ClusterNode -FQ' Windows PowerShell cmdlet. Using the ForceQuorum (FQ) parameter will start the cluster service and mark this node's copy of the cluster configuration data to be authoritative. Forcing quorum on a node with an outdated copy of the cluster database may result in cluster configuration changes that occurred while the node was not participating in the cluster to be lost.

ID4350 Cluster API call failed with error code: 0x80070046. Cluster API function: ClusterResourceTypeOpenEnum Arguments: hCluster: 4a398760 lpszResourceTypeName: Distributed Transaction Coordinator lpcchNodeName: 2

Lastly, I built another server (Node3) to see if I could join it to the cluster, but this fails:

* The server 'Node3.domain' could not be added to the cluster. An error occurred while adding node 'Node3.domain' to cluster 'CLUS1'. Keyset does not exist

I've done the steps here with no joy: http://chrishayward.co.uk/2015/07/02/windows-server-2012-r2-add-cluster-node-cluster-service-keyset-does-not-exist/
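Since both the "Cannot open crypto container" and "Keyset does not exist" errors point at the machine key store, comparing the ACLs on that folder between the working node and the failing node, and capturing a fresh cluster log while reproducing the join, are reasonable next checks. A hedged sketch follows; the paths are the standard MachineKeys locations, and the log destination is an example.

# Hedged sketch: check the ACLs on the machine key store and dump a fresh
# cluster log from the failing node while retrying the join.
icacls 'C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys'

# Compare owner and access entries between the working and failing node
Get-ChildItem 'C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys' -Force |
    Get-Acl | Select-Object Path, Owner, AccessToString

# Capture a fresh cluster log covering the last 10 minutes from Node2
Get-ClusterLog -Node Node2 -Destination C:\Temp -TimeSpan 10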




Stretch Cluster / Storage Replica / Log volume VSS Snapshots due to built-in Cluster Config Backup ?


Hi fellow Engineers.

I am currently investigating an annoying issue on a virtual (!) WSFC (2016) that is being used as a 4-node HA file server (2-node HA in each datacenter). Storage (2x 5 TB) is being replicated successfully (synchronous/write-ordered) between the 2 datacenters.

The annoying issue is that every 4 hours (randomized within a few minutes), all connected users experience a short freeze, from a few seconds up to a minute, when accessing the file server. Looking at the logs and the Storage Replica known issues, it is clear this is due to something trying to create a VSS snapshot of the replica log volume (which you should not do!), and the culprit seems to be an internal mechanism trying to create a cluster configuration backup, including a VSS snapshot of all local volumes of the role owner.

If I move the role to another node, the issue follows it, so it is tied not to the cluster owner but to the role owner!

There is no backup scheduled at that time, and I have no idea what would create an automatic VSS snapshot of all connected volumes.

Before going into details: I have troubleshot the hell out of this thing and cannot find it. I do have some ideas, but the timestamps do not match.

Environment:

Lenovo blades (x240) with VMware 6.5u1 (the issue was also present on 6.5)

Dell Compellent storage

3x 1 Gbit uplinks (VMXNET3) per node

Veeam Backup & Replication (9.5u2) using the latest Veeam Agent, so we are not using the VMware API for backup. When the Veeam Agent backup schedule runs, the issue is not present, as only the data volumes are backed up (using VSS).

Is anyone else having the same issue, or has anyone seen this issue?
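One way to confirm which requestor is asking for the unexpected snapshot is to correlate a known freeze time with the VSS events on the role owner and list the shadow copies present afterwards. A hedged sketch follows; the timestamp is only an example and should be replaced with the time of an actual freeze.

# Hedged sketch: pull VSS events around a known freeze time on the current
# role owner to identify the snapshot requestor. The timestamp is an example.
$freeze = Get-Date '2018-05-15 12:00'

Get-WinEvent -FilterHashtable @{
    LogName      = 'Application'
    ProviderName = 'VSS'
    StartTime    = $freeze.AddMinutes(-10)
    EndTime      = $freeze.AddMinutes(10)
} | Select-Object TimeCreated, Id, Message

# volsnap events land in the System log; shadow copies currently on the node:
vssadmin list shadows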

windows 2016 cluster QuarantineThreshold


Clustering Windows Server 2016 Datacenter


Hello,

I have been asked to cluster 2 HP ProLiant 380 Gen10 servers.

On both servers, RAID 1 (the first 2 disks) and RAID 5 (the last three disks) have been configured.

I have already installed Windows Server 2016 Datacenter on both of them, and Hyper-V is also installed.

The iLOs have been configured.

I need help configuring a cluster with Server 1 and Server 2 and connecting them physically for full redundancy on these 2 Cisco 3850 switches.

I need the heartbeat configured.

Can someone help me or direct me to a forum or blog where I can follow a step-by-step process to do the clustering and physically connect these devices?

Just a side note: a SAN will be added later, and it will be connected to both of those switches.

Thanks a lot

Abhi
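For the software side, once both hosts can reach each other on the LAN, the core PowerShell steps are validation, cluster creation and (because there are only two nodes) a witness. A hedged sketch follows; the host names, cluster name, IP address and witness share are placeholders.

# Hedged sketch of the core clustering steps. Names and IP are placeholders.
# Run on both nodes:
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

# Validate first; fix anything the report flags before creating the cluster
Test-Cluster -Node HV-NODE1, HV-NODE2

# Create the cluster (no shared storage yet, so -NoStorage)
New-Cluster -Name HV-CLUSTER -Node HV-NODE1, HV-NODE2 -StaticAddress 10.0.0.50 -NoStorage

# With only two nodes, add a witness (a file share now, or a disk witness once the SAN arrives)
Set-ClusterQuorum -NodeAndFileShareMajority '\\WITNESS-SRV\ClusterWitness'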


Firewall blocking heartbeat traffic


I want to run a test on the cluster in which it loses the heartbeat, and I want to use Windows Firewall to do this.

What port should I block?

I tried blocking UDP and TCP 3343, but it still doesn't seem to work.
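Cluster heartbeats use port 3343 (primarily UDP), but keep in mind that heartbeats run over every cluster-enabled network, so a block on a single interface will not isolate a node that has other cluster networks. A hedged sketch that blocks the port in both directions on one node follows; the rule names are examples and the rules should be removed after the test.

# Hedged sketch: block cluster heartbeat traffic (port 3343) in both
# directions on the node you want to isolate. Run elevated.
New-NetFirewallRule -DisplayName 'Block cluster heartbeat in'  -Direction Inbound  -Protocol UDP -LocalPort 3343  -Action Block
New-NetFirewallRule -DisplayName 'Block cluster heartbeat out' -Direction Outbound -Protocol UDP -RemotePort 3343 -Action Block

# Clean up after the test
Get-NetFirewallRule -DisplayName 'Block cluster heartbeat*' | Remove-NetFirewallRule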

Clustering between a physical server DASD and a VM


Scenario :

Physical server (PS1) with DASD: Windows 2012 R2 (multiple terabytes of data on multiple partitions/drives)

Virtual server 1 (VS1): Windows 2016 (target server, day 1)

Virtual server 2 (VS2): Windows 2016 (target server, day 2; once all data is synced with VS1, PS1 will be decommissioned)

In the past I have created clusters, but with new shared drives containing no data on day 1.

Now the challenge is to create a cluster without creating a new drive, but instead to use an existing drive on PS1, replicate the data to VS1 and later to VS2 (in shared mode), and then decommission PS1.

All this without losing PS1 service availability and without erasing data on the source server.


Luis M Astudillo Freelance Enterprise/Infrastructure Architect and Technology Strategic Planner LinkedIn: www.linkedin.com/in/luisma

Guys, can we get the RegKey back for passing the WSSD check on the Windows Server S2D cluster, please...


As per the official Microsoft position on Windows Server 2019 Datacenter:

"...When can I deploy Storage Spaces Direct in Windows Server 2019 into production?

Microsoft recommends deploying Storage Spaces Direct on hardware validated by the WSSD program. For Windows Server 2019, the first wave of WSSD offers will launch in February 2019, in about three months.

If you choose instead to build your own with components from the Windows Server 2019 catalog with the SDDC AQs, you may be able to assemble eligible parts sooner. In this case, you can absolutely deploy into production – you’ll just need to contact Microsoft Support for instructions to work around the advisory message. ..."


regards,

Alex 


How can we move the Quorum Disk from Node1 to Node2 ? - Windows 2012 R2 - Hyper-V Clustering


Hello,

We have created a cluster with 2 nodes and created a role for a file share. There are 3 disks in total in the cluster; of the three, we have allocated 1 disk as the quorum disk.

When Node1 is powered off, all 3 disks move to Node2 automatically. But I would like to know how we can move the quorum disk from Node1 to Node2 when both nodes are active.

We can move the 2 data disks from Node1 to Node2 while both nodes are powered on, but using the same option (right-click on the disk -> Move -> Select Node) I am unable to move the quorum disk from Node1 to Node2.

Kindly advise on this!

Thanks & Regards,

Anoop Nair.


Anoop Nair
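The quorum (witness) disk is owned by the cluster core resources group ("Cluster Group"), which is not listed under Roles, so it cannot be moved the same way as the data disks. Moving the core group moves the witness disk with it; in Failover Cluster Manager this is done via the cluster name -> More Actions -> Move Core Cluster Resources, or in PowerShell as sketched below (node name is an example).

# Hedged sketch: the disk witness lives in the cluster core resources group,
# so move that group rather than the disk itself.
Get-ClusterGroup                       # shows "Cluster Group" and its owner node

Move-ClusterGroup -Name 'Cluster Group' -Node Node2

# Confirm the witness disk followed the group
Get-ClusterResource | Where-Object { $_.Name -like 'Cluster Disk*' } |
    Select-Object Name, OwnerGroup, OwnerNode, State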
