High Availability (Clustering) forum

2019 Hyper-V Cluster - Quorum


Hi All,

I just finished setting up a Hyper-V cluster in our environment - a 3-node cluster.

Since the cluster node count is odd, is it still recommended, or even necessary, to put a witness (disk) in this case?

What I normally practice is to put a witness only when the cluster node count is even (for tie-breaking).
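For reference, this is how I'd add one if the answer is yes (a sketch; the cluster and resource names are placeholders):

    # Check the current quorum configuration
    Get-ClusterQuorum -Cluster "HVCluster01"

    # Add a disk witness ("Cluster Disk 1" is a placeholder resource name)
    Set-ClusterQuorum -Cluster "HVCluster01" -DiskWitness "Cluster Disk 1"

    # Or a file share witness instead
    Set-ClusterQuorum -Cluster "HVCluster01" -FileShareWitness "\\fileserver\witness"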

Hopefully you can share your thoughts with me.

Thanks


Disk Sharing - Server 2016 Cluster


I'm sure this is probably a newbie question, but it is surprisingly hard to find information on.

You would think this capability would be the most fundamental idea in failover clustering.

I have Failover Clustering set up, and everything seems to be working fine. I also have SQL Server (2017) clustering set up, and everything seems to be working fine: I am able to run SQL queries, etc., from both nodes and from a different computer.

The problem is support files. We have spreadsheets and document templates that users need to be able to access. I try to put them on the cluster nodes, but I cannot "Share" any of the folders in Active Directory.

The File Server role is installed on both nodes (I call them SQL2 and SQL3). The drive I wish to share is listed in Failover Cluster Manager as "Available Storage".

I put the drive in maintenance mode and was able to share a folder on SQL3, but it seems to be shared only on that node (the path is '\\SQL3\HDrive').

In Computer Management on SQL2, I cannot even see the drive under "Sharing" or in File Manager. But it does show up in Cluster Manager...

How can I get others to see the files if SQL3 goes offline?

Or what am I missing?
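From what I've read so far, I suspect the missing piece is a clustered File Server role, so the share and its storage fail over together. A sketch of what I'm considering (names, paths, and the address are placeholders, untested):

    # Create a clustered File Server role that owns the disk and gets its own
    # network name (client access point) that follows the storage on failover
    Add-ClusterFileServerRole -Name "SQLFS" -Storage "Cluster Disk 2" -StaticAddress 10.0.0.50

    # Create the share scoped to the clustered name, not to SQL2 or SQL3
    New-SmbShare -Name "HDrive" -Path "H:\Shares" -ScopeName "SQLFS" -FullAccess "DOMAIN\Domain Users"

Users would then hit \\SQLFS\HDrive regardless of which node currently owns the disk.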


WSSD vs. Azure Stack HCI certification


A team member and I are having a debate. We want to know if it is "safe" to use the very recently released Lenovo SR635 or SR655 EPYC-based servers to build our own Windows Server 2019 Storage Spaces Direct cluster (all cluster components will be Windows certified).

The servers are listed in the Windows Server Catalog as Win2019 with Software-Defined Data Center (SDDC) Premium certification (SR635, SR655).

They are not listed in the Azure Stack HCI Catalog.

He firmly believes that the systems need to be in the Azure Stack HCI catalog in order to proceed.

I believe that we can use the servers:

  • The S2D Hardware Requirements page used to state that only Software-Defined Data Center (SDDC) certification is required (this changed in August ;-[)
  • I read the Lenovo doc as a list of configurations that Lenovo will support (FYI, these servers were released after the PDF was published)
  • The PDF is not a list of the only systems that can be used for S2D, if we are the ones supporting the cluster/solution.

So, which of us is "right"?

Regardless of who is "right", would you proceed anyway?



Collecting Cluster Performance Data

I've been using Windows Admin Center to view performance data as I execute different types of workloads in a cluster with VMs, and I record the results. It works well visually, but if I run a workload for a specific amount of time, the data can be skewed depending on the moment the snapshot was taken when I record those results. I think results that show a high, low, and average number for each counter would be better to compare with other results. I'm looking to collect the obvious: CPU, memory, IOPS, latency, throughput, etc.

I'm looking for an efficient method to collect this from the nodes of a cluster running Hyper-V and S2D. Should I run PerfMon on all the nodes, or is there a more efficient way using something like Get-ClusterPerformanceHistory? Is there anything else I'm missing?
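For example, something like this is the shape I'm imagining, assuming Get-ClusterPerformanceHistory works the way I think it does (the node and series names are placeholders):

    # Pull a day of CPU history for one node from the S2D performance history
    $history = Get-ClusterNode "Node1" |
        Get-ClusterPerformanceHistory -ClusterNodeSeriesName "ClusterNode.Cpu.Usage" -TimeFrame LastDay

    # Reduce the time series to min / max / average for comparison between runs
    $history | Measure-Object -Property Value -Minimum -Maximum -Average

Volume counters (IOPS, latency, throughput) would presumably work the same way via Get-Volume and the Volume.* series names.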


T.J.


Server 2019 - Clustering Issue with UDP Port 3343


Evening all,

We recently ran through an in-place upgrade of our cluster servers from 2016 to 2019 Datacenter edition. It all went very smoothly and everything seemed to be working.

However, a week later the cluster is suddenly failing! A validation report shows that the three nodes cannot talk to each other on any cluster network via UDP port 3343.

I've opened the necessary ports on the firewall, in addition to whatever Hyper-V adds - but it's still happening. I've restarted the machines - still happening. I've used Telnet on the servers to confirm the TCP side of the port is open (Telnet can't test UDP) - and it is.
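For what it's worth, this is roughly how I've been double-checking things (a sketch, assuming the built-in firewall group and validation category names; node names are placeholders):

    # Confirm the built-in Failover Clusters firewall rules exist and are enabled
    Get-NetFirewallRule -DisplayGroup "Failover Clusters" |
        Select-Object DisplayName, Enabled, Direction

    # Re-enable the whole group in case the in-place upgrade disabled anything
    Enable-NetFirewallRule -DisplayGroup "Failover Clusters"

    # Re-run only the network portion of cluster validation
    Test-Cluster -Node Node1, Node2, Node3 -Include "Network"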

This was working fine on 2016 - so something has gone wrong somewhere.

I'd be grateful if anyone could help or offer advice on what to look for.

Thanks

Gareth

Admin Center hyper-converged cluster error ('There are no more endpoints available from the endpoint mapper.').


Hello! I have a hyper-converged S2D cluster on Windows Server 2016 nodes, and I'm trying to manage it with Admin Center. Everything was done following this article: https://docs.microsoft.com/en-us/windows-server/manage/windows-admin-center/use/manage-hyper-converged

But when I try to connect Admin Center to the S2D cluster I get the error "Unable to create the 'SDDC Management' cluster resource (required)", and in the cluster events I receive this error:

Cluster resource 'SDDC Management' of type 'SDDC Management' in clustered role 'Cluster Group' failed. The error code was '0x6d9' ('There are no more endpoints available from the endpoint mapper.').

 

then

The Cluster service failed to bring clustered role 'Cluster Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

and then

Clustered role 'Cluster Group' has exceeded its failover threshold.  It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state.  No additional attempts will be made to bring the role online or fail it over to another node in the cluster.  Please check the events associated with the failure.  After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.

What have I done wrong?
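For reference, the registration step from the article (as I understand it) and how I've been checking it (a sketch):

    # Check whether the resource type exists and which DLL it points at
    Get-ClusterResourceType -Name "SDDC Management" | Format-List Name, DllName

    # The registration from the article; sddcres.dll must exist on every node
    Add-ClusterResourceType -Name "SDDC Management" `
        -dll "$env:SystemRoot\Cluster\sddcres.dll" -DisplayName "SDDC Management"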

Cluster Shared Volumes: How to mask/unmask disk???

Loopback adapter for DR load balancing breaks failover cluster


I am not completely sure whether failover clustering is the right forum for this or whether I should post in Exchange instead. However, I think the root issue is failover clustering, as I am experiencing something similar to https://social.technet.microsoft.com/Forums/windowsserver/en-US/7616b0e5-6fb6-4be7-a859-14baa2e9b925/cluster-network-is-partitioned-due-to-loopback-adapter?forum=winserverClustering.

The setup is an Exchange 2019 CU3 DAG on three Windows Server 2019 nodes, which uses a failover cluster without a dedicated management port under the hood. What I want to achieve is layer 4 load balancing using direct server return, i.e. the load balancer rewrites the MAC addresses of incoming requests to direct them to the three Exchange nodes.

In order to achieve this, I need to add the shared IP of the load balancer to all three Exchange nodes so that they will accept the redirected packets. The only way I know to do this is to add a loopback adapter via Device Manager, add the IP, set its subnet mask to 255.255.255.255 to prevent it from being advertised via ARP, and enable weak host send and receive.

The setup itself is working, so I can access the Exchange services via the load balancer. But as in the link above, the failover cluster breaks after a short period of time, making everything inaccessible. If I disable the loopback adapter, the network partitioning disappears and the cluster is up again.

I found some discussions on this issue which emphasised the importance of ensuring the right order of adapters. In my understanding, the interface metric is the only way to control this on Server 2019, so I set the loopback adapter's metric to 1000 on all machines. Initially I also had IPv6 enabled, but I was unable to tell the failover cluster not to use the loopback adapter in that case: although I set the prefix length to 128, it still showed up with its link-local address. I also tried to tell the cluster explicitly not to use it with (Get-ClusterNetwork "Loopback network").Role = 0, as described on https://blogs.technet.microsoft.com/askcore/2014/02/19/configuring-windows-failover-cluster-networks/, but this command has no effect at all - the role does not change. Only removing IPv6 removes the loopback network from the list of cluster networks, but the network is still partitioned (I think because the shared IP on the loopback falls into the same subnet as the physical address if you apply the physical adapter's netmask).
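To make the configuration concrete, this is essentially the PowerShell equivalent of what I did on each node (the adapter alias and IP are placeholders for my actual values):

    # Add the load balancer's shared IP to the loopback adapter with a /32 mask
    New-NetIPAddress -InterfaceAlias "Loopback" -IPAddress 192.0.2.50 -PrefixLength 32

    # Push the adapter down the order and enable the weak host model
    Set-NetIPInterface -InterfaceAlias "Loopback" -InterfaceMetric 1000 `
        -WeakHostSend Enabled -WeakHostReceive Enabled

    # Tell the cluster to ignore the loopback network (the part that has no effect)
    (Get-ClusterNetwork "Loopback network").Role = 0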

What am I missing here? There must be a way to configure this, because people seem to make DR load balancing work with Exchange.



Power down AAG cluster each night


We have a non-production SQL AAG: one primary replica and one secondary replica with a continuous sync. It's mainly used for testing. To save some money we would like to shut down all nodes in the cluster each night and bring them back up again in the morning.

I have the scripts to do this - but I wondered if there would be any issues with simply powering down all nodes at once and then bringing them all back up at the same time in the morning.

There are 3 servers (all Server 2016): one is the SQL primary replica, one is the SQL secondary replica, and there is a quorum server. Will I cause issues by having them all shut down and start up within minutes of one another, or will they work themselves out automatically without any intervention each morning? I'd rather not have a battle each morning fixing the AAG!
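The shape of the evening script I have, for reference (server and cluster names are placeholders):

    # Evening: stop the cluster service cleanly on all nodes, then power off.
    # Stop-Cluster takes the clustered roles offline before the nodes go down.
    Stop-Cluster -Cluster "AAGCLUSTER" -Force
    Stop-Computer -ComputerName "SQL-PRI", "SQL-SEC", "QUORUM" -Force

    # Morning: just power the machines back on; with the Cluster service set to
    # Automatic, I'm assuming the cluster and the AG re-form on their own.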

Thanks!

Upgrading the Network Load Balancing (NLB) cluster from 2008 R2 to 2012 R2

2016 Server Hyper-V Reverts to .XML files after joining a cluster


Has anyone else noticed that Windows Server 2016 reverts to using the old-format .XML files for VM configurations after joining a cluster? In this case the cluster was at the 2012 R2 functional level, which may be a factor.

The problem we're having is that we had some local VMs on the machines, and as soon as the servers joined the cluster, the running machines disappeared from all management. Strangely enough, they are still running, but they no longer show in Get-VM or in Hyper-V Manager.

So I RDPed into one of them, shut it down and tried to re-import it, but 2016 would not even let me re-import it from the VMCX configuration files - it said 'No Virtual Machines Found' in that folder. I had to re-create the VM and attach the VHDX, and it created the old-style XML files.

I'm wondering if this has to do with the functional level, but all the VMs on the cluster have XML files, even ones created on 2016, so I'm thinking it might just be intentional?
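For what it's worth, this is how I've been checking the levels and versions (assuming the standard cmdlets behave as documented):

    # Cluster functional level: 8 = 2012 R2, 9 = 2016
    (Get-Cluster).ClusterFunctionalLevel

    # VM configuration version: 5.0 = XML-based (2012 R2), 8.0 = VMCX (2016)
    Get-VM | Select-Object Name, Version

    # Only once every node runs 2016 and the functional level has been raised:
    # Update-ClusterFunctionalLevel
    # Get-VM | Update-VMVersion   # one-way upgrade to the VMCX format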

Anyone seen this behavior?

Thanks!

Drive on all nodes in SQL Availability Group "Formatted" at the same time (Cluster on Windows 2016 standard)


We have a 2 node SQL Availability Group on a Windows 2016 Std Cluster.

SQL Server reported the databases suspect after the data drives on both servers appeared to have been formatted.

On one of the servers we found the following events:

Event ID 7036 on 7/26/2019 at 9:37:55AM

Event ID 98 on 7/26/2019 at 9:38:12AM

Event ID 98 on 7/26/2019 at 9:38:13AM

These appear to indicate that the drive was formatted.
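For anyone who wants to check their own servers, we pulled the events with something like this (a sketch):

    # Event 98 comes from the Ntfs provider in the System log; it records a
    # volume being mounted, which after a fresh format matches what we saw
    Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 98 } |
        Select-Object TimeCreated, ProviderName, Message | Format-List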

We have tested and found that running the PowerShell Format-Volume command (locally or remotely) against one server causes the same drive on both nodes in the cluster/AG to be formatted.

One possible cause is that a server build script was run with incorrect server details, and we are investigating this possibility.

My questions are:

Has anyone experienced drives being "Formatted" simultaneously across nodes in a Clustered SQL AG?

Is formatting a drive on one Availability Group node supposed to affect all nodes? I've not found documentation to explain this.

Storage Spaces Direct: Number of volumes per cluster?


In Planning volumes in Storage Spaces Direct it says:

We recommend making the number of volumes a multiple of the number of servers in your cluster. For example, if you have 4 servers, you will experience more consistent performance with 4 total volumes than with 3 or 5. This allows the cluster to distribute volume "ownership" (one server handles metadata orchestration for each volume) evenly among servers.
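For context, the ownership the quote refers to is easy to inspect and move (a sketch; the volume and node names are placeholders):

    # See which node currently coordinates metadata for each CSV volume
    Get-ClusterSharedVolume | Select-Object Name, OwnerNode

    # Ownership can also be rebalanced by hand
    Move-ClusterSharedVolume -Name "Cluster Virtual Disk (Volume1)" -Node "Server2"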

How seriously should I take this? 

Can someone quantify the actual real world performance benefit of adhering to this recommendation? Or is it (as I suspect) a more theoretical benefit?

We have used S2D for a while now, and I am tending more and more towards creating as few volumes as possible, simply to avoid allocating too much space to one volume that I will later need in another volume because volume growth did not go as expected.

Now, if it were easy/possible to shrink a volume, it would be another matter. But I am not aware of any option for that.

We currently have 3 volumes: a 3-way mirror volume and two single-parity volumes. We have just added a 4th server/node, I want to change everything to dual parity (with mirror acceleration), and I am very tempted to create just one single volume, or maybe two. Not four.

Thoughts?

Failover Cluster is functioning but errors are generated


Hi All, 

I have a failover cluster that consists of two nodes with storage based on S2D.

Every day I'm getting these errors:

1205 - The Cluster service failed to bring clustered role 'Cluster Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

1069 - Cluster resource 'File Share Witness' of type 'File Share Witness' in clustered role 'Cluster Group' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet. 

1564 - File share witness resource 'File Share Witness' failed to arbitrate for the file share '\\SITE2VMHOST01\WITNESS'. Please ensure that file share '\\SITE2VMHOST01\WITNESS' exists and is accessible by the cluster.

1562 - File share witness resource 'File Share Witness' failed a periodic health check on file share '\\SITE2VMHOST01\WITNESS'. Please ensure that file share '\\SITE2VMHOST01\WITNESS' exists and is accessible by the cluster.

1688 - Cluster network name resource detected that the associated computer object in Active Directory was disabled and failed in its attempt to enable it. This may impact functionality that is dependent on Cluster network name authentication.

Network Name: Cluster Name
Organizational Unit: OU=Servers,OU=HO,DC=company,DC=com
Guidance: Enable the computer object for the network name in Active Directory.

1258 - Cluster network name resource failed registration of one or more associated DNS name(s) because a DNS server could not be reached.

Cluster Network name: 'Cluster Name'
DNS Zone: 'company.com'
DNS Server: '192.168.5.12,192.168.7.20'

Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server.


1254 - Clustered role 'Cluster Group' has exceeded its failover threshold.  It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state.  No additional attempts will be made to bring the role online or fail it over to another node in the cluster.  Please check the events associated with the failure.  After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.

A few comments from me regarding some of the errors:

1. The FSW exists and is accessible by the cluster (meaning the failover cluster object has Full Control on the folder configured as the FSW).

2. At least one of the DNS server addresses is accessible.

Interestingly, the failover cluster itself is functioning as intended (apart from those error messages) and all roles are up and running.

So I'm just wondering: how come the error messages are generated if the failover cluster works fine?
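For completeness, the checks I ran look roughly like this (run from each node; "ClusterName" below is a placeholder for our CNO's actual name):

    # Is the witness share reachable from this node?
    Test-Path "\\SITE2VMHOST01\WITNESS"

    # State and parameters of the witness resource
    Get-ClusterResource "File Share Witness" | Format-List Name, State
    Get-ClusterResource "File Share Witness" | Get-ClusterParameter

    # Is the cluster's AD computer object disabled (per event 1688)?
    Get-ADComputer -Identity "ClusterName" | Select-Object Name, Enabled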

Cluster Aware Update


Hi,

I have a Windows Server 2012 R2 cluster with 3 nodes and around 15 VMs on Hyper-V. Normally we use a local WSUS for Windows updates: first we download the updates on each cluster machine, install them, reboot if required, and repeat the same procedure step by step for each cluster node.

Can I use the Cluster-Aware Updating (CAU) mechanism to update my cluster nodes? Please note that I install security updates, update rollups, etc.
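What I have in mind is something like this (a sketch, as I understand the CAU cmdlets; the Windows Update plug-in should honour each node's existing WSUS configuration from Group Policy, and the cluster name is a placeholder):

    # One-off, orchestrated run: drain, patch and reboot one node at a time
    Invoke-CauRun -ClusterName "HVCLUSTER" `
        -CauPluginName "Microsoft.WindowsUpdatePlugin" `
        -MaxFailedNodes 1 -MaxRetriesPerNode 3 -RequireAllNodesOnline -Force

    # Or install CAU as a clustered role so the cluster self-updates on a schedule
    # Add-CauClusterRole -ClusterName "HVCLUSTER" -DaysOfWeek Sunday -WeeksOfMonth 2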



Please comment.




Modify the virtual machine settings to limit the processor features used by the virtual machine


Dear all,

I have a 2-node Hyper-V cluster configured.

All virtual machines are on the first node and everything was working fine.

Recently I'm facing issues with live migration: "The virtual machine is using processor-specific features not supported on the physical computer host. To migrate this virtual machine to a physical computer with different processors, modify the virtual machine settings to limit the processor features used by the virtual machine."

However, the processors are identical and no changes have been made. All VMs are failing to live migrate from host1 to host2.

The validation report shows no errors at all.

I created a VM on host2 and live migrated it to host1 - that worked.

But all the VMs going from host1 to host2 are failing.

I'm facing a serious issue: if host1 goes down, my VMs won't fail over.
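For reference, the only workaround I know of is processor compatibility mode, which I'd rather understand than blindly apply (the VM name is a placeholder, and the VM must be off to change the setting):

    # Check whether compatibility mode is already enabled per VM
    Get-VM | Get-VMProcessor | Select-Object VMName, CompatibilityForMigrationEnabled

    # Enable it on one VM (shut down first)
    Stop-VM -Name "VM01"
    Set-VMProcessor -VMName "VM01" -CompatibilityForMigrationEnabled $true
    Start-VM -Name "VM01"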

Your expertise is highly appreciated.

Thank you

All VMs pause when certain nodes own the CSV


Hi.

So I've added 2 nodes to a 6-node Server 2016 Hyper-V cluster. Hardware-wise they are the same servers (Dell R730s). At first all looked fine: VMs ran on those nodes and could live migrate to and from them with no issues. But when one of these two nodes gets ownership of the CSV volume on which the VMs' VHDs reside, all VMs on the entire cluster stop. Cluster validation returns only minor warnings, due to updates: I had pending updates on the cluster when I added these nodes. I had updated the two additional nodes before they were part of the cluster, and the plan was to do a CAU run once they had joined. But that fell flat when one node went into maintenance and switched CSV ownership over to one of the new nodes. Since then I have tested this on the other new node as well (on a weekend) and the same thing happens there.

Can these updates actually be the problem, or is there anywhere else I need to look?
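If it helps with diagnosis, what I plan to capture next time (in a quiet window) is whether the CSV drops into redirected I/O when one of the new nodes coordinates it; a sketch, with placeholder names:

    # Per-node I/O state of each CSV: Direct vs. redirected, and why
    Get-ClusterSharedVolumeState |
        Select-Object Name, Node, StateInfo, FileSystemRedirectedIOReason

    # Deliberately move CSV coordination to one of the new nodes
    Move-ClusterSharedVolume -Name "Cluster Virtual Disk (CSV1)" -Node "NewNode1"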

SOFS with Storage Spaces cluster


Guys,

I have a Windows 2012 R2 SOFS cluster on top of a Storage Spaces cluster, with tiered storage and all. I am planning to upgrade it to Server 2016, but there is little to no information on upgrading a Storage Spaces cluster, so I thought someone who might have done it could help?

Cheers

AlwaysOn WSFC 2016 CNO was moved


Hi All,
I have SQL Server 2016 (EE) with AlwaysOn configured in my environment. During a recent maintenance window I was updating IP addresses/names of resources in my WSFC, and as a result the CNO was moved from under the Cluster Core Resources to being listed as a resource under the AG name, as if it were the listener. This caused this one particular AG to use the CNO as if it were the listener, and made the AG fail over to the synchronous replica as the primary. I'm not sure how to get the CNO back under the Cluster Core Resources and removed from the AG without breaking my WSFC along with the other AGs. The CNO was circled in red in my screenshot (not included here). Any help would be greatly appreciated.

Listed under Roles is the name of each AG, and the bottom half of the screen provides additional details for each AG role. This one particular AG has 2 resources listed under Server Name: one is the AG listener and the other is the cluster name object (CNO). I'm not sure how the CNO got included as a Server Name resource of the AG, but I need to remove the CNO from the AG and get it back under the Cluster Core Resources.
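The fix I'm considering, which I'd like validated before I run it, is simply moving the resource back into the core group (a sketch; resource names are as Failover Cluster Manager shows them in my cluster):

    # Confirm which group the CNO currently sits in
    Get-ClusterResource "Cluster Name" | Select-Object Name, OwnerGroup, State

    # Move the CNO resource back into the core resources group
    Move-ClusterResource -Name "Cluster Name" -Group "Cluster Group"

I assume any dependency the AG role has picked up on the CNO would have to be removed first (Remove-ClusterResourceDependency), which is the part I'm worried about.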


KrisT_SQL




