Channel: High Availability (Clustering) forum
Viewing all 2306 articles

File Server Cluster CNO

Hi,

Currently we have a 2-node cluster on Windows 2008 R2 that runs 10 File Server services hosting user shares. Each File Server service runs on a disk presented from the SAN.

Each File Server service has a corresponding Virtual Computer Object (VCO) in the computers OU.

The Cluster Name Object (CNO) has a few permissions defined on each of the VCOs.

----------

Last week we migrated the clustered File Servers to a new cluster running Windows 2012 R2. The cluster copy wizard was run on the new cluster and the configuration was copied from Windows 2008 R2 to Windows 2012 R2.

The SAN volumes were dismounted on the old cluster and remounted on the new cluster.

The old cluster was shut down and the new cluster was brought online, and all the resources came online as well. However, when users tried to access the shares on the file servers, they got a lot of errors, such as "Target name incorrect".

The event logs on the new cluster recorded events 1207, 1193, 1228 and 4. Most of them pointed to the CNO not having permissions on the VCOs.

When I checked the security of the VCOs in AD, I still saw the CNO of the old cluster and not that of the new cluster. Should I manually add the new cluster's CNO to each of these VCOs? If yes, the CNO appears as a computer object; is this correct?

What else am I missing here?

For now we've reverted to the Windows 2008 R2 cluster and everything is working as expected. We will shortly have to attempt this again.

I can share more details if needed. 

Reg,

Darshan



Error 13 from ResourceControl for resource Disk Drive while adding cluster disk

Hi,

I have a drive mounted at C:\mountpoint\Kdrive. C:\ is not a cluster disk. I am trying to use the Cluster API to add this disk to the cluster, but it fails with the following errors:

00000928.00000ce8::2016/03/10-04:59:51.637 INFO  [RCM] rcm::RcmApi::CreateResource: (SQL Server (MSSQLSERVER), Disk Drive C:\mountpoint\KDrive\, 8836dfef-fa51-419d-960f-75965fed6cfd, Physical Disk)
00000928.00000ce8::2016/03/10-04:59:51.637 INFO  [RCM] rcm::RcmGum::CreateResource(Disk Drive C:\mountpoint\KDrive\,8836dfef-fa51-419d-960f-75965fed6cfd,SQL Server (MSSQLSERVER))
00000304.00000554::2016/03/10-04:59:51.678 ERR   [RES] Physical Disk <Disk Drive C:\mountpoint\KDrive\>: Open: Unable to get disk identifier. Error: 5023.
00000928.00000dc8::2016/03/10-04:59:51.678 INFO  [RCM] HandleMonitorReply: OPENRESOURCE for 'Disk Drive C:\mountpoint\KDrive\', gen(0) result 0.
00000304.00000554::2016/03/10-05:00:12.208 ERR   [RHS] Error 13 from ResourceControl for resource Disk Drive C:\mountpoint\KDrive\.
00000928.00000ce8::2016/03/10-05:00:12.208 WARN  [RCM] ResourceControl(SET_PRIVATE_PROPERTIES) to Disk Drive C:\mountpoint\KDrive\ returned 13

I tried various syntaxes for the disk path (with single \ and double \\) but nothing works. If I execute the same code with a path like K:\ it works fine.

Code snippet:

try
 {
  // Create the resource.  The resource name is "Disk Drive @:"
  // where @ is the drive letter of a disk partition.
  bstr_t bstr;
  UTIL_Utf8ToWideChar (szDiskPath.data(), bstr);
  int length = bstr.length ();
  lpstrDiskPathW = new WCHAR[length + 1];
  wcsncpy (lpstrDiskPathW, (const wchar_t*)bstr, length);
  lpstrDiskPathW[length] = L'\0';

  String strResName = "Disk Drive " + szDiskPath;
  UTIL_Utf8ToWideChar(strResName.data(), bstr);
  length = bstr.length ();
  lpstrResourceNameW = new WCHAR[length + 1];
  wcsncpy (lpstrResourceNameW, (const wchar_t*)bstr, length);
  lpstrResourceNameW[length] = L'\0';

  hResource = m_funcCreateClusterResource(hClusterGroup,
   (LPCWSTR)lpstrResourceNameW,
   L"Physical Disk",
   0);

  if( hResource == NULL )
  {
   m_log.error("CreateDiskResource: failed to create disk resource %s", strResName);
   throw -1;
  }
  else
  {
   m_log.info("CreateDiskResource: created disk resource %s", strResName);
  }

  // Set the diskpath private property
  // Begin property list used to set the DiskPath private property.
  WCHAR szPropName[] = CLUSREG_NAME_PHYSDISK_DISKPATH;

  typedef struct _DiskPathControl
  {
   DWORD dwPropCount;
   CLUSPROP_PROPERTY_NAME_DECLARE(PropName,sizeof(szPropName)/sizeof(WCHAR));
   CLUSPROP_SZ_DECLARE(DiskPathValue, sizeof(lpstrDiskPathW)/sizeof(WCHAR));
   CLUSPROP_SYNTAX Endmark;
  } DiskPathControl;

  DiskPathControl DPC;

  //  Property Count
  DPC.dwPropCount = 1;

  //  Property Name
  DPC.PropName.Syntax.dw  = CLUSPROP_SYNTAX_NAME;
  DPC.PropName.cbLength   = sizeof( szPropName );
  wcsncpy (DPC.PropName.sz, (const wchar_t*)szPropName, DPC.PropName.cbLength);

  //  Property Value
  DPC.DiskPathValue.Syntax.dw = CLUSPROP_SYNTAX_LIST_VALUE_SZ;
  DPC.DiskPathValue.cbLength  = sizeof( lpstrDiskPathW );
  wcsncpy (DPC.DiskPathValue.sz, (const wchar_t*)lpstrDiskPathW, DPC.DiskPathValue.cbLength);

  //  Endmark
  DPC.Endmark.dw = CLUSPROP_SYNTAX_ENDMARK;

  DWORD cbSize = sizeof( DiskPathControl );

  //  End property list creation

  // Set the diskpath private property
  dwRC = m_funcClusterResourceControl( hResource,
   NULL,
   CLUSCTL_RESOURCE_SET_PRIVATE_PROPERTIES,
   ( void* ) &DPC,
   cbSize,
   NULL,
   0,
   NULL );

  if( dwRC != ERROR_SUCCESS )
  {
   String err(dwRC);
   m_log.error("AA_ClusterBase:: CreateDiskResource: failed to set the DiskPath property, error %s", err);
   m_funcDeleteClusterResource( hResource );
   m_funcCloseClusterResource( hResource );
   hResource = NULL;
   throw -1;
  }
 }
 catch (...)
 {
 }
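
One thing worth checking in the snippet above, offered as a hypothesis rather than a confirmed diagnosis: lpstrDiskPathW is a pointer, so sizeof(lpstrDiskPathW) is the pointer size (8 bytes on a 64-bit build), not the length of the path. Eight bytes happens to fit "K:\" plus its terminator exactly, which would be consistent with the short path working and the longer mount-point path returning error 13 (ERROR_INVALID_DATA). wcsncpy also takes a count in characters, while cbLength is in bytes. The sketch below sizes the property list from the actual string instead. It is portable C++ with no Windows headers: buildDiskPathPropList, appendDword and appendSz are hypothetical helpers, char16_t stands in for WCHAR, and the three syntax constants are stand-ins that should be checked against clusapi.h.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-in values mirroring the CLUSPROP_SYNTAX_* constants in clusapi.h
// (assumption: verify against the SDK headers before relying on them).
constexpr std::uint32_t kSyntaxName        = 0x00040003; // CLUSPROP_SYNTAX_NAME
constexpr std::uint32_t kSyntaxListValueSz = 0x00010003; // CLUSPROP_SYNTAX_LIST_VALUE_SZ
constexpr std::uint32_t kSyntaxEndmark     = 0x00000000; // CLUSPROP_SYNTAX_ENDMARK

// Append a 32-bit value in host byte order.
static void appendDword(std::vector<std::uint8_t>& buf, std::uint32_t v) {
    const auto* p = reinterpret_cast<const std::uint8_t*>(&v);
    buf.insert(buf.end(), p, p + sizeof(v));
}

// Append one string entry: syntax, byte length (terminator included),
// the UTF-16 characters, then zero padding up to a 4-byte boundary.
static void appendSz(std::vector<std::uint8_t>& buf, std::uint32_t syntax,
                     const std::u16string& s) {
    // Length comes from the string itself, never from sizeof(pointer).
    const std::uint32_t cb =
        static_cast<std::uint32_t>((s.size() + 1) * sizeof(char16_t));
    appendDword(buf, syntax);
    appendDword(buf, cb);
    const auto* p = reinterpret_cast<const std::uint8_t*>(s.c_str());
    buf.insert(buf.end(), p, p + cb);          // c_str() supplies the terminator
    while (buf.size() % 4 != 0) buf.push_back(0);
}

// Hypothetical helper: build a one-property list that sets DiskPath.
std::vector<std::uint8_t> buildDiskPathPropList(const std::u16string& diskPath) {
    std::vector<std::uint8_t> buf;
    appendDword(buf, 1);                          // property count
    appendSz(buf, kSyntaxName, u"DiskPath");      // property name
    appendSz(buf, kSyntaxListValueSz, diskPath);  // property value
    appendDword(buf, kSyntaxEndmark);             // end of the value list
    return buf;
}
```

On Windows, the resulting buffer and its size would take the place of &DPC and cbSize in the ClusterResourceControl call; that part is not exercised here.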

Is there a known limitation in the Cluster API regarding disks mounted on mount points?

BTW this works fine:

C:\>cluster res "Disk W:\Mount" /priv DiskPath="W:\Mount"

Thanks,

Aditya

Guest VM simultaneous failover

Hi,

It is a requirement in our environment that certain guest VMs are always located on the same cluster node as each other, so if one is migrated off, the other moves with it. Essentially they need to be "paired".

Can someone please advise on how I can do this?

Regards

Leon

Cluster VMs sometimes fail while doing an Export-VM

I'm using a PowerShell script, run through Task Scheduler, to export some clustered (2012 R2 Hyper-V) VMs.

Every now and then a VM is restarted by the cluster during the Export-VM. The errors from the Event Viewer are listed at the end of this message. I cannot see a proper cause for the failure; is there any way to debug this problem more deeply?

I would also like to know if there is some switch I could set on the cluster resource while doing the Export-VM to prevent the cluster from trying to restart the VM, even if it is not responding for a while during the export.

The powershell script used:

$VMS = Get-VM -Name VM1,VM2,VM3 -EA SilentlyContinue
foreach ($VM in $VMS.VMName) {
    # Remove the previous export before exporting again
    del \\fileserver\HyperVexport\$VM -Force -Recurse
    Export-VM -Name $VM -Path \\fileserver\HyperVexport
    if (-not $?) {
        # Log the failure and send a notification mail
        $date = Get-Date -Format s
        "$date $VM Export failed" | Out-File -FilePath c:\hyper-v\scripts\ExportVMs.log -Append
        Send-MailMessage -From "xxx@xx.xx" -To "xxx@xx.xx" -Subject "Export of $VM in $env:COMPUTERNAME failed" -SmtpServer mailserver
    }
} # close foreach

Event logs from the Hyper-V host where the VM is running at the time of the failure:

Time | Log | Event ID | Description
22:56:07 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state Online to state ProcessingFailure.
22:56:07 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state ProcessingFailure to state WaitingToTerminate. Cluster resource 'Virtual Machine VM1' is waiting on the following resources: .
22:56:07 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state WaitingToTerminate to state Terminating.
22:56:07 | Windows Logs/System | 1069 | Cluster resource 'Virtual Machine VM1' of type 'Virtual Machine' in clustered role 'VM1' failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
22:57:07 | Applications and Services Logs/Microsoft/Windows/Hyper-V-High-Availability/Admin | 21128 | 'Virtual Machine VM1' failed to shutdown the virtual machine during the resource termination. The virtual machine will be forcefully stopped.
22:57:07 | Applications and Services Logs/Microsoft/Windows/Hyper-V-High-Availability/Admin | 21119 | 'Virtual Machine VM1' succesfully started the virtual machine during the resource termination. The virtual machine.
22:57:13 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state Terminating to state DelayRestartingResource.
22:57:13 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state DelayRestartingResource to state OnlineCallIssued.
22:57:13 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state OnlineCallIssued to state OnlinePending.
22:57:13 | Applications and Services Logs/Microsoft/Windows/Hyper-V-VMMS/Admin | 14070 | Virtual machine 'VM1' (ID=9510686F-BE3C-4CAA-99A5-EB756ED8DED1) has quit unexpectedly.
22:57:13 | Applications and Services Logs/Microsoft/Windows/Hyper-V-VMMS/Admin | 15190 | 'VM1' failed to take a checkpoint. (Virtual machine ID 9510686F-BE3C-4CAA-99A5-EB756ED8DED1)
22:57:13 | Applications and Services Logs/Microsoft/Windows/Hyper-V-VMMS/Admin | 15140 | 'VM1' failed to turn off. (Virtual machine ID 9510686F-BE3C-4CAA-99A5-EB756ED8DED1)
22:57:13 | Applications and Services Logs/Microsoft/Windows/Hyper-V-VMMS/Admin | 18350 | Export failed for virtual machine 'VM1' (9510686F-BE3C-4CAA-99A5-EB756ED8DED1) with error 'The process terminated unexpectedly.' (0x8007042B).
22:57:17 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state OnlinePending to state Online.
22:57:17 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1201 | The Cluster service successfully brought the clustered role 'VM1' online.


In the VM1 Event Viewer I can only see "The previous system shutdown at ... was unexpected", so it was forcefully shut down, as can be seen from the logs above.
 

Windows Server 2012 Standard, Cluster CSV, storage migration error: can't find network name

Architecture: the fibre SAN switch is not shown in the diagram.

I used Failover Cluster Manager to move a VM's storage from one volume to another, but an error occurred.

Under Cluster Storage, the CSV is not shown; an error appears instead.

The description to the right of the error icon is in Chinese and means "Can't find network name."

I have checked almost everywhere in Failover Cluster Manager and the event log, and found nothing helpful.

Has anyone come across the same problem, or does anyone have any idea about this error?

Cluster Adding Warning - windows 2008 R2 - 3 Node Cluster (Production System)

Hello friends, I am adding one node to my already running cluster. Currently I have IBM servers on the Intel platform and am adding one HP node, also on the Intel platform. After adding the node I get the warning below. Can you please help?


 
Thanks, Happiness Always
Jatin


Failover/Failback settings for WSFC/AlwaysOn in Multi-Subnet environment

Hi All,

I'm new to WSFC/AlwaysOn. What production settings do you use for failover/failback, failover threshold, and network thresholds for WSFC/AlwaysOn in a multi-subnet deployment across different geographies, to guard against network connectivity issues, site outages, etc.?

Regards,

ttwa


migrate file failover cluster from windows 2003 R2 to a new domain

Hi

My company recently bought another small company. That company had a Windows 2003 R2 failover cluster, and now we have to migrate it to a Windows 2012 R2 cluster.

What I have done so far is get 2 new servers, join them to a new domain, and configure a Windows 2012 R2 cluster. This worked well and the cluster is ready.

The question is how to migrate all the shares from the old cluster to the new cluster. Both clusters are connected to EMC CLARiiON storage, so at this point both can see the storage.

I guess I just need to migrate the shares. There is about 15 TB of data, so robocopying everything over would probably be overkill.

How would you go about this?


Dalibor Bosic


Failover Manager displays VMs as off but they are on?

I'm experiencing a strange thing right now in my Failover Manager. I have a cluster environment with 8 host servers and about 61 servers in total. I had divided the host servers into 3 dedicated preferred segments (Development, QA, and NET). The first 3 host servers were for Development, with settings restricting the VMs to those hosts. Then I dedicated 2 host servers to QA and did the same thing, restricting their settings to those 2 preferred servers. The final set was 3 hosts dedicated to network testing, called NET, with the same restrictions to those hosts.

After several years I decided to remove NET and open the cluster group to just Development and QA. Everything went smoothly, except that the preferred-owner restrictions on several VMs could not be removed while they were running, so I had to shut those VMs down. Once I did that, I was able to remove those settings.

That's when the weird things started to happen: about 6 servers wouldn't start in the failover cluster. I was getting error code 0x800713cf, which goes back to the restrictions that were placed on the VMs, but those restrictions have been removed. I decided to re-import the VMs with their same IDs and brought them into view in Hyper-V Manager, where I was able to start the VMs, and everything is looking great. But in Failover Manager the affected VMs still show as off and still generate the same 0x800713cf error, even though they are completely on. I reset the entire cluster group, but Failover Manager still shows them as off. Very strange. Has anyone come across this problem before?

Thank You for your time

Art Gonzales

  

Migration to new node failing - Windows 2008R2

Dear team,

I just added a new host and it looked like everything went well, but I am getting a DLL error as shown below. I'm not sure what to do about it.


Is this causing my VM migration to fail? I'm not sure where to find the events that show why it's failing. Second, I have 4 cluster networks and I can see the LAN cards of each host in these cluster networks, but since I added this new node I cannot see its LAN card in Cluster Network 1. In the rest of the cluster networks I can see this node's LAN cards.



Thanks, Happiness Always
Jatin


Running chkdsk causes Cluster Shared Volumes to go offline

Basic setup at Production site:

2x Dell R730 running Windows 2012 R2 Standard (Windows updates current as of June 2016). Cluster passed Validation tests, with the exception of some warnings due to iSCSI NICs not having a default gateway (those NICs can't ping the LAN interfaces)

3x Cluster Shared Volumes (one Witness, two data) with a single Generic Service role. Generic Service role has both data volumes as dependency. All CSVs are located on a Dell EqualLogic iSCSI SAN group; data volumes are replicated regularly to DR site.

For backups, we have scripted mounting recent replicas on the DR EqualLogic group to a server at the DR site. This setup has been working fine; however, the CSV replicas, when mounted to the DR system, have their filesystems marked as dirty. The filesystems on the Production CSVs are not marked dirty (verified by fsutil and Action Center). This happens occasionally as all replicas/snapshots are taken on the hardware side and the filesystems are not quiesced (which is OK for our purposes), but has been occurring on all replicas this past week.

Issue:

To ensure the CSVs were clean I opted to run a read-only chkdsk from the command line ("chkdsk X:") on each volume, with no switches specified. When I did so, both CSVs were taken offline, which is contrary to my understanding that 2012 R2 CSVs should remain online for analysis and spotfix. On two other servers (not cluster systems), when I mount a replica and run chkdsk, the volumes do stay online, so I'm a bit puzzled as to why the cluster decided to take the volumes offline, and how to prevent this from happening in the future.

Do I need to specify the "/scan" switch or some other parameter? Does this need to be run through Action Center or Server Manager? Do I need to be running a File Server role in the Cluster?

I appreciate any help and thoughts on this!

Thanks!

Intermittent Live Migration failure generating Event ID 21502, 22038, 21111, 21024

We have a multi-node Hyper-V cluster that has recently developed an issue with intermittent live migration failures.

We noticed this when one of our CAU runs failed because it could not place the hosts into maintenance mode or successfully drain all the roles from them.

Scenario:

Place any node into maintenance mode/drain roles.

Most VMs will drain and live migrate onto other nodes. Randomly, one or a few will refuse to move (which VM it is, and which node it is moving to or from, always varies). The live migration ends with a failure generating event IDs 21502, 22038, 21111 and 21024. If you run the process again (drain roles) it will migrate the VMs, and if you manually live migrate them they move just fine. Manually live migrating a VM can result in the same intermittent error, but rerunning the process succeeds after one or two attempts, or after just waiting a couple of minutes.

This occurs on all nodes in the cluster and can occur with seemingly any VM in the private cloud.

The pertinent content of the events is:

Event 21502
Live migration of 'VM' failed.

Virtual machine migration operation for 'VM' failed at migration source 'NodeName'. (Virtual machine ID xxx)

Failed to send data for a Virtual Machine migration: The process cannot access the file because it is being used by another process. (0x80070020).

Event 22038
Failed to send data for a Virtual Machine migration: The process cannot access the file because it is being used by another process. (0x80070020).

According to this it would appear that something is locking the files or they are not transferring permissions properly, however all access to the back end SOFS is uniform across all the Nodes and the failure is intermittent rather than consistently happening on one Node. 

Thanks in advance!

Cluster events: Query Incomplete

Hi

When I refresh my cluster events in Failover Cluster Manager I can see the following displayed in the status bar at the bottom:

"Error opening log on node XXX1.fqdn, XXXX2.fqdn, XXXX3.fqdn"

The event log shows "no events" and it says "Query incomplete" at the top.

backup issue of Hyper-V guest machine

Hello friend,

I am getting an error while trying to back up a Hyper-V guest machine. The same host has 2 VMs: the backup of one server VM works fine, but the second server gets a VSS error (event IDs 19050 and 14044).


Any better way to do it?

We have a 2-node cluster on Windows 2008 R2 with a node and file share witness quorum.
The cluster resource (DHCP) is on cluster disk F.
Now we need to decommission the storage that holds the file share and the cluster disk.
We need to switch the file share and the cluster disk to new storage.
My plan is to take the cluster disk offline, add a new one, change the new one to drive F,
and copy the files from the old disk to the new one.
Then change the quorum from node and file share witness
to node majority, and then change it back to node and file share witness
(with the share on the new storage).

Any better way to do it?

Thank you!


Replace a physical disk resource in a Windows 2003 SQL cluster

Hello All.

I need to replace a SAN disk in a Windows 2003 SQL cluster with another, larger disk. Expanding the existing drive is not an option.

Here is my plan. Since it has not been tried, I would like to confirm that it works, or hear of any issues, as I have read a lot about problems with changes in disk signatures.

1. Assign the new SAN disk to the nodes.

2. Bring up the disk on one of the nodes and assign a temporary drive letter.

3. Take the SQL instance offline and copy the data from the old disk to the newly added disk.

4. Remove the cluster dependencies and delete the old physical disk resource from the cluster.

5. Add the new disk as a physical disk resource to the SQL cluster group.

6. Configure the dependencies.

7. Bring the SQL instance online.

Please advise if there is any harm in switching over the disk as described above.

Thanks in advance

Disaster Recovery - Cluster Configuration

Hi,

I have a 3-node CSV cluster that we use for Hyper-V. I am preparing our DR solution but am unsure about backing up the cluster configuration. If the worst happened and I had to restore our full Veeam backup on new tin without any of the existing hardware, is there a restoration method that would let me quickly restore our cluster hosts with exactly the same configuration?

Cheers,


Windows Failover cluster across datacenter. Understanding Quorum

Hi ,

We are planning a 3-node cluster in which 2 nodes will be in the primary data center and 1 node will reside in DR. The DR node will participate as a member of the Windows cluster extended to DR but will not be an owner of any SQL instances (SQL FCI will be configured on this cluster). Node 3 in DR will have standalone SQL instances, and an AG will be configured between SQL in primary and DR. We are also planning a disk quorum (a separate LUN will be assigned), configured in the primary data center; it will be shared and accessed by the nodes in the primary data center.

I just want to check whether this configuration has any issues that could cause a cluster-down situation. If yes, what would be a better solution to avoid cluster down in case of a disaster? It's Windows 2012 R2 Standard edition. Any suggestions or input will be highly appreciated.
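
The vote arithmetic for this layout can be sketched as follows, assuming plain majority voting in which every node and the disk witness carries one vote. Dynamic quorum and dynamic witness in Windows Server 2012 R2 adjust votes as members leave, so this is only a first approximation; the function and constant names are hypothetical.

```cpp
// Votes required for a majority out of totalVotes.
constexpr int votesForQuorum(int totalVotes) { return totalVotes / 2 + 1; }

// True when the surviving voters still form a majority.
constexpr bool hasQuorum(int survivingVotes, int totalVotes) {
    return survivingVotes >= votesForQuorum(totalVotes);
}

// Layout from the post: 2 primary nodes plus the disk witness in the
// primary data center, and 1 node in DR.
constexpr int kPrimaryVotes = 2 + 1;
constexpr int kDrVotes      = 1;
constexpr int kTotalVotes   = kPrimaryVotes + kDrVotes;

static_assert(votesForQuorum(kTotalVotes) == 3, "4 votes need 3 for a majority");
static_assert(hasQuorum(kPrimaryVotes, kTotalVotes), "primary keeps quorum if DR is lost");
static_assert(!hasQuorum(kDrVotes, kTotalVotes), "the DR node alone has no majority");
```

Under these assumptions the primary site keeps quorum if the DR node or the WAN link is lost, while the DR node on its own does not, so after a complete loss of the primary site the cluster on the DR node would not stay up without forcing quorum.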

Regards,

 

Why does my cluster name appear to be CLUSTER NAME

Can anyone help? I've set up my failover cluster and I was having issues changing the heartbeat TTL. It turns out that my cluster name is "CLUSTER NAME", not the one I thought I'd set up. Why has this happened?

Thanks

Monitoring Server (Opmanager) shows clear/online status for one of the MS SQL Server 2012 on Windows 2012 R2 virtual machines

Environment:-

  • OpManager monitoring server, working across multiple WAN connections, installed on subnet 10.250.1.xx
  • 3 x MS SQL Server 2012 Enterprise edition installed on two MS Windows 2012 R2 virtual machines in a clustered environment
  • 1 x MS SQL Server is on the 10.15.16.xx subnet
  • 2 x MS SQL Server are on the 10.15.18.xx subnet

No Issue:-

  • No issue from the monitoring application to the SQL Server on the 10.15.16.xx subnet; the status shows "Online"
  • No issue from the monitoring application to any other server on the 10.15.18.xx subnet

Issue:-

  • The issue is with one of the SQL Servers on the 10.15.18.xx subnet
  • The monitoring application shows "Online" status for only one of the SQL Servers. If I restart the SQL Server with "Critical" status, it changes to "Online" after the restart, and the other SQL Server, which had "Online" status, changes to "Critical"
  • Basically, one of the SQL Servers in the clustered environment on subnet 10.15.18.xx is always shown with "Critical" status by the monitoring application server
  • I can ping in both directions to and from the server with "Online" status
  • I cannot ping in either direction to or from the server with "Critical" status
  • I can trace in both directions to and from the server with "Online" status
  • I cannot trace in either direction to or from the server with "Critical" status
  • No errors in the event logs

I have done my own troubleshooting and also posted on the OpManager forums, with no luck on a resolution.

I believe it is more of a cluster issue that occurs when the service restarts.

Any ideas on a resolution, please?


Muhammad Mehdi
