Channel: High Availability (Clustering) forum
Viewing all 2306 articles

NLB Problem on Windows Server 2008 R2

Hello,

An Exchange CAS array (NLB) had already been configured with two CAS nodes. When I tried to add a third node, the NLB console showed "Misconfiguration". I then removed the 2nd node, which had been working well, and tried to add the third node again, but got the same issue; while it showed as misconfigured, the NLB NIC was dropping pings. NLB was configured in unicast mode.

All of the servers are physical.

The error is below:


Processing update 19 from "NLB Manager on abc.bcd.com"
Starting update...
Going to modify cluster configuration...
Modification failed.

Update failed with status code 0x8004100a.
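The status code in this log is an HRESULT, and splitting it into its fields is a quick first triage step. The sketch below is a minimal, hedged example; the class and function names are illustrative, not part of any Windows API. For 0x8004100a it yields facility 4 (FACILITY_ITF), the range WMI uses for its WBEM_E_* codes, and in the WMI error list 0x8004100A is WBEM_E_CRITICAL_ERROR. That suggests the failure surfaced in the WMI layer NLB Manager works through; treat it as a lead, not a diagnosis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HResultFields:
    severity: int  # bit 31: 1 = failure, 0 = success
    facility: int  # bits 16-26: which subsystem defined the code
    code: int      # bits 0-15: facility-specific error number


def decode_hresult(value: int) -> HResultFields:
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    value &= 0xFFFFFFFF  # also accept the signed form, e.g. -2147217398
    return HResultFields(
        severity=(value >> 31) & 0x1,
        facility=(value >> 16) & 0x7FF,
        code=value & 0xFFFF,
    )


if __name__ == "__main__":
    fields = decode_hresult(0x8004100A)
    print(f"severity={fields.severity} facility={fields.facility} code=0x{fields.code:04X}")
```

Running it prints `severity=1 facility=4 code=0x100A`.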

Please advise.


WFCM Is Not Restarting a Process After Exit

I have a two node cluster for availability purposes and find that it works quite well for the most part.

I do, however, have an issue where a process shuts itself down with a clean exit after an exception. WFCM continues to show the service as "online" even though services.msc shows that the service has stopped.

Any ideas what is going on?

WIMMount (HSM) causing cluster storage to go redirected (2012r2 DC)

Looking for options to resolve this error and prevent it in the future.

Thanks for any help.

Hardware:

2-node Dell HV cluster running 2012 R2 DC

8 NICs each, in Multiplex mode: 4x Hyper-V, 4x host

Storage:

2x Synology NAS, accessed over iSCSI by the hosts and the cluster

Relevant logs:

Log Name:      Microsoft-Windows-FailoverClustering/Diagnostic
Source:        Microsoft-Windows-FailoverClustering
Date:         
Event ID:      2050
Task Category: None
Level:         Warning
Keywords:      
User:          SYSTEM
Computer:      l
Description:
[DCM] filter WIMMount found at unsafe altitude 180700

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:        
Event ID:      5125
Task Category: Cluster Shared Volume
Level:         Warning
Keywords:      
User:          SYSTEM
Computer:
Description:
Cluster Shared Volume 'Volume1' ('Cluster Disk 1') has identified one or more active filter drivers on this device stack that could interfere with CSV operations. I/O access will be redirected to the storage device over the network through another Cluster node. This may result in degraded performance. Please contact the filter driver vendor to verify interoperability with Cluster Shared Volumes.

Active filter drivers found:
WIMMount (HSM)

PS C:\Windows\system32> fltmc instances
Filter                Volume Name                              Altitude        Instance Name       Frame   SprtFtrs  VlStatus
--------------------  -------------------------------------  ------------  ----------------------  -----   --------  --------
CCFFilter             \Device\Mup                               261160     CCFFilter                 0     00000003
CsvFlt                \Device\HarddiskVolume50                  404800     CsvFlt Instance           0     00000003
CsvNSFlt              C:                                        404900     CsvNSFlt Instance         0     00000003
FsDepends             C:\ClusterStorage\Volume1                 407000     FsDepends                 0     00000003
FsDepends                                                       407000     FsDepends                 0     00000003
FsDepends             C:                                        407000     FsDepends                 0     00000003
FsDepends             D:                                        407000     FsDepends                 0     00000003
FsDepends             I:                                        407000     FsDepends                 0     00000003
FsDepends                                                       407000     FsDepends                 0     00000003
FsDepends             \Device\HarddiskVolume50                  407000     FsDepends                 0     00000003
FsDepends             \Device\Mup                               407000     FsDepends                 0     00000003
ResumeKeyFilter                                                 202000     ResumeKeyFilter           0     00000003
ResumeKeyFilter       \Device\HarddiskVolume50                  202000     ResumeKeyFilter           0     00000003
WIMMount                                                        180700     WIMMount                  0     00000000
WIMMount              C:                                        180700     WIMMount                  0     00000000
WIMMount              D:                                        180700     WIMMount                  0     00000000
WIMMount              I:                                        180700     WIMMount                  0     00000000
WIMMount                                                        180700     WIMMount                  0     00000000
WIMMount              \Device\HarddiskVolume50                  180700     WIMMount                  0     00000000
luafv                 C:                                        135000     luafv                     0     00000003
npsvctrig             \Device\NamedPipe                          46000     npsvctrig                 0     00000000
svhdxflt              \Device\HarddiskVolume50                  135100     svhdxflt                  0     00000003
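To spot which minifilters sit in the same altitude band as WIMMount, the `fltmc instances` output above can be parsed mechanically. This is a hedged sketch: it assumes whitespace-separated columns as printed above and volume names without spaces, and it only flags the FSFilter HSM load order group (altitudes 180000 to 189999), which is where WIMMount's 180700 falls. It does not reproduce whatever rule the cluster's DCM component uses to call an altitude "unsafe"; the function names are illustrative.

```python
import sys
from typing import NamedTuple

# FSFilter HSM load order group: altitudes 180000-189999.
HSM_RANGE = range(180000, 190000)


class FilterInstance(NamedTuple):
    name: str
    volume: str  # empty when fltmc prints no volume name
    altitude: int


def parse_fltmc(text: str) -> list[FilterInstance]:
    """Parse the data rows of `fltmc instances` output, skipping header and dashes."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or tokens[0] == "Filter" or tokens[0].startswith("-"):
            continue
        # The altitude is the first purely numeric token after the filter name;
        # anything between the name and the altitude is the volume name.
        for i, tok in enumerate(tokens[1:], start=1):
            if tok.isdigit():
                rows.append(FilterInstance(tokens[0], " ".join(tokens[1:i]), int(tok)))
                break
    return rows


def hsm_range_filters(rows: list[FilterInstance]) -> list[str]:
    """Names of filters with at least one instance in the HSM altitude range."""
    return sorted({r.name for r in rows if r.altitude in HSM_RANGE})


if __name__ == "__main__":
    print(hsm_range_filters(parse_fltmc(sys.stdin.read())))
```

From cmd.exe, something like `fltmc instances | python hsm_filters.py` (the script name is hypothetical) should list only WIMMount for the output shown above.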

The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted

Server 2012 R2 on a beefy two-node PowerEdge R720 cluster

CAU worked fine on my cluster a long time ago, when there were only a few simple VMs.

As the setup got a bit more complicated (more VMs with more disks, more back-end iSCSI storage, some VMs in a cluster setup), CAU just does not work reliably at all.

So this time I was running Windows Update on the hosts by hand: migrated the VMs to Host A, updated Host B, restarted, updated again because some updates had failed, restarted again, then Resume / Do not fail back roles (to be on the safe side).

Then I selected some running VMs on Host A and live-migrated them back to Host B, at which point Host B (the one just updated and freshly rebooted) threw a fit and killed all of the VMs that were selected to be moved to it.

"The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is causing the issue."

The Cluster service did restart and did bring the machines back up, but that is less than user-friendly; I had to select them one more time to migrate (and this time they did).

But the whole solution just doesn't feel like an enterprise product (in fact, it would not be acceptable even for home use).

I don't expect a miracle solution, but if anybody has any experience to chip in, it would be appreciated.

Seb

DTCProxy is not running: java.net.ConnectException: Connection timed out

Hi All,

While starting the JBoss server, we are facing the issue below with MSDTC. The database is SQL Server 2008 R2 in a clustered environment; MSDTC works fine in non-clustered environments.

2015-06-09 23:48:18,444 ERROR [STDERR] (main) javax.transaction.xa.XAException: DTCProxy is not running: java.net.ConnectException: Connection timed out
2015-06-09 23:48:18,445 ERROR [STDERR] (main) at com.inet.tds.b.a(Unknown Source)
2015-06-09 23:48:18,445 ERROR [STDERR] (main) at com.inet.tds.b.start(Unknown Source)
2015-06-09 23:48:18,445 ERROR [STDERR] (main) at com.inet.tds.e.start(Unknown Source)
2015-06-09 23:48:18,445 ERROR [STDERR] (main) at org.jboss.resource.adapter.jdbc.xa.XAManagedConnection.start(XAManagedConnection.java:213)

Please help.

Cannot fix corrupt system file

Even after running the DISM online restore, there are still corrupted files that cannot be fixed. Please help.

2015-06-22 21:38:39, Info                  CBS    This session already attempted mapping cache rebuild, skip.
2015-06-22 21:38:39, Info                  CBS    Failed to find package: Package_2_for_KB3022345~31bf3856ad364e35~amd64~~6.3.1.5 from the index with mapping index packages recently rebuilt,  [HRESULT = 0x800f090e - CBS_E_EMPTY_PACKAGE_MAPPING_INDEX]
2015-06-22 21:38:39, Info                  CBS    Failed to get WU category/updateID for package: Package_2_for_KB3022345~31bf3856ad364e35~amd64~~6.3.1.5 [HRESULT = 0x800f090e - CBS_E_EMPTY_PACKAGE_MAPPING_INDEX]
2015-06-22 21:38:39, Info                  CBS    Failed to get the mapping of package: Package_2_for_KB3022345~31bf3856ad364e35~amd64~~6.3.1.5, continue. [HRESULT = 0x800f090e - CBS_E_EMPTY_PACKAGE_MAPPING_INDEX]
2015-06-22 21:38:39, Info                  CBS    Failed to find  [HRESULT = 0x800f090e - CBS_E_EMPTY_PACKAGE_MAPPING_INDEX]
2015-06-22 21:38:39, Info                  CBS    Failed to collect payload and there is nothing to repair. [HRESULT = 0x800f0906 - CBS_E_DOWNLOAD_FAILURE]
2015-06-22 21:38:39, Info                  CBS    Failed to repair store. [HRESULT = 0x800f0906 - CBS_E_DOWNLOAD_FAILURE]
2015-06-22 21:38:39, Info                  CBS    Ensure CBS corruption flag is clear
2015-06-22 21:38:39, Info                  CBS   
=================================

Checking System Update Readiness.

(p) CSI Payload Corrupt   amd64_microsoft-windows-u..ed-telemetry-client_31bf3856ad364e35_6.3.9600.17747_none_90df8130dac08ee0\utc.app.json
Repair failed: Missing replacement payload.
(p) CSI Payload Corrupt   amd64_microsoft-windows-u..ed-telemetry-client_31bf3856ad364e35_6.3.9600.17747_none_90df8130dac08ee0\telemetry.ASM-WindowsDefault.json
Repair failed: Missing replacement payload.

2015-06-22 20:57:06, Info                  CSI    000008fc [SR] Could not reproject corrupted file [ml:520{260},l:114{57}]"\??\C:\ProgramData\Microsoft\Diagnosis\DownloadedSettings"\[l:24{12}]"utc.app.json"; source file in store is also corrupted
2015-06-22 20:57:06, Info                  CSI    000008fd Hashes for file member \??\C:\ProgramData\Microsoft\Diagnosis\DownloadedSettings\telemetry.ASM-WindowsDefault.json do not match actual file [l:66{33}]"telemetry.ASM-WindowsDefault.json" :
  Found: {l:32 b:ErEvcGxrC5RD30CwVgig/0sasSdfpRLjd18ZiXseYV4=} Expected: {l:32 b:EeQJzlVPvq9GNIcA2FEwrOjEeuDam1G+ol3x61gKasQ=}
2015-06-22 20:57:06, Info                  CSI    000008fe Hashes for file member \SystemRoot\WinSxS\amd64_microsoft-windows-u..ed-telemetry-client_31bf3856ad364e35_6.3.9600.17747_none_90df8130dac08ee0\telemetry.ASM-WindowsDefault.json do not match actual file [l:66{33}]"telemetry.ASM-WindowsDefault.json" :
  Found: {l:32 b:ErEvcGxrC5RD30CwVgig/0sasSdfpRLjd18ZiXseYV4=} Expected: {l:32 b:EeQJzlVPvq9GNIcA2FEwrOjEeuDam1G+ol3x61gKasQ=}
2015-06-22 20:57:06, Info                  CSI    000008ff [SR] Could not reproject corrupted file [ml:520{260},l:114{57}]"\??\C:\ProgramData\Microsoft\Diagnosis\DownloadedSettings"\[l:66{33}]"telemetry.ASM-WindowsDefault.json"; source file in store is also corrupted
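The System Update Readiness section above reports each damaged file as a "(p) CSI Payload Corrupt" line followed by the repair outcome. A minimal sketch that turns such a log into a list of (component, file, outcome) tuples, assuming the line layout shown above and a plain-text (ASCII/UTF-8) log file; the function name is illustrative.

```python
import re
import sys

# Matches e.g. "(p) CSI Payload Corrupt   <component>\<file name>"
CORRUPT_LINE = re.compile(
    r"^\(p\)\s+CSI Payload Corrupt\s+(?P<component>\S+)\\(?P<file>[^\\\s]+)$"
)


def corrupt_payloads(log_text: str) -> list[tuple[str, str, str]]:
    """Return (component, file, outcome) for each corrupt payload entry."""
    lines = [line.strip() for line in log_text.splitlines()]
    results = []
    for i, line in enumerate(lines):
        match = CORRUPT_LINE.match(line)
        if not match:
            continue
        # The repair outcome, when present, is on the line right after the entry.
        following = lines[i + 1] if i + 1 < len(lines) else ""
        outcome = following if following.startswith("Repair") else ""
        results.append((match["component"], match["file"], outcome))
    return results


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for component, name, outcome in corrupt_payloads(fh.read()):
            print(f"{name}\t{outcome}\t{component}")
```

For the excerpt above this would list utc.app.json and telemetry.ASM-WindowsDefault.json, both with "Repair failed: Missing replacement payload."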



Windows NLB - Multicast

Hi,

I configured Windows NLB in multicast mode and gave the MAC address to the network team so they could add a static ARP entry on the switches. I am wondering whether this MAC address is dynamic or whether it will stay the same until I change the mode. Or should I go for IGMP multicast mode?
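As commonly documented for Windows NLB, the cluster MAC address is not random: it is derived from the cluster IP address and the cluster operation mode, so it should stay the same until one of those changes. The sketch below encodes that convention (02-BF plus the four IP octets for unicast, 03-BF plus the four octets for multicast, 01-00-5E-7F plus the last two octets for IGMP multicast). The function name and the sample IP are illustrative; check the result against the MAC shown in NLB Manager before handing it to the network team.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip: str, mode: str) -> str:
    """Derive the NLB cluster MAC for a cluster IP and operation mode."""
    octets = ipaddress.IPv4Address(cluster_ip).packed  # the 4 IP bytes, in order
    if mode == "unicast":
        mac = bytes([0x02, 0xBF]) + octets
    elif mode == "multicast":
        mac = bytes([0x03, 0xBF]) + octets
    elif mode == "igmp":
        # IGMP multicast only uses the last two octets of the cluster IP.
        mac = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode!r}")
    return "-".join(f"{b:02X}" for b in mac)


if __name__ == "__main__":
    for mode in ("unicast", "multicast", "igmp"):
        # 10.0.0.5 is a placeholder; substitute the real cluster IP.
        print(mode, nlb_cluster_mac("10.0.0.5", mode))
```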

Same subnet, but the cluster places a host in a separate network

Hello,

I have an issue where a host whose IP configuration is on the same subnet as the rest of the hosts keeps getting placed into a separate cluster network. Failover validation doesn't report any issues, and the host can communicate with all of the other hosts on that network.

Checking the cluster log, I see the event where it occurs:

INFO  [ClNet] Adapter Hyper-V Virtual Ethernet Adapter #3 is still attached to network Cluster Network 1.

And here is the event where it skips attaching to the right network:

INFO  [ClNet] Ignoring configuration entry for cluster network Public (8d603185-4e36-4222-9c92-be4e3a22ac1e) because it has no previous matching adapter. Processing has not yet completed so an adapter may still be found for this network.

Any idea what can be causing this?
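When a cluster log is long, pulling out one component's entries (here the [ClNet] lines that record which adapter is attached to which cluster network) makes the sequence easier to follow. A minimal sketch, assuming each relevant line contains an upper-case level word followed by a bracketed component tag, as in the two excerpts above, and that the log is plain text; the function names are illustrative.

```python
import re
import sys
from collections import Counter

# e.g. "... INFO  [ClNet] Adapter ... is still attached to network Cluster Network 1."
ENTRY = re.compile(r"\b(?P<level>[A-Z]{3,4})\s+\[(?P<tag>[^\]]+)\]\s*(?P<msg>.*)")


def entries_for(lines, tag):
    """Yield (level, message) for every log line carrying the given component tag."""
    for line in lines:
        match = ENTRY.search(line)
        if match and match["tag"] == tag:
            yield match["level"], match["msg"]


def tag_counts(lines):
    """Count how many parsed lines each component tag produced."""
    return Counter(m["tag"] for m in map(ENTRY.search, lines) if m)


if __name__ == "__main__":
    # Assumes the path of a plain-text cluster log as the first argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for level, msg in entries_for(fh, "ClNet"):
            print(level, msg)
```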


Network Drops for 30 seconds During Hyper-V Live Migration

I have 3 physical Hyper-V hosts set up with clustered storage. I disabled VMQ because I was getting errors when trying to do live migrations. I have also run the network portion of the cluster validation tests without errors. Basically, whenever I do a live migration from any host to any other host, I lose network connectivity to every VM running on those hosts. During this time, a SQL application that is running locks up and freezes all of the users. Many of them have to use Task Manager to kill the application to get back in, or even reboot their machines to free it up.

I have been doing a ton of reading on network settings and configurations and have made no progress. Any help to point me in a direction to get this solved will be appreciated. I need to be able to do Live Migrations on my cluster storage.

Thanks for any help.


Name Resolution Not Yet Available

Hi

We have a Server 2012 two-node cluster whose purpose is to run the Scale-Out File Server role, and this role appears to be online and functioning normally.

However, we have an error on the underlying cluster, which is called "CBStore": the domain network is shown as "Failed" and there is a message that says "Name Resolution Not Yet Available". I have attached a picture of this to demonstrate the problem.

We are not able to ping "CBSTORE" or its IP address. We did have Windows Firewall turned on on each cluster node, and we have tried turning it off entirely to rule it out, but this has not had an effect on the problem.

The cluster was created 18 months ago and we have been using it without problems since. I am not sure when this error message started appearing, as I rarely have cause to connect to the cluster and check on it.

The name appears to be registered in DNS correctly. I have done some research, and most answers to the "Name Resolution Not Yet Available" problem say there is an incorrect HOSTS file entry, but the HOSTS file on both cluster nodes contains nothing other than the defaults.

Any suggestions welcome!

MS Cluster File Servers: disconnecting the public network on one node does not trigger failover?

Hi all, I'm new to MS Cluster. Recently I set up a two-node clustered file server. Everything works fine except when the network adapter fails: failover does not happen if I unplug the network cable (connected to our client access network) on ServerA. However, it works fine if I power off ServerA manually to simulate a power failure. Is there anything I missed in the configuration? Thank you very much!

Shadow Copies on 2012 R2 File Server Cluster

Hello all!

I've inherited a two-node physical 2012 R2 file server cluster that contains a few SMB shares on a single clustered disk. I'd like to enable shadow copies for this shared disk, but I want to store the shadow copy data on its own shared disk, as per shadow copy best practices (at least on non-clustered file servers).

I created a second cluster disk and assigned it to the same resources as the SMB clustered disk.

Historically I've enabled shadow copies through Computer Management, but I want to ensure the shadow copies are cluster-aware, so in Failover Cluster Manager I open Storage | Disks, right-click the SMB cluster disk, and click the Shadow Copies tab. From that tab I can enable shadow copies, which sounds like what I'm looking for; unfortunately, it does not give me an option to choose a disk/volume to store my shadow copy data, so this can't be right.

My next step was to connect computer management to the cluster virtual server name (NOCFS4) and through the System Tools | Shared Folders | All Tasks | Configure Shadow Copies it shows me the correct number of shares on the SMB cluster disk plus I can see the Settings button for configuring the location and size limit for shadow copies, however once I tweak the shadow copy location and size settings, click Ok and click Enable to turn on the shadow copies I get a long pause and an error about not being able to create a schedule. So it seems connecting to the cluster virtual object is not the answer either.

That leaves using computer management to connect to one of the physical server nodes of the cluster. When I open the shadow copy interface on the physical node I note that it shows a 0 for the number of shares it detects on the SMB clustered disk. This doesn't surprise me since this interface isn't cluster resource aware.

So I'm stuck. Does anyone know the "Microsoft way" to enable shadow copies on a clustered disk while storing the shadow copy data on a second cluster disk both attached to the same cluster resource?

AlwaysOn Cluster reboot due to file share witness unavailability

Hi Team,

Has anyone come across this scenario in a two-node AlwaysOn Availability Group: the file share witness times out, RHS is terminated, and this causes the cluster node to reboot? The file share witness is there for continuous failover, and if the resource is unavailable, my expectation was that it should go offline without impacting the server or SQL Server. Instead, the cluster node is rebooted to rectify the issue.

Configuration

Windows Server 2012 R2 (VMs), two nodes, file share witness (NFS)

SQL Server 2012 SP2

Errors

A component on the server did not respond in a timely fashion. This caused the cluster resource 'File Share Witness' (resource type 'File Share Witness', DLL 'clusres2.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated with the resource are functioning correctly.

The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is causing the issue.
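It may help to separate the RHS recovery described in these errors from quorum loss. Under the plain majority rule (this sketch ignores the dynamic quorum and dynamic witness adjustments in Windows Server 2012 R2), two nodes plus a file share witness give three votes, and the cluster keeps quorum while more than half of them are present. The member names below are hypothetical.

```python
# Hypothetical vote assignment: two nodes plus a file share witness.
VOTES = {"node1": 1, "node2": 1, "file_share_witness": 1}


def has_quorum(online: set[str], votes: dict[str, int] = VOTES) -> bool:
    """Static majority rule: strictly more than half of all configured votes."""
    total = sum(votes.values())
    present = sum(votes[member] for member in online)
    return present > total // 2


if __name__ == "__main__":
    scenarios = {
        "witness unavailable": {"node1", "node2"},
        "node2 down": {"node1", "file_share_witness"},
        "node1 isolated": {"node1"},
    }
    for name, online in scenarios.items():
        print(f"{name}: quorum={has_quorum(online)}")
```

Under this rule the first two scenarios keep quorum and the third does not, so a witness outage by itself should not cost quorum while both nodes are up.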

Thanks,

-SreejitG

Bitlocker protected cluster disks fail to mount

Hello

I am trying to Bitlocker protect some CSVs.

I am following the instructions here:

https://technet.microsoft.com/en-gb/library/dn383585.aspx

I can format and prepare the disk, and enable BitLocker successfully. The disk is accessible on the host, and I can lock and unlock the drive (with the recovery key) without an issue. I have correctly added the CNO to the list of protectors for the volume.

As soon as I add the disk to the cluster, however, the disk fails to come online in the cluster with the following error:

The system cannot open the device or file specified. Error Code 0x8007006e.
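Error codes of the form 0x8007xxxx are Win32 errors wrapped by the HRESULT_FROM_WIN32 macro (facility 7, FACILITY_WIN32), so the low 16 bits give the underlying Win32 code. A minimal Python mirror of that mapping, with illustrative function names: 0x8007006e unwraps to 110, which is ERROR_OPEN_FAILED, the Win32 error whose message is the "The system cannot open the device or file specified." text above.

```python
FACILITY_WIN32 = 7


def hresult_from_win32(err: int) -> int:
    """Mirror of the HRESULT_FROM_WIN32 macro (returned as an unsigned 32-bit value)."""
    if err <= 0:
        return err & 0xFFFFFFFF  # zero and negative values pass through unchanged
    return (err & 0x0000FFFF) | (FACILITY_WIN32 << 16) | 0x80000000


def win32_from_hresult(hr: int):
    """Return the Win32 error code if hr is a FACILITY_WIN32 failure, else None."""
    hr &= 0xFFFFFFFF
    if hr & 0x80000000 and (hr >> 16) & 0x7FF == FACILITY_WIN32:
        return hr & 0xFFFF
    return None


if __name__ == "__main__":
    print(win32_from_hresult(0x8007006E))  # Win32 error 110
```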

I am using Windows Server 2012 R2 (Core running Hyper-V). The hosts are connected to a Dell Equallogic PS4100 iSCSI array. I have tried with both thin and thick provisioned SAN volumes.

Many thanks

Ben

Issue with setting up a File Server Role: errors 1254, 1205 and 1069

Hi Everyone

I am currently building a new 2012 R2 file cluster to replace our 2008 R2 file cluster.

On each node (3 in total) I have:

2 NICs (internal and heartbeat)

a heartbeat network for cluster use only

the following roles and features enabled: "File Server roles, failover clustering features, File Server Resource management tools and Share and Storage management tools"

I have mapped 2 LUNs to the nodes:

LUN 1: quorum

LUN 2: file storage

Both LUNs are accessible from all of the nodes.

Cluster creation completed successfully without any errors.

In the Configure Role (High Availability) wizard I chose File Server > File Server for general use.

On the Client Access Point page I specified the NetBIOS name and IP address.

I selected the available Cluster Disk 2, then on the next wizard screen ("you are ready to configure high availability for file cluster") clicked Next > Finish.

I can see that the role was created successfully; however, I can't see the "test" account object created in the OU.

As you can see, the status shows "Failed".

I am able to move the shared cluster disk to another node.

The Add Share option is also greyed out.

Please help

Many thanks  

Clustered role 'Test' has exceeded its failover threshold.  It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state.  No additional attempts will be made to bring the role online or fail it over to another node in the cluster.  Please check the events associated with the failure.  After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.

The Cluster service failed to bring clustered role 'Test' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Cluster resource 'Test' of type 'Network Name' in clustered role 'Test' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Storage:
Cluster Disk 2
Network Name:
test
OU:
OU=FileCluster,OU=Servers,OU=,DC=,DC=,DC=
IP Address:
Started
25/06/2015 9:53:47 p.m.
Completed
25/06/2015 9:53:49 p.m.
Creating the group test.
Creating File Server resources.
Configuring the cluster storage device.
Configuring File Server resources.
Configuring File Server networking.
Verifying the client access point settings are valid.
Configuring network name resource.
Configuring new IP address resources.
Configuring the dependencies for the IP address resources.
Configuring the network name dependencies.
The client access point has been configured successfully.
Configuring File Server resources.
Creating the highly available file server resource.
Verifying required dependencies are configured.
A File Server has been successfully created.
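The first event message above describes a failover budget: a role may fail over a configured number of times within a configured period (exposed on cluster groups as the FailoverThreshold and FailoverPeriod properties), after which it is left failed until brought online manually or until the restart delay elapses. As a mental model only, the sliding-window sketch below shows that policy; the class is hypothetical and does not reproduce the cluster service's actual bookkeeping, defaults, or restart-delay behavior. In this post the underlying trigger is the failed 'Network Name' resource; the budget only explains why the role stopped retrying.

```python
from collections import deque
from datetime import datetime, timedelta


class FailoverBudget:
    """Illustrative 'at most N failovers per period' policy (not the cluster service's code)."""

    def __init__(self, threshold: int, period: timedelta):
        self.threshold = threshold  # hypothetical stand-in for FailoverThreshold
        self.period = period        # hypothetical stand-in for FailoverPeriod
        self._attempts = deque()    # timestamps of recent failover attempts

    def try_failover(self, when: datetime) -> bool:
        """Return True if another failover is allowed at `when`, recording it if so."""
        # Forget attempts that have fallen out of the sliding window.
        while self._attempts and when - self._attempts[0] > self.period:
            self._attempts.popleft()
        if len(self._attempts) >= self.threshold:
            return False  # budget exhausted: the role would be left in a failed state
        self._attempts.append(when)
        return True


if __name__ == "__main__":
    budget = FailoverBudget(threshold=2, period=timedelta(hours=6))
    start = datetime(2015, 6, 25, 21, 53)
    for minutes in (0, 1, 2):
        print(minutes, budget.try_failover(start + timedelta(minutes=minutes)))
```

With a threshold of 2, the third attempt inside the window is refused, while an attempt after the window has passed is allowed again.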





Multi-site cluster with different connections

Hello,

At the moment we have a SQL Server 2014 cluster in one datacenter, which all of our customers connect to.

In the near future we are going to expand to a second datacenter; we want to move one node to this datacenter and create a DWDM connection between the two.

We also want customers to connect to this second datacenter, but different customers: the customers connecting to the first datacenter will not connect to the second datacenter.

The customers connecting to the second datacenter will be routed via the DWDM connection to the first node in the first datacenter.

Now my problem: the DWDM connection could break down, in which case all customer connections from the 2nd datacenter to the 1st would be lost. In that case I want the customers of the 2nd datacenter to connect to the 2nd node and continue working.

In my current setup that's not possible, because it would create a split-brain situation. How can I make this work?

Help in creating first Windows Failover Cluster

This is my first attempt at creating a failover cluster. I've followed the instructions in "Failover Cluster Step-by-Step Guide: Configuring Accounts in Active Directory", along with the required ports.

https://technet.microsoft.com/en-us/library/cc731002(v=ws.10).aspx#BKMK_steps_precreating
http://cybergav.in/2013/07/28/windows-server-failover-cluster-port-requirements-for-intra-node-connectivity/

I added the Failover Clustering feature via Server Manager on both server-dev01 and server-dev02. I ran the validation and everything seems to check out. I then created the cluster, "server-cdev", via the Create Cluster Wizard. That didn't go well.

I pulled the logs with the PowerShell cmdlet Get-ClusterLog and tried looking through them, but can't really make heads or tails of them.

There are some things I don't understand, and I'm hoping you can help me out:
 - Why are there random IPv4 and IPv6 addresses (IPv6 is not enabled) in the logs that I know nothing of and that are not in our DNS records?
 - Ports 137 and 3343 are both open on our firewall, but I can't telnet between server-dev01 and server-dev02 (in either direction) on port 3343 or 137. Does this need to be fixed before running the cluster wizard?
 - I see these two entries but have no idea what they mean (I did see a similar post, but the question has no answer: https://social.technet.microsoft.com/Forums/windowsserver/en-US/9d25c123-a763-405f-8c20-61da2d4b4390/cluster-creation-error?forum=winserver8gen)
[DCM] DiskControlManager bitlocker load status 126
[API] DmQueryString failed to retrieve the security   descriptor status 2, default security descriptor will be used for authorizing client connections
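Telnet can only test TCP. Port 137 is the NetBIOS name service, which normally runs over UDP, and the cluster service uses UDP 3343 as well as TCP 3343, so a telnet test covers only part of what the cluster needs. Also, if the Cluster service is not yet running on the target (which may be the case before the cluster exists), a refused TCP connection on 3343 is expected even with the firewall open. The sketch below is a scriptable equivalent of the telnet check, with an illustrative function name; it has the same TCP-only limitation.

```python
import socket
import sys


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (what telnet tests)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failure
        return False


if __name__ == "__main__":
    # Usage: python port_check.py <host> <port> [<port> ...]  (script name is hypothetical)
    target = sys.argv[1]
    for p in sys.argv[2:]:
        state = "open" if tcp_port_open(target, int(p)) else "closed/filtered"
        print(f"{target}:{p}/tcp {state}")
```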

I tried posting the logs here but got the error message "Body must be 4 - 60000 characters long".

Your help is greatly appreciated.

Cluster IP address fails with error 1077

Hi all,

I have a Windows 2012 R2 failover cluster, and I recently noticed that the cluster name is offline because its IP address resource has failed:

Health check for IP interface 'Cluster IP Address' (address '10.16.18.70') failed (status is '55'). Run the Validate a Configuration wizard to ensure that the network adapter is functioning properly.

I've run the validation wizard (only the network tests, because the cluster has been in production for some months) and everything is OK. If I try to bring the IP address online, it fails.

Add Node to Cluster - Keyset does not exist

Hi,

I am trying to add a third node to a Windows 2012 failover cluster, but I get the following error:

The server 'DR.domain.com' could not be added to the cluster.
An error occurred while adding node 'DR.domain.com' to cluster 'domain-fc'.

Keyset does not exist

The user I am using to add the node is a Domain Admin, so it is probably not a permissions issue.

All nodes are Windows 2012 R2 VMs on Azure


Usman Shaheen MCTS BizTalk Server http://usmanshaheen.wordpress.com


SharePoint website responding very slowly, using Windows Server 2012 Network Load Balancing

Hello Team,

Greetings for the day!

I have 2 Windows Server 2012 servers with Network Load Balancing enabled, and SharePoint 2010 R2 (SharePoint farm) is installed on both. I have enabled NLB with a total of 5 IPs assigned across those 2 servers.

The SharePoint site is sometimes very slow (1 min 30 s), and sometimes it responds quickly (10 s).

I have verified the performance of both servers, and it is more than adequate.

I'm not sure how to troubleshoot this further, and I'm also not sure how to check which server a given request goes to.

Please help me get out of this situation and optimize the performance.


Paresh Jain
