Channel: High Availability (Clustering) forum

ODD and EVEN cluster - looking for simple, short definitions

Hi. Failover clustering means redundancy, and my question here is about the basics of that redundancy. What are the basic components and definitions? For example, quorum: what is quorum, what types of quorum are there, what settings are used for quorum, and in which scenario is each used? What is the difference between an ODD and an EVEN cluster, and how do you recover or restore quorum? Also, what is load balancing, how do you configure it, and what is its advantage? Looking for simple, short definitions.

All VMs crash when we lose Layer 3 connectivity

Greetings. Any chance you could clear up an SCVMM debate that we have been having for some time? We have a Windows 2012 R2 8-node cluster using SAN storage (System x3850 X5 & Emulex 42D0494, PCI Slot 1, Storport Miniport Driver), plus 8 NICs for the VMs spread over 4 switches in a stack. We recently rebooted the Juniper stack and all of the VMs in Hyper-V crashed, but the VMware VMs were all OK. We have been told that because Microsoft uses the LAN to communicate with the LUNs, rather than the HBA like VMware does, all of the VMs would crash and this is normal. Supposedly this is because VMware owns the patents to LUN state communication, so Microsoft has to use the LAN.

Hope this makes sense.  I am not talking about the VMQ problem that lots of other people are getting, by the way.
So not this
http://alexappleton.net/post/77116755157/hyper-v-virtual-machines-losing-network

Kind regards, Tony

Bring a cluster resource online remotely via PowerShell script


Hello guys,

I would like to have a PowerShell/VBS script which could help me bring a cluster resource online/offline via a remote server. (Let's say I have a server at location A, and from there I want to execute a script against a location B server to bring a cluster resource into the online state.) It would be highly appreciated if someone could help me out with this.
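For reference, a minimal sketch of the kind of script being asked for, assuming the FailoverClusters module is available; the cluster, node, and resource names are placeholders:

```powershell
# Sketch only - "LocB-Cluster", "LocB-Node1" and "App Resource" are placeholders.
# Requires the FailoverClusters module (RSAT) on the machine running the script.

Import-Module FailoverClusters

# Option 1: target the remote cluster directly with -Cluster
Start-ClusterResource -Name "App Resource" -Cluster "LocB-Cluster"
# Stop-ClusterResource -Name "App Resource" -Cluster "LocB-Cluster"  # take offline

# Option 2: run the cmdlet on a remote node via WinRM
Invoke-Command -ComputerName "LocB-Node1" -ScriptBlock {
    Import-Module FailoverClusters
    Start-ClusterResource -Name "App Resource"
}
```

Option 1 avoids remoting entirely when the management tools are installed locally; Option 2 needs WinRM enabled on the remote node.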

thanks,

Purushar

Hyper-V cluster - 500+ VMs - clussvc.exe CPU usage


Hi

We have multiple Hyper-V clusters running on 2012 R2.

In the largest cluster (13 nodes, 500 VMs), clussvc.exe uses about 25-30% CPU, and there is a good amount of network traffic between all the nodes (100 Mbit), even when the node is running just one idle VM as a test.

When stopping the SCOM agent, the CPU usage for clussvc.exe drops to 15%, and the network traffic also drops a bit.

CSV volumes are running in normal mode, not redirected.
Storage attached via fiber channel.

All the latest Windows updates are installed.

What can be done to reduce the CPU usage?
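One knob that is sometimes suggested for reducing clussvc.exe tracing overhead - a sketch, not a guaranteed fix, and the default shown should be verified on your build before changing anything:

```powershell
# Diagnostic sketch - ClusterLogLevel typically defaults to 3 on 2012 R2;
# level 2 records only warnings and errors, which reduces tracing work.
Import-Module FailoverClusters
(Get-Cluster).ClusterLogLevel        # inspect the current level
(Get-Cluster).ClusterLogLevel = 2    # lower the verbosity
```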

Storage Spaces turn into Clustered Storage Spaces when creating a WSFC, but I don't want them to


Hi everybody and thank you in advance.

Environment:

  • I have 2 separate physical servers, each running Windows Server 2012 R2.
  • Each has its own local storage attached via DAS using HBAs (SATA SSD drives). None of the disks replicate to the other host, so they are completely separate. I used Storage Spaces to create a storage pool on each server, so the volumes are the sizes I want.
  • I enabled MPIO for iSCSI devices only.
  • I created a Windows failover cluster with no special configuration.

Issue:

The cluster took the storage spaces and made them clustered storage spaces. In server manager the storage pools show that they are available to the cluster, but managed by both servers (both are listed under each one). The pools are not replicated, so of course this presents a problem.

When running validation tests, the cluster takes each node offline and detaches each one of the virtual disk during the test. This is where the problem is occurring as it is detaching virtual disks that are not replicated, so it causes programs installed on these storage spaces to crash instantly.

Question:

How do I stop the storage spaces from being clustered? I have searched the web, searched every cmdlet matching *cluster* and *disk*, removed the cluster and re-added it, and created it on completely separate hosts, which produces the same result. I tried creating the storage pools after the cluster had been created, but the primordial storage is owned by the cluster even though none of the disk IDs match up across the 2 hosts, so I am lost and frustrated beyond belief. Any advice would truly be appreciated. Searching here and on Google returns "How to... Storage Spaces" articles, but I'm not looking for how to create Storage Spaces. Thanks again. Here is a screenshot of how it detaches the virtual disks while running the validation tests.
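One approach worth trying (a sketch, untested against this exact setup): clustered pools are represented as cluster resources of type "Storage Pool", and removing those resources should hand the pools back to the individual nodes:

```powershell
# Sketch - run on one cluster node; inspect the list before removing anything.
Import-Module FailoverClusters

# Show the cluster resources that represent the storage pools
Get-ClusterResource | Where-Object { $_.ResourceType -eq "Storage Pool" }

# Remove those resources so the pools revert to local Storage Spaces
Get-ClusterResource | Where-Object { $_.ResourceType -eq "Storage Pool" } |
    Remove-ClusterResource -Force
```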

Getting the below error during the network validation task (Win2k12)


Network interfaces - Heartbeat and  - Heartbeat are on the same cluster network, yet address  is not reachable from  using UDP on port 3343.

Regards

RD
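Since the error concerns UDP port 3343 between heartbeat interfaces, one quick check is that the built-in failover clustering firewall rules are enabled on every node - a sketch, assuming the standard Windows Firewall rule group name:

```powershell
# Sketch - UDP 3343 is the cluster heartbeat port; the built-in
# "Failover Clusters" firewall rule group must be enabled on all nodes.
Get-NetFirewallRule -DisplayGroup "Failover Clusters" |
    Select-Object DisplayName, Enabled, Direction
```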



Help needed fixing a broken Windows 2008R2 SQL cluster


We had 2 Windows 2008 R2 servers using Windows Failover Clustering to create an HA SQL Server instance. They use iSCSI shared storage. One of the servers failed to rejoin the cluster after a reboot and has been shut down. The failed server appeared to have been removed from the cluster correctly. The remaining server was working fine until it too was rebooted, and now the cluster won't start.

Cluster Management recognizes that there is still a cluster and that the working server is a member node. The Cluster Service can be started, but the cluster itself never initializes. Using Force Cluster Start fails.

I get errors like this:

Node 'SERVERNAME' failed to form a cluster. This was because the witness was not accessible. Please ensure that the witness resource is online and available.

One problem I can see is that the shared iSCSI volumes (Quorum, Data etc.) are mounted on the remaining server but marked as Reserved, Offline, Read-only and Clustered. I assume that to resolve the issue I need to get them online on the remaining working member of the cluster but can't figure out how. Here is the Diskpart output (Disk 0 is the boot drive, Disk 1-4 are the clustered drives):

DISKPART> list disk

  Disk ###  Status         Size     Free     Dyn  Gpt
  --------  -------------  -------  -------  ---  ---
  Disk 0    Online           50 GB  1024 KB
* Disk 1    Reserved        200 GB  1024 KB
  Disk 2    Reserved       1545 MB  1984 KB
  Disk 3    Reserved        100 GB  1024 KB
  Disk 4    Reserved       1545 MB  1984 KB
DISKPART> select Disk 1

Disk 1 is now the selected disk.

DISKPART> attribute disk
Current Read-only State : Yes
Read-only  : Yes
Boot Disk  : No
Pagefile Disk  : No
Hibernation File Disk  : No
Crashdump Disk  : No
Clustered Disk  : Yes

DISKPART> attribute disk clear Readonly

DiskPart failed to clear disk attributes.

DISKPART> attribute disk clear Clustered

The arguments specified for this command are not valid.
For more information on the command type: HELP ATTRIBUTES DISK

DISKPART> attribute disk clear ClusteredDisk

The arguments specified for this command are not valid.
For more information on the command type: HELP ATTRIBUTES DISK

As you can see from the output I can't clear the Read-only or the Clustered Disk state. How do I resolve this?
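The Reserved/Read-only state usually reflects a SCSI persistent reservation held by the cluster, which diskpart cannot clear. A sketch of the usual way to release it, run per disk number from diskpart's list - take care to do this only when no other node is using the disk:

```powershell
# Sketch - clears the persistent reservation that keeps a clustered disk
# in the Reserved/Read-only state. Disk numbers match diskpart's "list disk".
Import-Module FailoverClusters
Clear-ClusterDiskReservation -Disk 1 -Force
```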

Thanks,

Daniel.

Cluster Network Failed on One Node


Hello Experts,

 

I am looking for your suggestions on the issues described below with a multi-site MSFC on 2012 R2.

We have 2 physical Fujitsu Primergy BX920 S4 servers with CNA cards, located at different sites.

We have created 4 virtual NICs on each physical server using OneCommand Manager; these NICs are then teamed.

Now we have 2 x NICs for each server as below.

NIC1 (VLAN-A  IP-X.X.12.24), NIC2 (VLAN-B IP-X.X.20.10) for Node1

NIC3 (VLAN-A IP-X.X.13.24), NIC4 (VLAN-B IP-X.X.111.10) for Node2

VLAN-A is configured for Cluster Heartbeat without default Gateway.

VLAN-B is configured for Production and both NICs have Default Gateway.

While looking at the network section of the Cluster Manager console, we observed that NIC1 is showing DOWN, and we are getting the event below in the event logs:

 

Cluster network 'Cluster Network 1' is down. None of the available nodes can communicate using this network. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

We have verified that network communication from NIC1 to the switch is working fine. The heartbeat network between both nodes goes through the switch only; there is no firewall.

We have tried to create static routes on both nodes; however, while performing a tracert to IP-X.X.13.24 from Node1, the traffic goes through NIC2 (VLAN-B IP-X.X.20.10), whereas it should go through NIC1.
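For reference, a sketch of a persistent static route that would pin traffic for the remote heartbeat subnet to NIC1. The prefix, interface index, and next hop are placeholders, and the masked X.X addresses are kept as in the post:

```powershell
# Sketch - find NIC1's interface index first, then add a route for the
# remote heartbeat subnet out of that interface. Values are placeholders.
Get-NetIPInterface | Select-Object ifIndex, InterfaceAlias, AddressFamily

New-NetRoute -DestinationPrefix "X.X.13.0/24" `
             -InterfaceIndex 12 `
             -NextHop "X.X.12.1"
```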

 

Please suggest how we can troubleshoot this further.


Thanks & Regards,
Amit Katkar (MCITP Windows 2008)


Microsoft Failover Clustering Events 1126, 1127 and 1129 - Heartbeat Network Deprecated


Environment and problem:

Two Windows Server 2012 servers in a Microsoft failover cluster, which is going to be used for SQL AlwaysOn. There is no common storage between the servers, since I do not need any. Both are virtual servers running on an ESXi 5.5 environment.

The servers are using a heartbeat network - 10.0.160.51 (node A) and 10.0.160.52 (node B). The servers are in different datacenters, and we are using OTV to extend the heartbeat network from one site to the other. The cluster validation report is all green, but I am getting errors that the heartbeat network cannot be reached by the other node, plus heartbeat-network-partitioned errors: Event IDs 1126, 1127 and 1129.

The errors come at 10-15 minute intervals. Here are the cluster.log events at one particular time when I got those errors (11:33:49 and 11:33:54 AM, 8/25/2015):

000027b8.0000212c::2015/08/25-11:33:00.029 INFO  [NM] Received request from client address TVPPSQLTCMW01A.
000027b8.0000212c::2015/08/25-11:33:04.435 INFO  [NM] Received request from client address TVPPSQLTCMW01A.
000027b8.000021b0::2015/08/25-11:33:49.749 DBG   [NETFTAPI] Signaled NetftRemoteUnreachable event, local address 10.0.160.51:3343 remote address 10.0.160.52:3343
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] got event: Remote endpoint 10.0.160.52:~3343~ unreachable from 10.0.160.51:~3343~
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Marking Route from 10.0.160.51:~3343~ to 10.0.160.52:~3343~ as down
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [NDP] Checking to see if all routes for route (virtual) local fe80::8cce:1c3c:9d53:51cd:~0~ to remote fe80::805a:5752:fc4a:de9b:~0~ are down
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [NDP] Route local 10.11.137.51:~0~ to remote 10.11.137.52:~0~ is up
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Route history 1: Old: 00.921, Message: Response, Route sequence: 79338, Received sequence: 79338, Heartbeats counter/threshold: 5/5, Error: Success, NtStatus: 0 Timestamp: 2015/08/25-11:33:48.827, Ticks since last sending: 0
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Route history 2: Old: 00.921, Message: Request, Route sequence: 79338, Received sequence: 79338, Heartbeats counter/threshold: 5/5, Error: Success, NtStatus: 0 Timestamp: 2015/08/25-11:33:48.827, Ticks since last sending: 0
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Route history 3: Old: 00.921, Message: Response, Route sequence: 79338, Received sequence: 79337, Heartbeats counter/threshold: 5/5, Error: Success, NtStatus: 0 Timestamp: 2015/08/25-11:33:48.827, Ticks since last sending: 0
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Route history 4: Old: 01.921, Message: Request, Route sequence: 79337, Received sequence: 79337, Heartbeats counter/threshold: 5/5, Error: Success, NtStatus: 0 Timestamp: 2015/08/25-11:33:47.827, Ticks since last sending: 0
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Route history 5: Old: 01.921, Message: Response, Route sequence: 79337, Received sequence: 79336, Heartbeats counter/threshold: 5/5, Error: Success, NtStatus: 0 Timestamp: 2015/08/25-11:33:47.827, Ticks since last sending: 0
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Route history 6: Old: 02.921, Message: Request, Route sequence: 79336, Received sequence: 79336, Heartbeats counter/threshold: 5/5, Error: Success, NtStatus: 0 Timestamp: 2015/08/25-11:33:46.827, Ticks since last sending: 0
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Route history 7: Old: 02.921, Message: Response, Route sequence: 79336, Received sequence: 79335, Heartbeats counter/threshold: 5/5, Error: Success, NtStatus: 0 Timestamp: 2015/08/25-11:33:46.827, Ticks since last sending: 0
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Route history 8: Old: 03.921, Message: Request, Route sequence: 79335, Received sequence: 79335, Heartbeats counter/threshold: 5/5, Error: Success, NtStatus: 0 Timestamp: 2015/08/25-11:33:45.827, Ticks since last sending: 0
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Route history 9: Old: 03.921, Message: Response, Route sequence: 79335, Received sequence: 79334, Heartbeats counter/threshold: 5/5, Error: Success, NtStatus: 0 Timestamp: 2015/08/25-11:33:45.827, Ticks since last sending: 0
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Route history 10: Old: 04.921, Message: Request, Route sequence: 79334, Received sequence: 79334, Heartbeats counter/threshold: 5/5, Error: Success, NtStatus: 0 Timestamp: 2015/08/25-11:33:44.827, Ticks since last sending: 0
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Adding information for route Route from local 10.0.160.51:~3343~ to remote 10.0.160.52:~3343~, status: false, attributes: 0
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  [IM] Sending connectivity report to leader (node 1): <class mscs::InterfaceReport>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO    <fromInterface>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</fromInterface>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO    <upInterfaces><vector len='1'>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO      <item>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</item>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  </upInterfaces>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO    <downInterfaces><vector len='1'>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO      <item>5f354f39-8e3b-4ad1-bb13-96e2e1d75d55</item>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  </downInterfaces>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO    <upRoutesType><vector len='0'>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  </upRoutesType>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO    <downRoutesType><vector len='1'>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO      <item>1</item>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  </downRoutesType>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO    <viewId>201</viewId>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO    <localDisconnect>false</localDisconnect>
000027b8.00001f00::2015/08/25-11:33:49.749 INFO  </class mscs::InterfaceReport>
000027b8.0000212c::2015/08/25-11:33:49.749 INFO  [DCM] HandleNetftRemoteRouteChange
000027b8.00002438::2015/08/25-11:33:49.749 INFO  [IM] Leader got report from 1
000027b8.000013c4::2015/08/25-11:33:49.749 INFO  [DCM] HandleRequest: dcm/netftRouteChange
000027b8.00002438::2015/08/25-11:33:49.749 INFO  [IM - Heartbeat Network] 1 reports in state calculator queue
000027b8.00002410::2015/08/25-11:33:49.749 INFO  [IM - Heartbeat Network] State calculator got new report from 5ced600f-83b3-4fde-aa4e-f3d2e4f6e584
000027b8.000013c4::2015/08/25-11:33:49.749 INFO  [DCM] Forcing disconnect succeeded
000027b8.00002410::2015/08/25-11:33:49.749 INFO  [IM - Heartbeat Network] 0 reports in state calculator queue
000027b8.000013c4::2015/08/25-11:33:49.749 INFO  [DCM] Skipping client access network d7dcde53-bf75-427e-8083-c2f4166be39c for multichannel
000027b8.000013c4::2015/08/25-11:33:49.749 WARN  [DCM] No matching addresses for Netft on this node with id 1
000027b8.000013c4::2015/08/25-11:33:49.749 INFO  [DCM] Unregistering name fe80::8cce:1c3c:9d53:51cd for multichannel support returned 0
000027b8.0000212c::2015/08/25-11:33:49.764 DBG   [NETFTAPI] received NsiParameterNotification for 10.0.160.51 (IpDadStateDeprecated)
000027b8.0000212c::2015/08/25-11:33:49.764 DBG   [NETFTAPI] Signaled NetftLocalDisconnect event for 10.0.160.51
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  [IM] got event: Local endpoint 10.0.160.51:~0~ disconnected
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  [IM] Informing leader about local disconnect. Endpoint 10.0.160.51:~0~ is disconnected
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  [IM] Adding information for route Route from local 10.0.160.51:~3343~ to remote 10.0.160.52:~3343~, status: false, attributes: 0
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  [IM] Sending connectivity report to leader (node 1): <class mscs::InterfaceReport>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO    <fromInterface>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</fromInterface>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO    <upInterfaces><vector len='0'>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  </upInterfaces>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO    <downInterfaces><vector len='2'>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO      <item>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</item>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO      <item>5f354f39-8e3b-4ad1-bb13-96e2e1d75d55</item>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  </downInterfaces>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO    <upRoutesType><vector len='0'>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  </upRoutesType>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO    <downRoutesType><vector len='1'>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO      <item>1</item>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  </downRoutesType>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO    <viewId>201</viewId>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO    <localDisconnect>true</localDisconnect>
000027b8.00001f00::2015/08/25-11:33:49.764 INFO  </class mscs::InterfaceReport>
000027b8.00002438::2015/08/25-11:33:49.764 INFO  [IM] Leader got report from 1
000027b8.00002438::2015/08/25-11:33:49.764 INFO  [IM - Heartbeat Network] 1 reports in state calculator queue
000027b8.00002410::2015/08/25-11:33:49.764 INFO  [IM - Heartbeat Network] State calculator got new report from 5ced600f-83b3-4fde-aa4e-f3d2e4f6e584
000027b8.00002410::2015/08/25-11:33:49.764 INFO  [IM - Heartbeat Network] Issuing state change update with result <class mscs::InterfaceResult>
000027b8.00002410::2015/08/25-11:33:49.764 INFO    <up><vector len='0'>
000027b8.00002410::2015/08/25-11:33:49.764 INFO  </vector>
000027b8.00002410::2015/08/25-11:33:49.764 INFO  </up>
000027b8.00002410::2015/08/25-11:33:49.764 INFO    <down><vector len='1'>
000027b8.00002410::2015/08/25-11:33:49.764 INFO      <item>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</item>
000027b8.00002410::2015/08/25-11:33:49.764 INFO  </vector>
000027b8.00002410::2015/08/25-11:33:49.764 INFO  </down>
000027b8.00002410::2015/08/25-11:33:49.764 INFO    <unreachable><vector len='0'>
000027b8.00002410::2015/08/25-11:33:49.764 INFO  </vector>
000027b8.00002410::2015/08/25-11:33:49.764 INFO  </unreachable>
000027b8.00002410::2015/08/25-11:33:49.764 INFO  </class mscs::InterfaceResult>
000027b8.00002410::2015/08/25-11:33:49.764 INFO  [GEM] Node 1: Sending 1 messages as a batched GEM message
000027b8.000027bc::2015/08/25-11:33:49.764 INFO  [IM] Changing the state of adapters according to result: <class mscs::InterfaceResult>
000027b8.000027bc::2015/08/25-11:33:49.764 INFO    <up><vector len='0'>
000027b8.000027bc::2015/08/25-11:33:49.764 INFO  </vector>
000027b8.000027bc::2015/08/25-11:33:49.764 INFO  </up>
000027b8.000027bc::2015/08/25-11:33:49.764 INFO    <down><vector len='1'>
000027b8.000027bc::2015/08/25-11:33:49.764 INFO      <item>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</item>
000027b8.000027bc::2015/08/25-11:33:49.764 INFO  </vector>
000027b8.000027bc::2015/08/25-11:33:49.764 INFO  </down>
000027b8.000027bc::2015/08/25-11:33:49.764 INFO    <unreachable><vector len='0'>
000027b8.000027bc::2015/08/25-11:33:49.764 INFO  </vector>
000027b8.000027bc::2015/08/25-11:33:49.764 INFO  </unreachable>
000027b8.000027bc::2015/08/25-11:33:49.764 INFO  </class mscs::InterfaceResult>
000027b8.00002a64::2015/08/25-11:33:49.764 INFO  [DCM] HandleInterfaceChange
000027b8.000013c4::2015/08/25-11:33:49.764 INFO  [DCM] HandleRequest: dcm/connectivityCheck
000027b8.00002410::2015/08/25-11:33:49.764 INFO  [IM - Heartbeat Network] 0 reports in state calculator queue
000027b8.000013c4::2015/08/25-11:33:49.764 INFO  [DCM] Skipping client access network d7dcde53-bf75-427e-8083-c2f4166be39c for multichannel
000027b8.000013c4::2015/08/25-11:33:49.764 INFO  [DCM] Sending local node id to node Id 2
000027b8.00002ddc::2015/08/25-11:33:49.764 DBG   [NETFTAPI] received NsiDeleteInstance for fe80::5efe:10.0.160.51
000027b8.00002ddc::2015/08/25-11:33:49.780 WARN  [NETFTAPI] Failed to query parameters for fe80::5efe:10.0.160.51 (status 0x80070490)
000027b8.00002ddc::2015/08/25-11:33:49.780 DBG   [NETFTAPI] Signaled NetftLocalAdd event for fe80::5efe:10.0.160.51
000027b8.00002ddc::2015/08/25-11:33:49.796 WARN  [NETFTAPI] Failed to query parameters for fe80::5efe:10.0.160.51 (status 0x80070490)
000027b8.00002ddc::2015/08/25-11:33:49.796 DBG   [NETFTAPI] Signaled NetftLocalRemove event for fe80::5efe:10.0.160.51
000027b8.00002448::2015/08/25-11:33:51.905 INFO  [CHM] Incoming seq no is better than mine for node 2. Merging data
000027b8.00001428::2015/08/25-11:33:51.905 INFO  [CHM] My weights have changed: <vector len='65'>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>111</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO      <item>0</item>
000027b8.00001428::2015/08/25-11:33:51.905 INFO  </vector>
000027b8.00001428::2015/08/25-11:33:51.905 INFO  .
000027b8.00001428::2015/08/25-11:33:51.905 INFO  [CHM] Sending route weight vector for nodes (1 2) to nodes (2)
000027b8.00002ddc::2015/08/25-11:33:52.905 INFO  [CHM] Sending route weight vector for nodes (1 2) to nodes (2)
000027b8.00002ddc::2015/08/25-11:33:54.046 DBG   [NETFTAPI] received NsiParameterNotification for 10.0.160.51 (IpDadStatePreferred)
000027b8.00002ddc::2015/08/25-11:33:54.061 DBG   [NETFTAPI] Signaled NetftLocalConnect event for 10.0.160.51
000027b8.00001f00::2015/08/25-11:33:54.061 INFO  [IM] got event: Local endpoint 10.0.160.51:~0~ connected
000027b8.00001f00::2015/08/25-11:33:54.061 INFO  [IM] Adding information for route Route from local 10.0.160.51:~3343~ to remote 10.0.160.52:~3343~, status: false, attributes: 0
000027b8.00001f00::2015/08/25-11:33:54.061 INFO  [IM] Sending connectivity report to leader (node 1): <class mscs::InterfaceReport>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO    <fromInterface>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</fromInterface>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO    <upInterfaces><vector len='1'>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO      <item>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</item>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO  </upInterfaces>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO    <downInterfaces><vector len='1'>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO      <item>5f354f39-8e3b-4ad1-bb13-96e2e1d75d55</item>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO  </downInterfaces>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO    <upRoutesType><vector len='0'>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO  </upRoutesType>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO    <downRoutesType><vector len='1'>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO      <item>1</item>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO  </downRoutesType>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO    <viewId>201</viewId>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO    <localDisconnect>false</localDisconnect>
000027b8.00001f00::2015/08/25-11:33:54.061 INFO  </class mscs::InterfaceReport>
000027b8.00002438::2015/08/25-11:33:54.061 INFO  [IM] Leader got report from 1
000027b8.00002438::2015/08/25-11:33:54.061 INFO  [IM - Heartbeat Network] 1 reports in state calculator queue
000027b8.00002410::2015/08/25-11:33:54.061 INFO  [IM - Heartbeat Network] State calculator got new report from 5ced600f-83b3-4fde-aa4e-f3d2e4f6e584
000027b8.00002410::2015/08/25-11:33:54.061 INFO  [IM - Heartbeat Network] 0 reports in state calculator queue
000027b8.00001428::2015/08/25-11:33:54.061 DBG   [NETFTAPI] received NsiAddInstance for fe80::5efe:10.0.160.51
000027b8.00002ddc::2015/08/25-11:33:54.077 DBG   [NETFTAPI] received NsiParameterNotification for fe80::5efe:10.0.160.51 (IpDadStateDeprecated)
000027b8.000021b0::2015/08/25-11:33:54.827 DBG   [NETFTAPI] Signaled NetftRemoteReachable event, local address 10.0.160.51:3343 remote address 10.0.160.52:3343
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  [TM] got event: Remote endpoint 10.0.160.52:~3343~ reachable from 10.0.160.51:~3343~
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  [IM] got event: Remote endpoint 10.0.160.52:~3343~ reachable from 10.0.160.51:~3343~
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  [IM] Marking Route from 10.0.160.51:~3343~ to 10.0.160.52:~3343~ as up
000027b8.00002ddc::2015/08/25-11:33:54.827 INFO  [DCM] HandleNetftRemoteRouteChange
000027b8.000013c4::2015/08/25-11:33:54.827 INFO  [DCM] HandleRequest: dcm/netftRouteChange
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  [IM] Adding information for route Route from local 10.0.160.51:~3343~ to remote 10.0.160.52:~3343~, status: true, attributes: 0
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  [IM] Sending connectivity report to leader (node 1): <class mscs::InterfaceReport>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO    <fromInterface>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</fromInterface>
000027b8.000013c4::2015/08/25-11:33:54.827 INFO  [DCM] Skipping client access network d7dcde53-bf75-427e-8083-c2f4166be39c for multichannel
000027b8.00001f00::2015/08/25-11:33:54.827 INFO    <upInterfaces><vector len='2'>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO      <item>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</item>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO      <item>5f354f39-8e3b-4ad1-bb13-96e2e1d75d55</item>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  </upInterfaces>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO    <downInterfaces><vector len='0'>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  </downInterfaces>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO    <upRoutesType><vector len='1'>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO      <item>1</item>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  </upRoutesType>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO    <downRoutesType><vector len='0'>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  </downRoutesType>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO    <viewId>201</viewId>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO    <localDisconnect>false</localDisconnect>
000027b8.00001f00::2015/08/25-11:33:54.827 INFO  </class mscs::InterfaceReport>
000027b8.00002438::2015/08/25-11:33:54.827 INFO  [IM] Leader got report from 1
000027b8.00002438::2015/08/25-11:33:54.827 INFO  [IM - Heartbeat Network] 1 reports in state calculator queue
000027b8.00002410::2015/08/25-11:33:54.827 INFO  [IM - Heartbeat Network] State calculator got new report from 5ced600f-83b3-4fde-aa4e-f3d2e4f6e584
000027b8.00002410::2015/08/25-11:33:54.827 INFO  [IM - Heartbeat Network] 0 reports in state calculator queue
000027b8.00002448::2015/08/25-11:33:54.827 INFO  [IM] Leader got report from 2
000027b8.00002448::2015/08/25-11:33:54.827 INFO  [IM - Heartbeat Network] 1 reports in state calculator queue
000027b8.00002410::2015/08/25-11:33:54.827 INFO  [IM - Heartbeat Network] State calculator got new report from 5f354f39-8e3b-4ad1-bb13-96e2e1d75d55
000027b8.00002410::2015/08/25-11:33:54.827 INFO  [IM - Heartbeat Network] 0 reports in state calculator queue
000027b8.000013c4::2015/08/25-11:33:54.827 INFO  [DCM] Registering name fe80::8cce:1c3c:9d53:51cd for multichannel support returned 0
000027b8.000013c4::2015/08/25-11:33:54.827 INFO  [DCM] Sending local node id to node Id 2
000027b8.00002410::2015/08/25-11:33:54.827 INFO  (allSplitGroups, splitGroups) = [IM - Heartbeat Network] Two splits for group (0 1)
000027b8.00002410::2015/08/25-11:33:54.827 INFO  [IM - Heartbeat Network] Calculating equal interface state for groups (0) and (1)
000027b8.00002410::2015/08/25-11:33:54.827 INFO  [IM - Heartbeat Network] Calculated interface state result: <class mscs::InterfaceResult>
000027b8.00002410::2015/08/25-11:33:54.827 INFO    <up><vector len='0'>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </up>
000027b8.00002410::2015/08/25-11:33:54.827 INFO    <down><vector len='0'>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </down>
000027b8.00002410::2015/08/25-11:33:54.827 INFO    <unreachable><vector len='2'>
000027b8.00002410::2015/08/25-11:33:54.827 INFO      <item>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</item>
000027b8.00002410::2015/08/25-11:33:54.827 INFO      <item>5f354f39-8e3b-4ad1-bb13-96e2e1d75d55</item>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </unreachable>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </class mscs::InterfaceResult>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  [IM - Heartbeat Network] Issuing state change update with result <class mscs::InterfaceResult>
000027b8.00002410::2015/08/25-11:33:54.827 INFO    <up><vector len='0'>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </up>
000027b8.00002410::2015/08/25-11:33:54.827 INFO    <down><vector len='0'>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </down>
000027b8.00002410::2015/08/25-11:33:54.827 INFO    <unreachable><vector len='2'>
000027b8.00002410::2015/08/25-11:33:54.827 INFO      <item>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</item>
000027b8.00002410::2015/08/25-11:33:54.827 INFO      <item>5f354f39-8e3b-4ad1-bb13-96e2e1d75d55</item>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </unreachable>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  </class mscs::InterfaceResult>
000027b8.00002410::2015/08/25-11:33:54.827 INFO  [GEM] Node 1: Sending 1 messages as a batched GEM message
000027b8.00002410::2015/08/25-11:33:54.827 INFO  [IM - Heartbeat Network] State calculation complete without issuing pings
000027b8.00002410::2015/08/25-11:33:54.827 INFO  [IM - Heartbeat Network] Resetting interface state calculation state
000027b8.000027bc::2015/08/25-11:33:54.827 INFO  [IM] Changing the state of adapters according to result: <class mscs::InterfaceResult>
000027b8.000027bc::2015/08/25-11:33:54.827 INFO    <up><vector len='0'>
000027b8.000027bc::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.000027bc::2015/08/25-11:33:54.827 INFO  </up>
000027b8.000027bc::2015/08/25-11:33:54.827 INFO    <down><vector len='0'>
000027b8.000027bc::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.000027bc::2015/08/25-11:33:54.827 INFO  </down>
000027b8.000027bc::2015/08/25-11:33:54.827 INFO    <unreachable><vector len='2'>
000027b8.000027bc::2015/08/25-11:33:54.827 INFO      <item>5ced600f-83b3-4fde-aa4e-f3d2e4f6e584</item>
000027b8.000027bc::2015/08/25-11:33:54.827 INFO      <item>5f354f39-8e3b-4ad1-bb13-96e2e1d75d55</item>
000027b8.000027bc::2015/08/25-11:33:54.827 INFO  </vector>
000027b8.000027bc::2015/08/25-11:33:54.827 INFO  </unreachable>
000027b8.000027bc::2015/08/25-11:33:54.827 INFO  </class mscs::InterfaceResult>
000027b8.0000212c::2015/08/25-11:33:54.827 INFO  [DCM] HandleInterfaceChange
000027b8.000013c4::2015/08/25-11:33:54.827 INFO  [DCM] HandleRequest: dcm/connectivityCheck
000027b8.000013c4::2015/08/25-11:33:54.827 INFO  [DCM] Skipping client access network d7dcde53-bf75-427e-8083-c2f4166be39c for multichannel
000027b8.00002574::2015/08/25-11:33:54.827 INFO  [NM] Changing network state for network Heartbeat Network to 2
000027b8.00002574::2015/08/25-11:33:54.827 INFO  [GEM] Node 1: Sending 1 messages as a batched GEM message
000027b8.00002574::2015/08/25-11:33:54.827 INFO  [NM] Received state change update for network Heartbeat Network (id 6232c658-0077-4a50-978e-6415c4de564e) to 2
000027b8.00001428::2015/08/25-11:33:54.905 INFO  [CHM] Sending route weight vector for nodes (1 2) to nodes (2)
000027b8.00002448::2015/08/25-11:33:55.140 INFO  [IM] Leader got report from 2
000027b8.00002448::2015/08/25-11:33:55.140 INFO  [IM - Heartbeat Network] 1 reports in state calculator queue
000027b8.00002410::2015/08/25-11:33:55.140 INFO  [IM - Heartbeat Network] State calculator got new report from 5f354f39-8e3b-4ad1-bb13-96e2e1d75d55
000027b8.00002410::2015/08/25-11:33:55.140 INFO  [IM - Heartbeat Network] 0 reports in state calculator queue
000027b8.000013c4::2015/08/25-11:33:55.140 INFO  [DCM] HandleRequest: dcm/queryServerEndpoints
000027b8.000013c4::2015/08/25-11:33:55.140 WARN  [DCM] RDR handle for target node id 2 is not yet populated
000027b8.000013c4::2015/08/25-11:33:55.140 WARN  [DCM] RDR handle to node 2 is not available. Skipping server refresh
000027b8.00001428::2015/08/25-11:33:58.905 INFO  [CHM] Sending route weight vector for nodes (1 2) to nodes (2)
000027b8.00001428::2015/08/25-11:34:01.140 INFO  [IM - Heartbeat Network] Timer fired. Sending off request to get reports about missing interfaces <vector len='2'>
000027b8.00001428::2015/08/25-11:34:01.140 INFO      <item></item>

Any help will be greatly appreciated. I also have another cluster using the same heartbeat network and I do not get these errors in the other cluster.
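For a side-by-side comparison with the healthy cluster, the network and interface state that these `[IM - Heartbeat Network]` entries describe can be inspected with the FailoverClusters PowerShell module. A diagnostic sketch (not a fix) to run on any node of each cluster:

```powershell
# Requires the FailoverClusters module (Import-Module FailoverClusters)

# Show all cluster networks and their roles/states
Get-ClusterNetwork | Format-Table Name, State, Role, Address, AddressMask

# Show the per-node interface state for the heartbeat network;
# the GUIDs in the log (e.g. 5ced600f-...) are interface IDs
Get-ClusterNetworkInterface |
    Where-Object { $_.Network -like '*Heartbeat*' } |
    Format-Table Node, Name, State, Address

# Current heartbeat tuning values, for comparison between the two clusters
Get-Cluster | Format-List *SubnetDelay*, *SubnetThreshold*
```

Comparing these three outputs between the noisy cluster and the quiet one should show whether the difference is in network role/state or in heartbeat tuning.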

replacing storage on failover cluster

$
0
0

Dear all,

I have a 2-node failover cluster running on top of an HP StoreEasy 1430 8 TB SATA storage unit that needs to be re-installed. I have 4 VMs running there; how should I proceed?

What will happen to the two nodes when I format the storage and re-install it? Will they still be able to see the CSVs and boot the VMs?

I mean, I just want an idea of how to approach this, please.

Regards,

Nelson Chamba


nelson chamba

Monitoring Server (Opmanager) shows clear/online status for one of the MS SQL Server 2012 on Windows 2012 R2 virtual machines

$
0
0

Environment:-

  • OpManager monitoring server, reachable across multiple WAN connections, installed on subnet 10.250.1.xx
  • 3 x MS SQL Server 2012 Enterprise Edition instances installed on two MS Windows 2012 R2 virtual machines in a clustered environment
  • 1 x MS SQL Server is on the 10.15.16.xx subnet
  • 2 x MS SQL Servers are on the 10.15.18.xx subnet

No Issue:-

  • No issue from the monitoring application to the SQL Server on the 10.15.16.xx subnet; the status shows "Online"
  • No issue from the monitoring application to any other server on the 10.15.18.xx subnet

Issue:-

  • The problem is with one of the SQL servers on the 10.15.18.xx subnet
  • The monitoring application shows "Online" status for only one of the SQL servers at a time. If I restart the SQL server with "Critical" status, it updates to "Online" after the restart, and the other SQL server, previously "Online", changes to "Critical"
  • Basically, one of the SQL Servers in the clustered environment on the 10.15.18.xx subnet is always shown with "Critical" status by the monitoring application server
  • I can ping in both directions from the server whose status is "Online"
  • I cannot ping in either direction from the server whose status is "Critical"
  • I can trace in both directions from the server whose status is "Online"
  • I cannot trace in either direction from the server whose status is "Critical"
  • No errors into event logs

I have done my own troubleshooting and also posted on the OpManager forums, with no luck on a resolution.

I believe it is more of a cluster issue when the service restarts.

Any idea on resolution please?


Muhammad Mehdi

cluster shared volume has entered a paused state because of '(c0130021)'

$
0
0

Occasionally a Hyper-V 2012 R2 cluster node loses one of its storage volumes, and all the VMs contained in it stop working.
These errors are recorded on the physical node:
 cluster shared volume has entered a paused state because of '(c0130021)'

We could not figure out what event triggers the problem.

Can you help us?
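As a starting point while collecting data, the per-node CSV state can be watched with the Failover Clustering cmdlets. A sketch (the log destination path is a placeholder):

```powershell
# Show which node owns each CSV and whether I/O is in Direct or
# Redirected mode -- a CSV flipping to redirected access often
# precedes or accompanies a paused state
Get-ClusterSharedVolumeState |
    Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason

# Generate the cluster debug log for the minutes around the event
Get-ClusterLog -Destination C:\Temp -TimeSpan 15 -UseLocalTime
```

The generated cluster.log on the node owning the CSV usually names the component (storage stack, network, or CSV filter) that requested the pause.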

Many thanks


Andrea

File Share Clustering Options without shared storage.

$
0
0

Trying to determine what my options are for creating an HA file share when I have 2 servers but they don't have access to shared storage.

After doing some research, it seems like my only available option is DFS. Can anyone confirm?
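To confirm: with only two servers and no shared storage, DFS Replication (paired with a DFS Namespace so clients see a single path) is the usual approach. A minimal DFS-R sketch, where FS01/FS02, the group name, and the paths are placeholder assumptions:

```powershell
# Assumes the DFS Replication role service and its module are installed
Import-Module DFSR

New-DfsReplicationGroup -GroupName 'HAShare-RG'
New-DfsReplicatedFolder -GroupName 'HAShare-RG' -FolderName 'HAShare'
Add-DfsrMember          -GroupName 'HAShare-RG' -ComputerName 'FS01','FS02'

# Two-way replication connection between the members
Add-DfsrConnection -GroupName 'HAShare-RG' `
    -SourceComputerName 'FS01' -DestinationComputerName 'FS02'

# Point each member at its local content path; FS01 seeds the data
Set-DfsrMembership -GroupName 'HAShare-RG' -FolderName 'HAShare' `
    -ComputerName 'FS01' -ContentPath 'D:\HAShare' -PrimaryMember $true -Force
Set-DfsrMembership -GroupName 'HAShare-RG' -FolderName 'HAShare' `
    -ComputerName 'FS02' -ContentPath 'D:\HAShare' -Force
```

Note that DFS-R replicates closed files asynchronously, so it gives availability rather than the transactional consistency a clustered file server with shared storage would provide.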

Error trying to validate cluster

$
0
0

Hi,

We have two separate production clusters: one with 3 nodes, one with 5. All nodes are running Server 2008 R2 Datacenter SP1. When I try to start the 'Validate This Cluster...' wizard on either, I immediately get the message "The action 'Validate This Cluster...' did not complete. There is an error in XML document (5, 73)."

When I try to run the Test-Cluster cmdlet I get the same error in the PowerShell console.

I have searched around and can see a couple of forum posts about this error, but no solutions. Can anyone assist?
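One way to narrow this down (a hedged sketch, not a confirmed fix) is to run the validation tests selectively from PowerShell and rule out a corrupted cached report; NODE1/NODE2 are placeholders for the real node names:

```powershell
# List the available validation tests and their categories
Test-Cluster -List | Format-Table DisplayName, Category

# Run only one category at a time to see whether every test fails
# with the XML error or only a specific one does
Test-Cluster -Node NODE1,NODE2 -Include 'Inventory'

# Old validation reports are cached per node; a corrupted report
# file can break the wizard. Inspect (and, if suspect, archive) them:
Get-ChildItem "$env:windir\Cluster\Reports" -Filter 'Validation*'
```

If the error disappears after moving the cached reports aside, the stored XML rather than the cluster configuration was the problem.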

2012 R2 Cluster - Active Node ejects all other nodes - random times

$
0
0

ISSUE

We have a 4-node 2012 R2 cluster: Active / Passive / File Share Witness / Passive DR server.

Our issue is that our active node appears to be losing all cluster communication and ejecting all other nodes, and we cannot find any System event log entries indicating a loss of the local area connection or the network dropping. A third-party monitoring tool has never lost a ping to this system during these events.

Our current band-aid fix is to set the Cluster service to restart automatically after a failure. This gets the cluster back online after 60 seconds, but we are still down for those 60 seconds. We have not enabled automatic failover because not all applications have been tested on node 2 of production yet.

Here are the variables for our environment.

Cluster is physical on Dell Hardware. Current network team shows no errors within Open Manage SA.

Network team shows no indication of flapping on the switch.

Systems:

Active - SQL-CL02 - 1 Vote (Active Cluster Owner)

Passive- SQL-CL03 - 1 Vote

File share - WIN2012-FS01 - 1 Vote

PassiveDR- SQL-CL01 - 0 Vote

Cluster Networking Info:

Production - Network in use for cluster communications.

10.100.1.7/26

Backup Network - Disabled for cluster communications.

DR - Network in use for cluster communications.

10.200.1.7/26

Failure Events in order of time from cluster event logs.

1135 - Cluster node 'SQL-CL03' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

***  (No network disconnects identified; we have a 3rd-party monitoring tool that showed active pings throughout this event.)

1135 - Cluster node 'SQL-CL01' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

1564 - File share witness resource 'File Share Witness' failed to arbitrate for the file share '\\WIN2012-FS01\Witness'. Please ensure that file share '\\WIN2012-FS01\Witness' exists and is accessible by the cluster.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

1069 - Cluster resource 'File Share Witness' of type 'File Share Witness' in clustered role 'Cluster Group' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

1177 - The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.

Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

1561 - The cluster service has determined that this node does not have the latest copy of cluster configuration data. Therefore, the cluster service has prevented itself from starting on this node.

Try starting the cluster service on all nodes in the cluster. If the cluster service can be started on other nodes with the latest copy of the cluster configuration data, this node will be able to subsequently join the started cluster successfully.

If there are no nodes available with the latest copy of the cluster configuration data, please consult the documentation for 'Force Cluster Start' in the failover cluster manager snapin, or the 'forcequorum' startup option. Note that this action of forcing quorum should be considered a last resort, since some cluster configuration changes may well be lost.

1069 - Cluster resource 'WIN2012-SQLAG-01_10.100.1.7' of type 'IP Address' in clustered role 'WIN2012-SQLAG-01' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Thanks for your consideration on this issue. Where else might we search for more information?

-D
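Two things that may help here (suggestions, not confirmed fixes): pull the cluster debug log for the window around an ejection, and consider relaxing the heartbeat thresholds, since the 2012 R2 defaults (1-second heartbeats, 5 missed allowed on the same subnet) can be too aggressive for heavily loaded SQL nodes even when ICMP pings still succeed:

```powershell
# Collect the cluster debug log from every node for the 30 minutes
# leading up to the 1135/1177 events (destination path is a placeholder)
Get-ClusterLog -Destination C:\Temp -TimeSpan 30 -UseLocalTime

# Inspect current heartbeat settings
Get-Cluster | Format-List SameSubnetDelay, SameSubnetThreshold,
    CrossSubnetDelay, CrossSubnetThreshold

# Relaxing the thresholds is a common mitigation for transient
# ejections (values here are illustrative, not prescriptive)
(Get-Cluster).SameSubnetThreshold  = 10
(Get-Cluster).CrossSubnetThreshold = 20
```

The cluster.log will show whether heartbeats were genuinely missed (network/driver level) or whether the cluster service itself was too busy to respond, which a ping-based monitor cannot distinguish.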


Windows Server 2012 cluster failover occurred due to RHS.exe in memory dump

$
0
0
We need to know which process caused the cluster to hang; the memory dump points at RHS.exe.

Intermittent Live Migration failure generating Event ID 21502, 22038, 21111, 21024

$
0
0

We have a multi-node Hyper-V cluster that has recently developed an issue with intermittent failure of live migrations.

We noticed this when one of our CAU runs failed because it could not place the Hosts into maintenance mode or successfully drain all the roles from them.

Scenario:

Place any node into Maintenance mode/drain roles.

Most VMs will drain and live migrate onto other nodes. Randomly, one or a few will refuse to move (it always varies as to which VM and which node it is moving to or from). The live migration ends with a failure generating event IDs 21502, 22038, 21111 and 21024. If you run the drain again, the VMs will migrate, and if you manually live migrate them they move just fine. Manually live migrating a VM can hit the same intermittent error, but rerunning the process succeeds after one or two attempts, or after just waiting a couple of minutes.

This occurs on all Nodes in the cluster and can occur with seemingly any VM in the private cloud.

Pertinent content of the event ID's is:

Event 21502
Live migration of 'VM' failed.

Virtual machine migration operation for 'VM' failed at migration source 'NodeName'. (Virtual machine ID xxx)

Failed to send data for a Virtual Machine migration: The process cannot access the file because it is being used by another process. (0x80070020).

Event 22038
Failed to send data for a Virtual Machine migration: The process cannot access the file because it is being used by another process. (0x80070020).

According to this, it would appear that something is locking the files, or permissions are not transferring properly; however, all access to the back-end SOFS is uniform across all the nodes, and the failure is intermittent rather than consistently happening on one node.

Thanks in advance!
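Since a second attempt reliably succeeds, one stopgap while the root cause is investigated (a sketch; the node name is a placeholder) is to wrap the drain in a short retry loop:

```powershell
# Hypothetical retry wrapper around a node drain: retry a few times,
# pausing between attempts to let the contended file handles release.
$node = 'HV-NODE1'
for ($i = 1; $i -le 3; $i++) {
    Suspend-ClusterNode -Name $node -Drain -Wait -ErrorAction SilentlyContinue
    if ((Get-ClusterNode -Name $node).DrainStatus -eq 'Completed') { break }
    Start-Sleep -Seconds 120
}
(Get-ClusterNode -Name $node).DrainStatus
```

For the 0x80070020 sharing violation itself, it may be worth checking whether backup or antivirus software on the SOFS briefly holds the VHDX/config files open at the moment of migration, since that would match the intermittent, node-independent pattern.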

Access denied when validating configuration for a failover cluster

$
0
0

Hi,

I've spent days now trying to install a cluster on two virtual Server 2012 R2 nodes running on ESX 6. No matter what I try, it always comes back to the following error in the validation report:

An error occurred while executing the test.
An error occurred while getting information about the software updates installed on the nodes.

One or more errors occurred.

Creating an instance of the COM component with CLSID {4142DD5D-3472-4370-8641-DE7856431FB0} from the IClassFactory failed due to the following error: 80070005 Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED)).

I've checked all the things mentioned in https://social.technet.microsoft.com/Forums/windowsserver/en-US/39e6e957-95fd-4de5-89c2-0ea60e63b9d6/access-is-denied-messages-in-win2012-r2-failover-cluster-validation-report-and-csv-entering-a-paused?forum=winserverClustering  and several other things. No change.

My last finding related to this problem is, that everytime this access denied error happens, two entries are logged in the security event log of one of our domain controllers:

Note: the blacked-out service name shows my username.

According to RFC 4120, error 0x1b (27) means:

 KDC_ERR_MUST_USE_USER2USER            27  Server principal valid for
                                               user2user only

I'm logged on as a domain admin with local admin rights on the cluster nodes, and I have no idea what might be the reason for this problem. Can anybody shed some light on this, please?
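Given the KDC_ERR_MUST_USE_USER2USER hint, one avenue worth checking (an assumption based on the error text, not a confirmed cause) is stale cached tickets or a duplicate/misregistered SPN for the nodes; NODE1 below is a placeholder:

```powershell
# Flush cached Kerberos tickets for the logged-on user, then retry validation
klist purge

# List the SPNs registered for a cluster node, then search the
# forest for duplicate SPN registrations
setspn -L NODE1
setspn -X
```

A duplicate SPN found by `setspn -X` would explain the KDC rejecting the service ticket request that the COM activation (and hence the updates-inventory test) depends on.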

Thanks,

Klaus


Cluster-aware updating - Self updating not working

$
0
0

Hi,

I have a Windows Server 2012 failover cluster with 2 nodes, and I am having problems getting self-updating to work properly.

The Analyze CAU Readiness check does not report any issues, and I have been able to run a remote update with no problems. I don't get any errors or failure messages in the CAU client, only this message: "WARNING: The Updating Run has been triggered, but it has not yet started and might take a long time or might fail. You can use Get-CauRun to monitor an Updating Run in progress."

In the Event Viewer I see 2 errors and 1 warning for each run: events 1015, 1007 and 1022.

1015: Failed to acquire lock on node "node2". This could be due to a different instance of orchestrator that owns the lock on this node.

1007: Error Message:There was a failure in a Common Information Model (CIM) operation, that is, an operation performed by software that Cluster-Aware Updating depends on.

Does anyone have any idea what is causing this to fail?

Thanks!
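When event 1015 reports a lock that cannot be acquired, one possible cause (an assumption based on the error text) is a stale or orphaned Updating Run still holding the node lock. These CAU cmdlets can show and clear the current run:

```powershell
Get-CauClusterRole      # current self-updating configuration
Get-CauRun              # details of any Updating Run in progress

# If a previous run is stuck holding the node lock, stop it and retry
Stop-CauRun -Force
Invoke-CauRun -Force -CauPluginName Microsoft.WindowsUpdatePlugin
```

If `Invoke-CauRun` then succeeds where the scheduled self-updating run did not, the problem is the leftover lock rather than the CAU clustered role configuration.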
