In our SQL Failover Cluster Instance (FCI) environments (overall environment description available here) we generally use 2 servers plus a disk witness (quorum drive). We only have one data center, so we don’t take on the overhead of using node majority. That quorum drive is part of the resource group called “Cluster Group”, commonly known as the cluster core resources. The core cluster resources have no dependency on any other cluster resources (e.g. the SQL Server resource), and they can fail over independently of them. They also stay in place when someone fails over the other resources (e.g. SQL).

With one resource group (SQL) in an active/passive configuration, you need to keep the quorum drive on the same node as that resource, so that if inter-node connectivity is lost the cluster won’t fail (and go offline) and the SQL resource won’t fail over. We’ve seen this behavior with vMotions, where a brief loss of connectivity between nodes is enough to make the cluster think it’s unhealthy, even though applications can still connect with no problem. The voting members are the two nodes (1 vote each) and the quorum drive (1 vote), so the node that owns the quorum drive effectively holds 2 of the 3 votes. If SQL is on the node with only 1 vote, then the 2-vote node will take over the resource when connectivity is lost. If the 2-vote node crashes, the cluster will simply fail; you may even need to do a force-quorum start, which can be bad ju-ju.
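To make the vote counting concrete, here is a minimal sketch of the majority check for this 3-vote layout. The function name Test-HasQuorum and its parameters are made up for illustration; this is not a cluster cmdlet, and the real quorum logic in Windows clustering has more to it than this arithmetic.

```powershell
# Hypothetical helper: does a partition holding $VotesHeld votes have a majority?
function Test-HasQuorum {
    param(
        [int]$VotesHeld,
        [int]$TotalVotes = 3   # 2 nodes + 1 quorum drive, as described above
    )
    # A majority means strictly more than half of all votes.
    return (($VotesHeld * 2) -gt $TotalVotes)
}

Test-HasQuorum -VotesHeld 2   # node that owns the quorum drive: True
Test-HasQuorum -VotesHeld 1   # isolated node without the quorum drive: False
```

The second call is the case we want SQL to never be in: the node hosting SQL holds only its own vote when the link between the nodes drops.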

To avoid this sort of issue, we need to make sure the core cluster resources (which include the quorum drive) are on the same node as SQL Server, so that a failover doesn’t happen when one node loses connectivity to the other.

Moving the core cluster group is cheap: if the group is already on the target node the move silently succeeds; otherwise it takes a couple of seconds and succeeds. This could be set up on the Windows machines as a scheduled task, but then you’d need to include a check for which node was hosting the SQL resource group and add logic to work with that. Instead I created a SQL Agent job on the local SQL FCI which periodically runs a PowerShell command to perform the failover of the core cluster group. We have a mix of Windows 2008R2 and Windows 2012, and on Windows 2008R2 the SQLPS module couldn’t import the failover cluster module, so I went a route where the script is built dynamically each time through the command shell and then executed via a call to powershell.exe (though it still doesn’t work on Windows 2008R2 without upgrading PowerShell). Otherwise, this simple PowerShell command tells the core cluster resource group to move to the current node, i.e. the one that owns the SQL Server resource:

powershell.exe "Move-ClusterGroup -Name 'Cluster Group' -Node (Get-ClusterResource | where ownergroup -eq 'SQL Server (MSSQLSERVER)' | where name -eq 'SQL Server' | select ownernode).OwnerNode"
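For readability, the same idea can be written as a short script. This is a sketch, not what the job actually runs: it looks up the owner through Get-ClusterGroup rather than Get-ClusterResource, it adds an explicit check so the move is only attempted when needed, and the variable names are made up for illustration. The group names are the ones from the one-liner above and assume a default instance.

```powershell
# Sketch only: keep the core cluster group on the node that hosts SQL Server.
Import-Module FailoverClusters

# Group names as used in the one-liner; adjust for a named instance.
$sqlGroupName  = 'SQL Server (MSSQLSERVER)'
$coreGroupName = 'Cluster Group'

# Find out which node currently owns each group.
$sqlOwner  = (Get-ClusterGroup -Name $sqlGroupName).OwnerNode.Name
$coreOwner = (Get-ClusterGroup -Name $coreGroupName).OwnerNode.Name

# Only move the core group (and with it the quorum drive) if it has drifted.
if ($coreOwner -ne $sqlOwner) {
    Move-ClusterGroup -Name $coreGroupName -Node $sqlOwner | Out-Null
}
```

The explicit check isn't strictly needed, since the move succeeds silently when the group is already in place, but it makes it easy to add logging for the cases where the group really had drifted.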

We just have it set to run every hour, which gives us more of a safety net.