
High Availability for a file share using WSFC, ILB and 3rd-party software SIOS Datakeeper

While a Windows “Shared Disk Failover Cluster” is not yet supported for Microsoft Azure Virtual Machines, the 3rd-party software SIOS Datakeeper can be used as an alternative: https://us.sios.com/products/datakeeper-cluster/ .

As a sample use case, this post describes how to make a file share highly available. All the
information needed to set it up is basically available on the Internet; the idea of this
post is simply to put everything together. Before going into the details, I would like to start with
a high-level overview of the approach:

 


Figure 1

The key components of the tested solution setup are:

  • A domain controller VM and two VMs which represent the failover cluster
  • HA for the domain controller is NOT part of this post! The focus is purely on the high
    availability implementation for the file share ( not the whole cluster end-to-end )
  • For simplicity, a simple file share witness on the domain controller VM was used as the
    cluster quorum configuration. As mentioned above, the DC VM has no HA, which means
    that the file share witness is gone once the DC VM is stopped
  • All three VMs are part of one Azure VNet. The domain controller VM and the two cluster
    node VMs belong to two different cloud services to get a clean separation of the Internal
    Load Balancer configuration, which is done at the cloud service level
  • To ensure high availability for the file share, it’s necessary to put both cluster nodes into an
    Azure availability set to avoid both VMs going down at the same time
    ( e.g. during Azure maintenance ); see the provisioning sketch after this list
  • WSFC was used to configure a shared disk failover cluster between the two cluster node VMs
    and finally to provide the highly available file share
  • The Azure Internal Load Balancer ( ILB ) allows access to the file share via the virtual name of
    the file server role created on the failover cluster
  • The “trick” then is to replace the IP address of the file server role with the IP address of the ILB
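
As referenced in the list above, here is a minimal provisioning sketch using the classic Azure Service Management PowerShell of that time. It creates the two cluster node VMs in one cloud service and one availability set. The availability set name “fshaavset”, the image name, the credentials, the region and the VNet name are assumptions; the other names are the ones from this lab.

# Sketch only ( classic ASM PowerShell ) - placeholders: image, password, region, VNet name, "fshaavset"
$img = "<WS2012R2-gallery-or-private-image>"
$vm1 = New-AzureVMConfig -Name "fsha-cln1" -InstanceSize Small -ImageName $img -AvailabilitySetName "fshaavset" |
    Add-AzureProvisioningConfig -WindowsDomain -AdminUsername "labadmin" -Password "<password>" `
        -Domain "fshadomain" -JoinDomain "fshadomain.com" -DomainUserName "labadmin" -DomainPassword "<password>" |
    Set-AzureSubnet -SubnetNames "Subnet-1"
$vm2 = New-AzureVMConfig -Name "fsha-cln2" -InstanceSize Small -ImageName $img -AvailabilitySetName "fshaavset" |
    Add-AzureProvisioningConfig -WindowsDomain -AdminUsername "labadmin" -Password "<password>" `
        -Domain "fshadomain" -JoinDomain "fshadomain.com" -DomainUserName "labadmin" -DomainPassword "<password>" |
    Set-AzureSubnet -SubnetNames "Subnet-1"
# One call creates the cloud service in the VNet and both VMs at once
New-AzureVM -ServiceName "fsha-cscl" -VMs $vm1,$vm2 -VNetName "<vnet-name>" -Location "<region>"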

 

Now the question is of course: how can all this work when there is no shared disk available for
Azure Virtual Machines like there is for Hyper-V on-premises? The answer is: SIOS Datakeeper

 


Figure 2

Using SIOS Datakeeper one can create a so-called “mirror” between two volumes ( data disks attached to the VMs ) via synchronous replication ( a disk-attach sketch follows the list below ):

  • When creating the mirror between the two volumes ( Azure data disks ), SIOS Datakeeper will
    add this mirror as storage to the cluster configuration
  • To the cluster failover manager it then looks like a shared disk
  • As mentioned before, the ILB allows access to the file share via the file server role name and will
    always route to the active cluster node
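
As a starting point for the mirror, each cluster node VM needs a data disk which later becomes volume S:. A minimal sketch with classic ASM PowerShell follows; the disk label is an assumption. Inside each guest the disk is then initialized, formatted and assigned drive letter S: before the mirror is created.

# Sketch only: attach an empty 5GB data disk to each cluster node VM
foreach ( $node in "fsha-cln1","fsha-cln2" ) {
    Get-AzureVM -ServiceName "fsha-cscl" -Name $node |
        Add-AzureDataDisk -CreateNew -DiskSizeInGB 5 -DiskLabel "$node-S" -LUN 0 |
        Update-AzureVM
}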

Prerequisites:

  • All tests were done with Windows Server 2012 R2
  • The SIOS Datakeeper version was 8.2
  • SIOS Datakeeper installation requires .NET Framework 3.5. I ran into a problem which at the
    time of my testing required a fix in the form of running a little exe downloaded from the
    following KB article: https://support2.microsoft.com/kb/3005628. Recent tests showed that this is no longer an issue in the latest WS2012 R2 Azure Gallery image from October ( see the installation sketch after this list )
  • In addition, a VM shouldn’t be in a VNet when installing .NET Framework 3.5. There are some
    issues described in articles on the Internet, related to locating the source files and DNS
    settings. As a workaround, one could add 8.8.8.8 as a DNS entry inside the VM. There might be other cases too where certain bug fixes are necessary, like these two examples:
    https://blogs.technet.com/b/askcore/archive/2013/01/14/error-in-failover-cluster-manager-after-install-of-kb2750149.aspx
    https://support.microsoft.com/kb/2804526
  • For my internal testing, I finally created my own WS2012 R2 OS image with .NET Framework 3.5 installed as well as the fix from the third item above. The other two bug fix examples were not necessary or applicable. To create the private OS image I followed the guidance
    in this article and then created all my test VMs from the private image: https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-capture-image-windows-server/ . In case no bug fixes are required, it’s of course perfectly fine to just use a standard Azure
    Gallery OS image
  • I used a dedicated VNet for my test environment. You have to watch out to create the VNet in the right way to make the ILB work. In the past, VNets were associated with an affinity group; this is no longer the case.
    To set up the ILB, the new regional VNet type is required. For newly created VNets everything is fine.
  • One has to enter the location ( Azure region ) when creating the VNet via the portal. For old existing VNets there is an option to change the VNet setting from affinity group to regional by modifying the network config. I tested this myself; it is not possible while there are VMs in the VNet. For my test, I removed the VMs ( keeping all disks ), exported the VM configs and imported them again. Here is an article about this config change: https://azure.microsoft.com/blog/2014/05/14/regional-virtual-networks/
  • Then I set up a domain controller VM and the two cluster node VMs in two different cloud services. This is a clean setup which avoids any kind of potential side effects related to the ILB configuration, as the ILB is configured per cloud service.
  • SIOS also provides some step-by-step guides which describe the process. Here is one using WS2012 which also helps with WS2012 R2:
    https://clusteringformeremortals.com/2012/12/31/windows-server-2012-clustering-step-by-step/
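
Regarding the .NET Framework 3.5 prerequisite mentioned above, here is a minimal in-guest installation sketch. The D: drive as media source is an assumption for the case where the online source cannot be located.

# Sketch only, run inside each cluster node VM: install .NET Framework 3.5 for Datakeeper.
# If the online source cannot be located ( the VNet/DNS issue above ), point -Source at the
# sources\sxs folder of mounted WS2012 R2 installation media ( D: is a placeholder ).
Install-WindowsFeature -Name NET-Framework-Core -Source "D:\sources\sxs"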

 

What it looks like:

 


Figure 3

As described in the overview section, a domain controller VM was created. The DNS and AD entries show:

  • Cluster node 1 VM with name “fsha-cln1”  ( “fsha” stands for file share high availability )
  • Cluster node 2 VM with name “fsha-cln2”
  • Domain controller VM with name “fsha-dc”
  • Domain “fshadomain.com”
  • Cluster name “fshacl”
  • File server role in the cluster with name “fshafsrole”
    ( a cluster-creation sketch using these names follows this list )
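
With these names in place, the cluster itself can be created with the standard failover clustering cmdlets. A minimal sketch, assuming the witness share on the DC VM is called “clwitness”; note -NoStorage, since the storage is added later by SIOS Datakeeper.

# Sketch only, run on one of the cluster node VMs
New-Cluster -Name "fshacl" -Node "fsha-cln1","fsha-cln2" -NoStorage
# File share witness on the domain controller VM as quorum ( share name is a placeholder )
Set-ClusterQuorum -NodeAndFileShareMajority "\\fsha-dc\clwitness"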

 

 


Figure 4

  • A “mirror” was created via a SIOS Datakeeper “job” and named “fshajob” ( a scripting sketch follows this list )
  • The mirror defines synchronous replication of volume S: ( 5GB data disk attached to the Azure VM ) between the two cluster node VMs
  • The Datakeeper screen shows that the primary ( source ) is currently cluster node 1 and the secondary ( target ) is cluster node 2
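
In the lab, the job and mirror were created via the Datakeeper GUI as shown in Figure 4. Datakeeper also ships PowerShell cmdlets for this; the following is only a sketch based on the SIOS documentation, with the node IP addresses being assumptions. Verify the cmdlet names and parameters against your Datakeeper version.

# Sketch only ( Datakeeper PowerShell cmdlets; verify against your version ).
# The node IP addresses 10.0.0.5 / 10.0.0.6 are placeholders.
New-DataKeeperJob -JobName "fshajob" -JobDescription "file share HA" `
    -Node1Name "fsha-cln1.fshadomain.com" -Node1IP "10.0.0.5" -Node1Volume "S" `
    -Node2Name "fsha-cln2.fshadomain.com" -Node2IP "10.0.0.6" -Node2Volume "S" -SyncType Sync
New-DataKeeperMirror -SourceIP "10.0.0.5" -SourceVolume "S" -TargetIP "10.0.0.6" -TargetVolume "S" -SyncType Sync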

 

 


Figure 5

  • Looking at the file system of cluster node 1, you will see a file share on volume S:
  • This is the share which should become highly available ( a share-creation sketch follows below )
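
A minimal sketch of creating such a share on the owner node; the folder name matches the share seen later in Figure 8, and the account granted access is a placeholder.

# Sketch only, run on the current owner node of volume S:
New-Item -Path "S:\fsha_share" -ItemType Directory
New-SmbShare -Name "fsha_share" -Path "S:\fsha_share" -FullAccess "fshadomain\<account>"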

 

 


Figure 6

  • The Datakeeper volume is visible in the failover cluster manager and allows the creation of a file server role ( a PowerShell sketch follows below )
  • Under “Shares” within the file server role, one can see the file share which we saw in file explorer in Figure 5
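
A hedged sketch of creating the file server role from PowerShell instead of the GUI; the storage resource name “DataKeeper Volume S” is an assumption, and the initial IP address 10.0.0.100 is the one used in this lab ( it gets replaced by the ILB address later ).

# Sketch only: create the file server role on the Datakeeper volume
Add-ClusterFileServerRole -Name "fshafsrole" -Storage "DataKeeper Volume S" -StaticAddress 10.0.0.100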

 


Figure 7

  • Checking the second cluster node, it turns out that the replicated volume S: is visible ( e.g. in file explorer ), but access is not possible
  • SIOS Datakeeper makes sure that the replicated volume can only be accessed on the current owner node

 

 


Figure 8

  • The file share created in the cluster role can be accessed from the domain controller, as expected, via the virtual name: \\fshafsrole\fsha_share
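
A quick way to verify this from the domain controller VM:

# Check access to the share via the virtual name of the file server role
Test-Path "\\fshafsrole\fsha_share"
Get-ChildItem "\\fshafsrole\fsha_share"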

 

 


Figure 9

  • Now we start a manual failover of the file server role from cluster node 1 to cluster node 2
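
Besides the failover cluster manager GUI, the same manual failover can be triggered from PowerShell; a minimal sketch:

# Move the file server role to the second node and verify the new owner
Move-ClusterGroup -Name "fshafsrole" -Node "fsha-cln2"
Get-ClusterGroup -Name "fshafsrole" | Format-Table Name, OwnerNode, State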

 


Figure 10

  • The failover cluster manager on the second node ( Azure VM fsha-cln2 ) shows that the owner node changed to fsha-cln2
  • SIOS Datakeeper also switched source and target server
  • Now it’s possible to access the file share in file explorer on the second node, which didn’t work before
  • Access from the domain controller VM also still works as expected

 


Figure 11

  • Right-clicking the file server role within the failover cluster manager allows setting appropriate permissions for file share access

 


Figure 12

  • As mentioned in the overview section, it’s not enough to use the Datakeeper mirror like a shared disk for the failover cluster
  • Another “workaround” is necessary regarding access to the file share via the name of the file server role
  • It was achieved with the help of the Azure Internal Load Balancer ( ILB )
  • The screenshot shows that an Internal Load Balancer with the name “ilbfsha” was created on the cloud service of the cluster nodes
  • The IP address of the Load Balancer is 10.0.0.99

 

 


Figure 13

  • Checking the “Resources” tab in the failover cluster manager shows that the file server role in fact has the IP address of the ILB, 10.0.0.99
  • This is the “trick” which finally makes the whole setup work. The file server role was originally created with a different IP address
  • Once the file server role was created, its IP address was replaced with the IP address of the ILB
    via a PowerShell command ( see the command example in the ILB Configuration section
    further down ). It is not just a simple IP address change which could be done in the
    Failover Cluster Manager GUI; the probe port has to be set as well

 


Figure 14

  • The PS command Get-AzureEndpoint shows that two endpoints related to the ILB were created for cluster node 1 ( the same applies to cluster node 2 )
  • The local ports are 443 and 445 and the so-called “probe port” is 59999 ( see the sketch below )
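
A sketch of the corresponding check via Azure PowerShell ( classic ASM cmdlets ):

# List the ILB endpoints of a cluster node; expect local ports 443/445 and probe port 59999
Get-AzureVM -ServiceName "fsha-cscl" -Name "fsha-cln1" |
    Get-AzureEndpoint |
    Select-Object Name, LocalPort, Port, ProbePort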

 

ILB Configuration:

Like the setup of a domain controller VM or the failover cluster on Azure, the information about how to configure the Azure Internal Load Balancer ( ILB ) can be found on the Internet. What is needed for the highly available file share regarding the ILB is in fact the same as what is required for SQL Server AlwaysOn ( see the links below ).

Adding the ILB to an Azure cloud service as well as adding the endpoints 443 and 445 to the VMs via Azure PowerShell is trivial and exactly the same as for SQL Server AlwaysOn. The only part which needs a bit of attention is the setting of the file server role IP address. In the lab environment, the file server role was originally created with IP address 10.0.0.100. In the end, it had to be replaced with the ILB IP address 10.0.0.99. This can be accomplished with the same Set-ClusterParameter command as the one in the blog below about the SQL AlwaysOn availability group listener. Below is a sample of how it was done in the file share lab setup.

Commands run via Azure PowerShell locally on-premises:

$ProbePort = "59999"
Add-AzureInternalLoadBalancer -InternalLoadBalancerName ilbfsha -ServiceName fsha-cscl -SubnetName Subnet-1 -StaticVNetIPAddress 10.0.0.99
Get-AzureVM -ServiceName fsha-cscl -Name fsha-cln1 | Add-AzureEndpoint -Name "fsep1" -LBSetName "ilbsetfsha" -Protocol tcp -LocalPort 443 -PublicPort 443 -ProbePort $ProbePort -ProbeProtocol tcp -ProbeIntervalInSeconds 10 -InternalLoadBalancerName ilbfsha | Update-AzureVM
Get-AzureVM -ServiceName fsha-cscl -Name fsha-cln1 | Add-AzureEndpoint -Name "fsep2" -LBSetName "ilbsetfsha2" -Protocol tcp -LocalPort 445 -PublicPort 445 -ProbePort $ProbePort -ProbeProtocol tcp -ProbeIntervalInSeconds 10 -InternalLoadBalancerName ilbfsha | Update-AzureVM
Get-AzureVM -ServiceName fsha-cscl -Name fsha-cln2 | Add-AzureEndpoint -Name "fsep1" -LBSetName "ilbsetfsha" -Protocol tcp -LocalPort 443 -PublicPort 443 -ProbePort $ProbePort -ProbeProtocol tcp -ProbeIntervalInSeconds 10 -InternalLoadBalancerName ilbfsha | Update-AzureVM
Get-AzureVM -ServiceName fsha-cscl -Name fsha-cln2 | Add-AzureEndpoint -Name "fsep2" -LBSetName "ilbsetfsha2" -Protocol tcp -LocalPort 445 -PublicPort 445 -ProbePort $ProbePort -ProbeProtocol tcp -ProbeIntervalInSeconds 10 -InternalLoadBalancerName ilbfsha | Update-AzureVM

 

Commands run via PowerShell inside the cluster node VM after setting up the ILB above:

$ProbePort = "59999"
$fileserverresource = Get-ClusterResource | Where-Object { $_.Name -like "IP*10.0.0.100*" }
$fileserverresource | Set-ClusterParameter -Multiple @{"Address"="10.0.0.99";"ProbePort"=$ProbePort;"SubnetMask"="255.255.255.255";"Network"="Cluster Network 1";"OverrideAddressMatch"=1;"EnableDhcp"=0}
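
A hedged follow-up: the changed address and probe port settings only take effect once the IP resource is brought online again, e.g. by restarting the whole role:

# Restart the file server role so the new address/probe port settings take effect
Stop-ClusterGroup -Name "fshafsrole"
Start-ClusterGroup -Name "fshafsrole"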

 

Some sample links regarding Azure ILB configuration:

https://azure.microsoft.com/blog/2014/05/20/internal-load-balancing/

https://azure.microsoft.com/blog/2014/10/01/sql-server-alwayson-and-ilb/

https://blogs.msdn.com/b/sqlalwayson/archive/2013/08/06/availability-group-listener-in-windows-azure-now-supported-and-scripts-for-cloud-only-configuration.aspx

Miscellaneous

 

To make my life easier regarding the setup of the test environment, I created an Azure PS script which helped me to create a domain controller VM and also the two cluster node VMs, including domain join and so on. It is NOT an official Microsoft utility. The emphasis was neither on programming style nor on security; it was just about functional testing of SIOS Datakeeper. The idea was to automate as many of the steps as possible, up to the point where one has to install SIOS Datakeeper. There were a few challenges to overcome, and I decided to share the findings and the PS code.

The whole SIOS Datakeeper project is related to SAP HA on Azure Virtual Machines. Therefore I will publish this Azure PS sample in about two weeks on our Microsoft SAP Engineering Team blog: Running SAP Applications on the Microsoft Platform