Building a highly available on-premises VPN gateway

Overview

Hybrid networking enables enterprises to connect their on-premises networks to a cloud service provider such as Azure. As enterprises move more workloads to the cloud, they need that connectivity to be highly available. In this post, we describe how enterprises using Windows Server technologies can build an on-premises site-to-site VPN gateway on a two-node failover cluster and connect it to Azure. If a software bug or hardware problem causes one cluster node to fail, Failover Clustering quickly moves the service running on that node (site-to-site VPN, in this case) to the other node, minimizing service downtime.

Setup Topology

The following diagram illustrates the logical topology of the two-node cluster we will set up.

[Figure: logical topology of the two-node cluster]

The two nodes are connected to three networks. The external network is connected to the public Internet. The internal network is the enterprise's on-premises network. Management and cluster services run on the third network.

Prerequisites

The site-to-site VPN gateway will be configured to run in two virtual machines, each of which is hosted on a Hyper-V server. Both the virtual machines and the Hyper-V servers must run Windows Server 2012 R2.

The Hyper-V servers are connected to all three networks. Hyper-V switches are preconfigured. The VHDs for the virtual machines are ready for use.

Failover Clustering requires that all cluster nodes (here, the two virtual machines) be domain-joined. Therefore, an Active Directory domain controller must be preconfigured and running on the management / cluster network.

A two-node cluster should also have a third quorum “witness”. If the two nodes cannot communicate with each other, the witness provides the tie-breaking vote that decides which node stays active. In this exercise, we use a file share as the witness. The file share is on a preconfigured file server running on the management / cluster network.
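The benefit of the witness comes down to majority voting. The sketch below is a simplified model that assumes one vote per node plus one for the witness and ignores the dynamic quorum adjustments Windows Server 2012 R2 can make; Test-MajorityQuorum is a hypothetical helper, not a Failover Clustering cmdlet.

```powershell
# Hypothetical helper: simplified majority-vote model of cluster quorum.
# Assumes each node and the witness carry exactly one vote
# (dynamic quorum / dynamic witness behavior is ignored).
function Test-MajorityQuorum {
    param(
        [Parameter(Mandatory)] [int] $TotalVotes,     # votes configured in the cluster
        [Parameter(Mandatory)] [int] $AvailableVotes  # votes that can still reach each other
    )
    # Quorum requires a strict majority of all configured votes.
    $majority = [math]::Floor($TotalVotes / 2) + 1
    return ($AvailableVotes -ge $majority)
}

# Two nodes, no witness: a node that loses contact with its peer holds 1 of 2 votes.
Test-MajorityQuorum -TotalVotes 2 -AvailableVotes 1   # False
# Two nodes plus a file share witness: the surviving node and the witness hold 2 of 3 votes.
Test-MajorityQuorum -TotalVotes 3 -AvailableVotes 2   # True
```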

DHCP is not configured on any of the three networks. The management IP addresses of the two virtual machines are statically configured on their management network interfaces. The internal and external network interfaces initially get APIPA (169.254.x.x) addresses. The “real” internal and external IP addresses are plumbed down later as cluster resources, as described below.
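Before the cluster resources are created, it can be useful to confirm that the internal and external adapters really hold only APIPA (169.254.0.0/16) addresses. A minimal sketch, using a hypothetical helper named Test-IsApipaAddress:

```powershell
# Hypothetical helper: $true if an IPv4 address string lies in the
# APIPA (IPv4 link-local) range 169.254.0.0/16.
function Test-IsApipaAddress {
    param([Parameter(Mandatory)] [string] $Address)
    $bytes = ([System.Net.IPAddress]::Parse($Address)).GetAddressBytes()
    return ($bytes.Length -eq 4 -and $bytes[0] -eq 169 -and $bytes[1] -eq 254)
}

Test-IsApipaAddress '169.254.12.7'    # True
Test-IsApipaAddress '192.168.200.1'   # False

# On a gateway VM, for example:
# Get-NetIPAddress -AddressFamily IPv4 | Where-Object { Test-IsApipaAddress $_.IPAddress }
```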

Configuration

We break the configuration into two sections:

  • Hyper-V Host Configuration
  • Virtual Machine Configuration

Hyper-V Host Configuration

Apply the following PowerShell configuration to the first Hyper-V host, which hosts the first virtual machine. Apply the same configuration to the second Hyper-V host, changing only the virtual machine name.

#External and Internal switches are preconfigured
$externalSwitch = "Internet"
$internalSwitch = "OnPrem"

#Remember to rename the second virtual machine (for example, "VPN_GW_2")
$vm = "VPN_GW_1"

#Fill in the VHD path
$vhdPath = "<VHD Path>"
New-VM -Name $vm -SwitchName $externalSwitch -VHDPath $vhdPath -MemoryStartupBytes 2GB -Generation 2

#This VM NIC is connected to the external switch, so it faces the public Internet
Rename-VMNetworkAdapter -VMName $vm -NewName InternetNIC

#Add a VM NIC to connect to the internal network
Add-VMNetworkAdapter -VMName $vm -VMNetworkAdapterName InternalNIC -SwitchName $internalSwitch

#Add a VM NIC to connect to the management / cluster network (same switch, separated by VLAN)
Add-VMNetworkAdapter -VMName $vm -VMNetworkAdapterName ManagementNIC -SwitchName $internalSwitch

#For this exercise, put the management / cluster network on VLAN 2
$managementVLAN = 2
Set-VMNetworkAdapterVlan -VMName $vm -VMNetworkAdapterName ManagementNIC -Access -VlanId $managementVLAN

#For this exercise, put the internal network on VLAN 11
$internalVLAN = 11

#Enable multitenancy on the internal NIC (see the explanation below)
Set-VMNetworkAdapterIsolation -VMName $vm -VMNetworkAdapterName InternalNIC -MultiTenantStack On -IsolationMode Vlan -AllowUntaggedTraffic $true
Add-VMNetworkAdapterRoutingDomainMapping -VMName $vm -VMNetworkAdapterName InternalNIC -RoutingDomainID "{10000000-1000-1000-1000-000000000001}" -RoutingDomainName "Onprem" -IsolationID $internalVLAN -IsolationName "OnPremVLAN"

Start-VM $vm

VLAN is not configured for the external NIC, “InternetNIC”. Traffic is untagged between the external NIC and the physical network, to which the Hyper-V host is connected.

Set-VmNetworkAdapterIsolation and Add-VmNetworkAdapterRoutingDomainMapping are the key configuration steps. Failover Clustering for site-to-site VPN is supported only when the gateway runs in multitenant mode, and these two cmdlets make the virtual machine multitenant. Specifically, they add a separate routing domain, “Onprem”, for the internal network, and VLAN 11 identifies traffic in this routing domain. Note the routing domain ID and name; we explain how they are used later.
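Because the routing domain ID and VLAN number must match exactly wherever they are used, one option is to define them once and splat them into both host-side cmdlets. The sketch below assumes a hypothetical helper, New-RoutingDomainMappingSplat; it only builds the parameter hashtables and does not call Hyper-V itself.

```powershell
# Hypothetical helper: builds parameter hashtables for Set-VMNetworkAdapterIsolation
# and Add-VMNetworkAdapterRoutingDomainMapping from one routing-domain definition.
function New-RoutingDomainMappingSplat {
    param(
        [Parameter(Mandatory)] [string] $VMName,
        [Parameter(Mandatory)] [string] $AdapterName,
        [Parameter(Mandatory)] [guid]   $RoutingDomainId,
        [Parameter(Mandatory)] [string] $RoutingDomainName,
        [Parameter(Mandatory)] [ValidateRange(1, 4094)] [int] $VlanId,
        [string] $IsolationName = 'OnPremVLAN'
    )
    @{
        Isolation = @{
            VMName               = $VMName
            VMNetworkAdapterName = $AdapterName
            MultiTenantStack     = 'On'
            IsolationMode        = 'Vlan'
            AllowUntaggedTraffic = $true
        }
        Mapping = @{
            VMName               = $VMName
            VMNetworkAdapterName = $AdapterName
            # Braced GUID string, the same form used in the host script above
            RoutingDomainID      = $RoutingDomainId.ToString('B')
            RoutingDomainName    = $RoutingDomainName
            IsolationID          = $VlanId
            IsolationName        = $IsolationName
        }
    }
}

$splat = New-RoutingDomainMappingSplat -VMName 'VPN_GW_1' -AdapterName 'InternalNIC' `
    -RoutingDomainId '{10000000-1000-1000-1000-000000000001}' -RoutingDomainName 'Onprem' -VlanId 11

# On the Hyper-V host, the hashtables would then be splatted (not run here):
# $isolation = $splat.Isolation; Set-VMNetworkAdapterIsolation @isolation
# $mapping   = $splat.Mapping;   Add-VMNetworkAdapterRoutingDomainMapping @mapping
```

The ValidateRange attribute rejects VLAN IDs outside 1 to 4094 before anything reaches the host.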

The host configuration is done. Note that there is no need to create a failover cluster for the two Hyper-V hosts; only the two virtual machines are clustered.

Virtual Machine Configuration

Both virtual machines should be running at this time. The following configuration must be done on both machines before we can proceed further.

  • Join the domain hosted on the Active Directory server
  • Add a domain user to the local Administrators group
  • Rename the network adapter that is connected to the external network to “Internet”. Renaming the internal and management network adapters is optional.
  • Install Failover Clustering. A quick way is to run the following two PowerShell commands.

Get-WindowsFeature *cluster* | Install-WindowsFeature

Get-WindowsFeature *file* | Install-WindowsFeature

Once Failover Clustering is installed, run the following configuration on either one of the two virtual machines, but not both.

#The two virtual machines have been renamed to "VPN_GW_1" and "VPN_GW_2" respectively
$clustername = "OnPremGW"
$clusternodes = @("VPN_GW_1", "VPN_GW_2")
New-Cluster -Name $clustername -Node $clusternodes -NoStorage

#"Cluster" is a file folder preconfigured on the file server
Set-ClusterQuorum -FileShareWitness "\\FS\Cluster"

#Add an internal IP address for the internal network
$res = Add-ClusterResource -ResourceType "Disjoint IPv4 Address" -Name "InternalAddress" -Group "Cluster Group"
$res | Set-ClusterParameter -Multiple @{"PrefixLength" = "24"; "Address" = "192.168.200.1"; "VSID" = "11"; "RDID" = "{10000000-1000-1000-1000-000000000001}"}
$res | Start-ClusterResource

#Add an external IP address facing the Internet
$res = Add-ClusterResource -ResourceType "Disjoint IPv4 Address" -Name "ExternalAddress" -Group "Cluster Group"
$res | Set-ClusterParameter -Multiple @{"PrefixLength" = "24"; "Address" = "131.xx.xx.xx"; "AdapterName" = "Internet"}
$res | Start-ClusterResource

As mentioned earlier, this cluster uses a File Share Witness. \\FS\Cluster is simply a shared folder on the file server. Make sure both virtual machines have read/write access to it.

“Disjoint IPv4 Address” is a new cluster resource type added in Windows Server 2012 R2. It can only be configured by PowerShell, not by the Failover Cluster Manager, the GUI tool on Windows Server. We added two IP addresses of this resource type, one for the internal network and one for the external network.

  • The internal address is plumbed down for the cluster network identified by the routing domain ID (RDID) and VLAN number (VSID). Remember that we mapped these to the internal network adapters on the Hyper-V hosts earlier. This address is also the default gateway for all machines on the internal network that need to connect to Azure.
  • The external address is plumbed down for the cluster network that is identified by the network adapter name. Remember, we renamed the external network adapter to “Internet” on both virtual machines.
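The internal address only works if its RDID and VSID values match the routing domain ID and VLAN mapped on the Hyper-V hosts. As a minimal sketch, a hypothetical helper, New-DisjointIPv4ParameterSet, can build the hashtable passed to Set-ClusterParameter -Multiple using the same parameter names as the script above. The address 203.0.113.10 is a documentation address standing in for the elided external IP.

```powershell
# Hypothetical helper: builds the -Multiple hashtable for a "Disjoint IPv4 Address"
# resource, bound either to a routing domain + VLAN or to a named adapter.
function New-DisjointIPv4ParameterSet {
    [CmdletBinding(DefaultParameterSetName = 'Adapter')]
    param(
        [Parameter(Mandatory)] [ipaddress] $Address,
        [Parameter(Mandatory)] [ValidateRange(0, 32)] [int] $PrefixLength,
        [Parameter(Mandatory, ParameterSetName = 'RoutingDomain')] [guid] $RoutingDomainId,
        [Parameter(Mandatory, ParameterSetName = 'RoutingDomain')] [ValidateRange(1, 4094)] [int] $VlanId,
        [Parameter(Mandatory, ParameterSetName = 'Adapter')] [string] $AdapterName
    )
    if ($Address.AddressFamily -ne [System.Net.Sockets.AddressFamily]::InterNetwork) {
        throw "$Address is not an IPv4 address"
    }
    # Values are strings, as in the Set-ClusterParameter calls above
    $params = @{ Address = $Address.IPAddressToString; PrefixLength = "$PrefixLength" }
    if ($PSCmdlet.ParameterSetName -eq 'RoutingDomain') {
        $params.VSID = "$VlanId"
        $params.RDID = $RoutingDomainId.ToString('B')
    } else {
        $params.AdapterName = $AdapterName
    }
    $params
}

$internal = New-DisjointIPv4ParameterSet -Address '192.168.200.1' -PrefixLength 24 `
    -RoutingDomainId '{10000000-1000-1000-1000-000000000001}' -VlanId 11
$external = New-DisjointIPv4ParameterSet -Address '203.0.113.10' -PrefixLength 24 -AdapterName 'Internet'

# In the cluster, for example:
# Get-ClusterResource "InternalAddress" | Set-ClusterParameter -Multiple $internal
```

Parameter sets make the two forms mutually exclusive, so a resource cannot accidentally receive both an adapter name and a routing domain.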

 

Like any other cluster resource, both addresses are owned by the active node in the cluster. Whenever the active node fails, both addresses are plumbed down to the new active node.

The next step is to install the default route on both virtual machines.

New-NetRoute -InterfaceAlias Internet -DestinationPrefix 0.0.0.0/0 -NextHop 131.xx.xx.xx

It’s critical that each virtual machine has only one default route. If DHCP runs on any of these networks, it usually installs an additional default route through the gateway it hands out, and that route must be deleted.
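A quick way to check this is to count the IPv4 default routes. The hypothetical helper below takes route objects, so on a gateway it could be fed from Get-NetRoute; here it is exercised with made-up sample routes so the logic can be seen without a live machine.

```powershell
# Hypothetical helper: $true only if exactly one IPv4 default route (0.0.0.0/0) exists.
function Test-SingleDefaultRoute {
    [CmdletBinding()]
    param([Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Routes)
    $defaults = @($Routes | Where-Object { $_.DestinationPrefix -eq '0.0.0.0/0' })
    foreach ($r in $defaults) {
        Write-Verbose "Default route via $($r.NextHop) on $($r.InterfaceAlias)"
    }
    return ($defaults.Count -eq 1)
}

# Made-up sample: a leftover DHCP-installed default route on a second adapter
$sample = @(
    [pscustomobject]@{ DestinationPrefix = '0.0.0.0/0';        NextHop = '203.0.113.1';  InterfaceAlias = 'Internet' }
    [pscustomobject]@{ DestinationPrefix = '0.0.0.0/0';        NextHop = '192.168.50.1'; InterfaceAlias = 'Ethernet 3' }
    [pscustomobject]@{ DestinationPrefix = '192.168.200.0/24'; NextHop = '0.0.0.0';      InterfaceAlias = 'Ethernet 2' }
)
Test-SingleDefaultRoute -Routes $sample -Verbose   # False

# On a gateway VM: Test-SingleDefaultRoute -Routes (Get-NetRoute -AddressFamily IPv4)
# An extra route could then be removed with: Remove-NetRoute -DestinationPrefix 0.0.0.0/0 -InterfaceAlias <alias>
```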

The next step is to install the site-to-site VPN feature on both virtual machines.

Add-WindowsFeature -Name RemoteAccess -IncludeAllSubFeature -IncludeManagementTools
Install-RemoteAccess -MultiTenancy
Enable-RemoteAccessRoutingDomain -Name Onprem -Type VpnS2S

“Onprem” is the name of the routing domain we added earlier when configuring the Hyper-V host. The Hyper-V host passes this configuration to the virtual machine, which creates a network compartment for each routing domain. Get-NetCompartment should show output like the following on both virtual machines.

[Figure: Get-NetCompartment output]

The routing domain ID and name are essentially the compartment GUID and description respectively.
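Since the routing domain ID appears as the compartment GUID, a simple check is to look up the compartment by GUID. This sketch assumes Get-NetCompartment objects expose CompartmentGuid and CompartmentDescription properties; Find-RoutingDomainCompartment is a hypothetical helper, and the sample objects (including the first GUID) are made up.

```powershell
# Hypothetical helper: return the compartment(s) whose GUID equals the routing domain ID.
# GUIDs are compared as [guid] values, so braces and letter case do not matter.
function Find-RoutingDomainCompartment {
    param(
        [Parameter(Mandatory)] [object[]] $Compartments,
        [Parameter(Mandatory)] [guid] $RoutingDomainId
    )
    $Compartments | Where-Object {
        $parsed = [guid]::Empty
        [guid]::TryParse([string]$_.CompartmentGuid, [ref]$parsed) -and $parsed -eq $RoutingDomainId
    }
}

# Made-up sample objects shaped like Get-NetCompartment output
$sample = @(
    [pscustomobject]@{ CompartmentId = 1; CompartmentDescription = 'Default Compartment'; CompartmentGuid = '{11111111-2222-3333-4444-555555555555}' }
    [pscustomobject]@{ CompartmentId = 2; CompartmentDescription = 'Onprem'; CompartmentGuid = '{10000000-1000-1000-1000-000000000001}' }
)
(Find-RoutingDomainCompartment -Compartments $sample -RoutingDomainId '10000000-1000-1000-1000-000000000001').CompartmentDescription   # Onprem

# On a gateway VM:
# Find-RoutingDomainCompartment -Compartments (Get-NetCompartment) -RoutingDomainId '{10000000-1000-1000-1000-000000000001}'
```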

Finally, to complete the configuration, run the following PowerShell cmdlets on the virtual machine that is the active node in the cluster.

#Add an interface to connect to Azure. Fill in the shared key
Add-VpnS2SInterface -Protocol IKEv2 -AuthenticationMethod PSKOnly -NumberOfTries 3 -ResponderAuthenticationMethod PSKOnly -Name 168.xx.xx.xx -Destination 168.xx.xx.xx -IPv4Subnet @("10.0.0.0/8:100") -SharedSecret "<shared key>" -RoutingDomain Onprem

#Add a cluster resource to keep the VPN configuration in sync
Add-ClusterResourceType -Name "RAS Cluster Resource" -Dll RasClusterRes.dll
$res = Add-ClusterResource -ResourceType "RAS Cluster Resource" -Name "S2SVPN" -Group "Cluster Group"
$res | Set-ClusterParameter -Name "ConfigExportPath" -Value "\\FS\Cluster"
$res | Start-ClusterResource

Because the site-to-site VPN gateway is multitenant, you must specify the routing domain in which the VPN connection terminates. For this exercise, it is “Onprem”.

“RAS Cluster Resource” is another new cluster resource type added in Windows Server 2012 R2. This resource specifies where the site-to-site VPN configuration is stored. The file share can be any location to which both virtual machines have read/write access. For simplicity, we chose the same folder that serves as the cluster's File Share Witness. When the active node fails, the standby node reads the configuration from this folder and reconnects to Azure.
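The -IPv4Subnet value in the Add-VpnS2SInterface command above, 10.0.0.0/8:100, names the Azure address space routed over the tunnel, followed by what we understand to be a route metric after the colon. The hypothetical helpers below parse that format and check whether an address, such as the Azure VM 10.0.20.4 used in the verification below, falls inside the advertised prefix. They are illustrative only and assume the prefix/length:metric format shown in this post.

```powershell
# Hypothetical helper: parse a "network/prefixLength:metric" string such as "10.0.0.0/8:100".
function ConvertFrom-S2SSubnetSpec {
    param([Parameter(Mandatory)] [string] $Spec)
    if (-not ($Spec -match '^(?<net>\d{1,3}(\.\d{1,3}){3})/(?<len>\d{1,2}):(?<metric>\d+)$')) {
        throw "Unexpected subnet spec '$Spec'"
    }
    $len = [int]$Matches['len']
    if ($len -gt 32) { throw "Prefix length out of range in '$Spec'" }
    [pscustomobject]@{
        Network      = [ipaddress]$Matches['net']
        PrefixLength = $len
        Metric       = [int]$Matches['metric']
    }
}

# Convert an IPv4 address to its 32-bit value (most significant byte first).
function ConvertTo-IPv4Number([ipaddress] $Ip) {
    $b = $Ip.GetAddressBytes()
    [long]$b[0] * 16777216 + [long]$b[1] * 65536 + [long]$b[2] * 256 + [long]$b[3]
}

# $true if $Address and $Network fall in the same block of 2^(32 - prefix) addresses.
function Test-AddressInSubnet {
    param(
        [Parameter(Mandatory)] [ipaddress] $Address,
        [Parameter(Mandatory)] [ipaddress] $Network,
        [Parameter(Mandatory)] [ValidateRange(0, 32)] [int] $PrefixLength
    )
    $blockSize = [math]::Pow(2, 32 - $PrefixLength)
    $a = [math]::Floor((ConvertTo-IPv4Number $Address) / $blockSize)
    $n = [math]::Floor((ConvertTo-IPv4Number $Network) / $blockSize)
    $a -eq $n
}

$azure = ConvertFrom-S2SSubnetSpec '10.0.0.0/8:100'
Test-AddressInSubnet -Address '10.0.20.4' -Network $azure.Network -PrefixLength $azure.PrefixLength       # True
Test-AddressInSubnet -Address '192.168.200.1' -Network $azure.Network -PrefixLength $azure.PrefixLength   # False
```

Comparing block indices avoids bitwise operations on unsigned integers, which keeps the arithmetic simple in PowerShell.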

Verification

Assuming the virtual network gateway in Azure is already configured, the site-to-site VPN connection should now be up and running. Get-VpnS2SInterface should show output similar to the following.

[Figure: Get-VpnS2SInterface output]

To verify that connectivity between the on-premises network and Azure resumes after the active cluster node fails (remember that site-to-site VPN always runs on the active node), we simply turn off that virtual machine from Hyper-V Manager. From a local machine on the internal network, we continuously ping both the internal address, 192.168.200.1, and a VM in Azure, 10.0.20.4. As the results below show, connectivity to the internal address, now owned by the new active node, resumes after 4 lost ping packets. Connectivity to Azure resumes shortly afterwards, with one additional lost packet.

Ping from the local machine to the on-premises site-to-site VPN gateway:

[Figure: ping output to the internal address, 192.168.200.1]

Ping from the local machine to the Azure VM over the site-to-site VPN connection:

[Figure: ping output to the Azure VM, 10.0.20.4]
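To put numbers on an outage like the one above, the ping results can be recorded and the longest run of consecutive failures counted. The sketch below applies a hypothetical helper, Get-LongestOutage, to made-up sample data; collecting real results with Test-Connection appears only as a comment.

```powershell
# Hypothetical helper: length of the longest run of consecutive failed pings.
function Get-LongestOutage {
    param([Parameter(Mandatory)] [AllowEmptyCollection()] [bool[]] $Results)
    $longest = 0
    $current = 0
    foreach ($ok in $Results) {
        if ($ok) {
            $current = 0          # a reply ends the current outage
        } else {
            $current++
            if ($current -gt $longest) { $longest = $current }
        }
    }
    $longest
}

# Made-up sample: four lost replies while the cluster group moves to the other node
$sample = @($true, $true, $false, $false, $false, $false, $true, $true)
Get-LongestOutage -Results $sample   # 4

# Collecting real results from a machine on the internal network (one ping per iteration):
# $results = foreach ($i in 1..120) { Test-Connection -ComputerName 192.168.200.1 -Count 1 -Quiet }
# Get-LongestOutage -Results $results
```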