Set up a hybrid high performance computing (HPC) cluster with Microsoft HPC Pack and on-demand Azure PaaS Compute nodes

Use Microsoft HPC Pack 2016 Update 1 (or a later version) and Azure to set up a small, hybrid high performance computing (HPC) cluster. The cluster shown in this article consists of an on-premises HPC Pack head node and some compute nodes that you deploy on demand in an Azure cloud service. You can then run compute jobs on the hybrid cluster.

Hybrid HPC cluster

This tutorial shows one approach, sometimes called cluster "burst to the cloud," to use scalable, on-demand Azure resources to run compute-intensive applications.

This tutorial assumes no prior experience with compute clusters or HPC Pack. It is intended only to help you deploy a hybrid compute cluster quickly for demonstration purposes. For considerations and steps to deploy a hybrid HPC Pack cluster at greater scale in a production environment, see the detailed guidance. If you want to use an earlier version of HPC Pack, see the HPC Pack 2012 R2 documentation.

Prerequisites

  • Azure subscription - If you don't have an Azure subscription, you can create a free account in just a couple of minutes.

  • An on-premises computer running Windows Server 2012 R2 or Windows Server 2016 - Use this computer as the head node of the HPC cluster. If you aren't already running Windows Server, you can download and install an evaluation version.

    • The computer should be joined to an Active Directory domain. For test purposes, you can configure the head node computer as a domain controller. To add the Active Directory Domain Services server role and promote the head node computer to a domain controller, see the documentation for Windows Server.
    • To support HPC Pack, the operating system must be installed in one of these languages: English, Japanese, or Chinese (Simplified).
    • Verify that important and critical updates are installed.
  • HPC Pack 2016 - Download the installation package for the latest version free of charge and copy the files to the head node computer.

  • Domain account - This account must be configured with local Administrator permissions on the head node to install HPC Pack.

  • Outbound TCP connectivity on port 443 from the head node to Azure.
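
You can check the port 443 requirement from the head node before you install anything. The following is a minimal sketch that uses the built-in Test-NetConnection cmdlet. The endpoint name management.core.windows.net is an assumption that stands in for the Azure endpoints the head node needs to reach; your environment might require others.

```powershell
# Assumption: management.core.windows.net stands in for the Azure endpoints
# that the head node must reach; substitute the endpoints for your environment.
$endpoint = 'management.core.windows.net'

# Test-NetConnection attempts a TCP connection and reports the outcome
# in the TcpTestSucceeded property.
$result = Test-NetConnection -ComputerName $endpoint -Port 443

if ($result.TcpTestSucceeded) {
    Write-Output "Outbound TCP 443 to $endpoint succeeded."
} else {
    Write-Warning "Outbound TCP 443 to $endpoint failed; check firewall and proxy rules."
}
```

A failed test usually points to a firewall or proxy rule between the head node and the internet rather than to HPC Pack itself.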

Install HPC Pack on the head node

You first install Microsoft HPC Pack on your on-premises computer running Windows Server. This computer becomes the head node of the cluster.

  1. Log on to the head node by using a domain account that has local Administrator permissions.

  2. Start the HPC Pack Installation Wizard by running Setup.exe from the HPC Pack installation files.

  3. On the HPC Pack 2016 Setup screen, click New installation or add new features to an existing installation.

    HPC Pack 2016 Setup

  4. On the Microsoft Software User Agreement page, click Next.

  5. On the Select Installation Type page, click Create a new HPC cluster by creating a head node, and then click Next.

  6. The wizard runs several pre-installation tests. Click Next on the Installation Rules page if all tests pass. Otherwise, review the information provided and make any necessary changes in your environment. Then run the tests again or, if necessary, start the Installation Wizard again.

  7. On the HPC DB Configuration page, make sure Head Node is selected for all HPC databases, and then click Next.

    DB Configuration

  8. Accept default selections on the remaining pages of the wizard. On the Install Required Components page, click Install.

    Install

  9. After the installation completes, clear the Start HPC Cluster Manager check box, and then click Finish. (You start HPC Cluster Manager in a later step.)

    Finish

Prepare the Azure subscription

Perform the following steps in the Azure portal with your Azure subscription. After completing these steps, you can deploy Azure nodes from the on-premises head node.

Note

Make a note of your Azure subscription ID, which you need later. Find the ID in Subscriptions in the portal.

Upload the default management certificate

Earlier versions of HPC Pack install a self-signed certificate on the head node, called the Default Microsoft HPC Azure Management certificate, that you can upload as an Azure management certificate. Starting with HPC Pack 2016 Update 1, this certificate is not provided by default, so you need to create a self-signed certificate by running the following command on the head node, and then export the certificate as tmpfolder\hpccert.cer:

New-SelfSignedCertificate -Subject "CN=HPC Pack Management" -KeySpec KeyExchange -TextExtension @("2.5.29.37={text}1.3.6.1.5.5.7.3.1,1.3.6.1.5.5.7.3.2") -CertStoreLocation cert:\LocalMachine\My -KeyExportPolicy Exportable -NotAfter (Get-Date).AddYears(5) -NotBefore (get-Date).AddDays(-1)
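
The command above creates the certificate but does not export it. The following is a minimal sketch of the export step, assuming the certificate was created with the subject CN=HPC Pack Management as shown. The folder name tmpfolder is the placeholder used in this article; replace it with a real path.

```powershell
# Assumption: 'tmpfolder' is the placeholder folder name used in this article;
# replace it with a real folder path on the head node.
$tmpFolder = 'tmpfolder'

# Find the certificate created by New-SelfSignedCertificate by its subject.
# If more than one matches, take the most recently issued one.
$cert = Get-ChildItem -Path Cert:\LocalMachine\My |
    Where-Object { $_.Subject -eq 'CN=HPC Pack Management' } |
    Sort-Object -Property NotBefore -Descending |
    Select-Object -First 1

if (-not $cert) {
    throw 'No certificate with subject CN=HPC Pack Management was found.'
}

# Export only the public part (a DER-encoded .cer file) for upload to Azure.
New-Item -ItemType Directory -Path $tmpFolder -Force | Out-Null
Export-Certificate -Cert $cert -FilePath (Join-Path $tmpFolder 'hpccert.cer') -Type CERT
```

The .cer file contains no private key, so it is safe to upload; the private key stays in the LocalMachine\My store on the head node.
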
  1. From the head node computer, import the certificate from tmpfolder\hpccert.cer to LocalMachine\My. Note that, unlike in earlier versions of HPC Pack, you no longer need to import the certificate to LocalMachine\Trusted Root.

  2. Sign in to the Azure portal.

  3. Click Subscriptions > your_subscription_name.

  4. Click Management certificates > Upload. Browse on the head node for the file tmpfolder\hpccert.cer, and then click Upload.

The uploaded certificate (CN=HPC Pack Management in this example) appears in the list of management certificates.

Create an Azure cloud service

Note

For best performance, create the cloud service and the storage account (in a later step) in the same geographic region.

  1. In the portal, click Cloud services (classic) > +Add.

  2. Type a DNS name for the service, choose a resource group and a location, and then click Create.

Create an Azure storage account

  1. In the portal, click Storage accounts (classic) > +Add.

  2. Type a name for the account, and select the Classic deployment model.

  3. Choose a location, and leave other settings at default values. For the resource group, create or select one whose name has the prefix Default-Storage-<your_picked_Location>; otherwise the service cannot locate the storage account. Then click Create.

Configure the head node

To use HPC Cluster Manager to deploy Azure nodes and to submit jobs, first perform some required cluster configuration steps.

  1. On the head node, start HPC Cluster Manager. If the Select Head Node dialog box appears, click Local Computer. The Deployment To-do List appears.

  2. Under Required deployment tasks, click Configure your network.

    Configure Network

  3. In the Network Configuration Wizard, select All nodes only on an enterprise network (Topology 5). This network configuration is the simplest for demonstration purposes.

    Topology 5

  4. Click Next to accept default values on the remaining pages of the wizard. Then, on the Review tab, click Configure to complete the network configuration.

  5. In the Deployment To-do List, click Provide installation credentials.

  6. In the Installation Credentials dialog box, type the credentials of the domain account that you used to install HPC Pack. Then click OK.

    Installation Credentials

  7. In the Deployment To-do List, click Configure the naming of new nodes.

  8. In the Specify Node Naming Series dialog box, accept the default naming series and click OK. Complete this step even though the Azure nodes you add in this tutorial are named automatically.

    Node Naming

  9. In the Deployment To-do List, click Create a node template. Later in the tutorial, you use the node template to add Azure nodes to the cluster.

  10. In the Create Node Template Wizard, do the following:

    a. On the Choose Node Template Type page, click Windows Azure node template, and then click Next.

    Node Template

    b. Click Next to accept the default template name.

    c. On the Provide Subscription Information page, enter your Azure subscription ID (available in your Azure account information). Then, in Management certificate, browse for the management certificate that you uploaded earlier (Default Microsoft HPC Azure Management in earlier versions of HPC Pack). Then click Next.

    Node Template

    d. On the Provide Service Information page, select the cloud service and the storage account that you created in a previous step. Then click Next.

    Node Template

    e. Click Next to accept default values on the Specify Proxy Nodes, Specify Worker Role, and Specify Startup Script pages of the wizard. Note that the startup script runs during provisioning, before the node is ready in the Offline state. Two registry values under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\HPC let you tune the default behavior: Microsoft.Hpc.Azure.AzureStartupTaskFailureEnable (type REG_DWORD, default 0) and Microsoft.Hpc.Azure.AzureStartupTaskTimeoutSec (type REG_DWORD, default 1800). For example, if an administrator sets Microsoft.Hpc.Azure.AzureStartupTaskFailureEnable to 1, the deployment waits until the startup script finishes running before it makes the node reachable for jobs.

    f. Click Next to accept default values on the Set Up Microsoft Azure Virtual Network page. If you use Azure VPN or ExpressRoute with forced tunneling, you must use Azure internal load balancing; in that case, pick one valid, unused static IP address from the subnet.

    Node Template

    g. On the Configure Remote Desktop Credential page, provide credentials, and then click Next. On the Availability Policy page, choose Start and stop nodes manually, even if you plan to enable auto grow shrink later, and then click Next. Then, on the Review tab, click Create to create the node template.

    Note

    By default, the Azure node template includes settings for you to start (provision) and stop the nodes manually by using HPC Cluster Manager. You can configure auto grow shrink later. You can also optionally configure a schedule to start and stop the Azure nodes automatically.
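
The two startup-script registry values described in step 10e can also be set from PowerShell. The following is a minimal sketch, assuming the values are read from the head node's registry and that you run the commands in an elevated session; the value names, types, and defaults come from this article.

```powershell
# Run in an elevated PowerShell session on the head node (assumption: the
# head node's registry is where HPC Pack reads these values).
$hpcKey = 'HKLM:\SOFTWARE\Microsoft\HPC'

# 1 = wait for the startup script to finish before the node becomes
# reachable for jobs (the default is 0).
New-ItemProperty -Path $hpcKey -Name 'Microsoft.Hpc.Azure.AzureStartupTaskFailureEnable' `
    -PropertyType DWord -Value 1 -Force | Out-Null

# Startup script timeout (judging by the name, in seconds). 1800 is the
# default; it is written here only to make the setting explicit.
New-ItemProperty -Path $hpcKey -Name 'Microsoft.Hpc.Azure.AzureStartupTaskTimeoutSec' `
    -PropertyType DWord -Value 1800 -Force | Out-Null

# Read the values back to confirm what was written.
Get-ItemProperty -Path $hpcKey | Select-Object -Property 'Microsoft.Hpc.Azure.*'
```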

Add Azure nodes to the cluster

Now use the node template to add Azure nodes to the cluster. Adding the nodes to the cluster stores their configuration information so that you can start (provision) them at any time in the cloud service. Your subscription only gets charged for Azure nodes after the instances are running in the cloud service.

Follow these steps to add two Small nodes.

  1. In HPC Cluster Manager, click Node Management (called Resource Management in current versions of HPC Pack) > Add Node.

    Add Node

  2. In the Add Node Wizard, on the Select Deployment Method page, click Add Windows Azure nodes, and then click Next.

    Add Azure Node

  3. On the Specify New Nodes page, select the Azure node template you created previously (called by default Default AzureNode Template). Then specify 2 nodes of size Small, and then click Next.

    Specify Nodes

  4. On the Completing the Add Node Wizard page, click Finish.

    Two Azure nodes, named AzureCN-0001 and AzureCN-0002, now appear in HPC Cluster Manager. Both are in the Not-Deployed state.

    Added Nodes

Start the Azure nodes

When you want to use the cluster resources in Azure, use HPC Cluster Manager to start (provision) the Azure nodes and bring them online.

  1. In HPC Cluster Manager, click Node Management (called Resource Management in current versions of HPC Pack), and select the Azure nodes.

  2. Click Start, and then click OK.

    Start Nodes

    The nodes transition to the Provisioning state. View the provisioning log to track the provisioning progress.

    Provision Nodes

  3. After a few minutes, the Azure nodes finish provisioning and are in the Offline state. In this state, the role instances are running but cannot yet accept cluster jobs.

  4. To confirm that the role instances are running, in the Azure portal, click Cloud Services (classic) > your_cloud_service_name.

    You should see two HpcWorkerRole instances (nodes) running in the service. HPC Pack also automatically deploys two HpcProxy instances (size Medium) to handle communication between the head node and Azure.

    Running Instances

  5. To bring the Azure nodes online to run cluster jobs, select the nodes, right-click, and then click Bring Online.

    Offline Nodes

    HPC Cluster Manager indicates that the nodes are in the Online state.
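
The same start and bring-online sequence can be scripted with the HPC Pack PowerShell cmdlets. The following is a minimal sketch, not a verified procedure: it assumes that the Azure nodes belong to the built-in AzureNodes node group and that Start-HpcAzureNode returns after provisioning completes. If it returns earlier in your environment, wait until the nodes report the Offline state before running the last command.

```powershell
# Assumption: run in HPC PowerShell on the head node, or load the snap-in first.
Add-PSSnapin Microsoft.HPC -ErrorAction SilentlyContinue

# Assumption: the Azure nodes belong to the built-in AzureNodes node group.
$azureNodes = Get-HpcNode -GroupName AzureNodes

# Provision the role instances in the cloud service.
Start-HpcAzureNode -Node $azureNodes

# After provisioning, the nodes are Offline; bring them online to accept jobs.
Get-HpcNode -GroupName AzureNodes -State Offline | Set-HpcNodeState -State Online
```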

Run a command across the cluster

To check the installation, use the HPC Pack clusrun command to run a command or application on one or more cluster nodes. As a simple example, use clusrun to get the IP configuration of the Azure nodes.

  1. On the head node, open a command prompt as an administrator.

  2. Type the following command:

    clusrun /nodes:azurecn* ipconfig

  3. If prompted, enter your cluster administrator password. You should see command output similar to the following.

    Clusrun

Run a test job

Now submit a test job that runs on the hybrid cluster. This example is a simple parametric sweep job (a type of intrinsically parallel computation). This example runs subtasks that add an integer to itself by using the set /a command. All the nodes in the cluster contribute to finishing the subtasks for integers from 1 to 100.

  1. In HPC Cluster Manager, click Job Management > New Parametric Sweep Job.

    New Job

  2. In the New Parametric Sweep Job dialog box, in Command line, type set /a *+* (overwriting the default command line that appears). Leave default values for the remaining settings, and then click Submit to submit the job.

    Parametric Sweep

  3. When the job is finished, double-click the My Sweep Task job.

  4. Click View Tasks, and then click a subtask to view the calculated output of that subtask.

    Task Results

  5. To see which node performed the calculation for that subtask, click Allocated Nodes. (Your cluster might show a different node name.)

    Task Results
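
To see what each subtask in the sweep runs, the following sketch expands the command line locally, replacing each asterisk with the sweep index in the way the parametric sweep job does. Expand-SweepCommand is a hypothetical helper written for this illustration and is not part of HPC Pack. It does not submit a job.

```powershell
# Hypothetical helper (not part of HPC Pack): expands a parametric sweep
# command line by replacing every asterisk with the sweep index.
function Expand-SweepCommand {
    param(
        [string]$CommandLine,
        [int]$Start = 1,
        [int]$End = 100
    )
    foreach ($i in $Start..$End) {
        $CommandLine.Replace('*', [string]$i)
    }
}

# The tutorial's job: indexes 1 through 100, command line "set /a *+*".
$commands = Expand-SweepCommand -CommandLine 'set /a *+*'

$commands[0]     # first subtask:  set /a 1+1
$commands[99]    # last subtask:   set /a 100+100
```

Each subtask therefore outputs twice its index, which is the value you see when you view a subtask's output in HPC Cluster Manager.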

Stop the Azure nodes

After you try out the cluster, stop the Azure nodes to avoid unnecessary charges to your account. This step stops the cloud service and removes the Azure role instances.

  1. In HPC Cluster Manager, in Node Management (called Resource Management in current versions of HPC Pack), select both Azure nodes. Then, click Stop.

    Stop Nodes

  2. In the Stop Windows Azure nodes dialog box, click Stop.

  3. The nodes transition to the Stopping state. After a few minutes, HPC Cluster Manager shows that the nodes are Not-Deployed.

    Not Deployed Nodes

  4. To confirm that the role instances are no longer running in Azure, in the Azure portal, click Cloud services (classic) > your_cloud_service_name. No instances are deployed in the production environment.

    This completes the tutorial.

Enable auto grow shrink for Azure worker role nodes

If you don't want to start and stop the Azure worker role nodes manually, you can enable auto grow shrink. The nodes then start automatically when there are jobs in the queue and stop automatically when they are idle.

To enable auto grow shrink, run the following command in HPC PowerShell on the head node:

Set-HpcClusterProperty -EnableGrowShrink 1

After you enable this property, submit the job again and check whether the Azure worker role nodes start automatically. Open HPC Cluster Manager, go to the Resource Management pane, and select the Operations > AzureOperations view to see all Azure grow and shrink operations. For more details, see the HPC Pack auto grow shrink documentation.

Next steps