Drive higher utilization of Azure HDInsight clusters with autoscale

Published on May 21, 2019

Principal Program Manager, Azure HDInsight

We are excited to share the general availability of the Autoscale feature for Azure HDInsight. This feature enables enterprises to become more productive and cost-efficient by automatically scaling clusters up or down based on the load or a customized schedule. 

Let’s consider the scenario of a U.S.-based health provider that uses Azure HDInsight to build a unified big data platform at the corporate level, processing various data for trend prediction or usage pattern analysis. To achieve their business goals, they operate multiple HDInsight clusters in production for real-time data ingestion, batch analysis, and interactive analysis.

Some clusters are customized to exact requirements, such as ISV/line of business applications and access control policies, which are subject to rigorous SLA requirements. Sizing such clusters is a hard problem by itself, and operating them 24/7 at peak capacity is expensive. So once the clusters are created, IT admins need to either manually monitor the dynamic capacity requirements and scale the clusters up and down, or develop custom tools to do the same. These challenges prevent IT admins from being as productive as possible when building and operating cost-efficient big data analytics workloads.

With the new cluster Autoscale feature, IT admins can have the Azure HDInsight service automatically monitor the cluster and scale it up or down between an admin-specified minimum and maximum number of nodes, based on either the actual load on the cluster or a customized schedule. IT admins can flexibly adjust the cluster size range or the schedule as the unique requirements of their workloads change. The autoscale feature releases IT admins from having to build complex monitoring tools or worry about wasted resources and high costs.

Benefits

Automatically make scaling decisions

Once autoscale is enabled, you can rest assured that the service will take care of your cluster size.

  • In the load-based mode: The cluster is scaled up by just as many resources as your applications need, but never beyond the maximum number of worker nodes you set. Similarly, the cluster is scaled down to the smallest size that still meets your current resource requirements, but never below the minimum number of worker nodes you set.
  • In the schedule-based mode: The cluster is scaled up and down according to the predefined schedule.
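
The bounding behavior of the load-based mode can be pictured with a small sketch. This is only an illustration of the "never above the maximum, never below the minimum" rule described above, not the service's actual implementation; the function name `target_node_count` and its parameters are hypothetical.

```python
def target_node_count(required_nodes: int, min_nodes: int, max_nodes: int) -> int:
    """Return the worker node count after applying the admin-set bounds.

    required_nodes is a hypothetical estimate of how many worker nodes the
    current workload needs; how the service derives it is not shown here.
    """
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    # Clamp the requested size into the [min_nodes, max_nodes] range.
    return max(min_nodes, min(required_nodes, max_nodes))


if __name__ == "__main__":
    # Illustrative values only: demand above, inside, and below the range.
    for demand in (14, 6, 1):
        print(demand, "->", target_node_count(demand, min_nodes=3, max_nodes=10))
```

With a range of 3 to 10 worker nodes, a demand that exceeds the range is capped at the maximum, and a demand below it is held at the minimum.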

This releases IT admins from worrying about wasted resources and allows enterprises to be cost-effective and productive.

Pay for only what you need

Autoscale helps you achieve the balance between performance and cost efficiency. Scaling up the cluster lets you derive the business insight you need on time, while scaling down the cluster removes the excess resources. Ultimately, autoscale leads to higher utilization, enabling you to pay for only what you need.

Customize to your own scenario

HDInsight autoscale allows you to customize the scaling strategy based on your own scenario. In the load-based mode, you can define the maximum and minimum number of worker nodes based on your cost requirements. In the schedule-based mode, you can define a schedule for each weekday to meet your own business objectives.
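
To make the schedule-based mode more concrete, here is a minimal sketch of how a per-weekday schedule could be represented and looked up. The data layout, the names `SCHEDULE`, `DEFAULT_NODES`, and `scheduled_node_count`, and the node counts are all assumptions made for illustration; they do not reflect the actual HDInsight configuration format.

```python
from datetime import datetime, time

# Hypothetical schedule: weekday index (0 = Monday) -> list of
# (time of day, worker node count) entries that take effect at that time.
SCHEDULE = {
    0: [(time(8, 0), 10), (time(18, 0), 3)],  # Monday
    1: [(time(8, 0), 8), (time(18, 0), 3)],   # Tuesday
}

# Assumed fallback size when no schedule entry applies yet.
DEFAULT_NODES = 2


def scheduled_node_count(now: datetime, schedule=SCHEDULE, default=DEFAULT_NODES) -> int:
    """Return the node count of the latest entry that has started today."""
    count = default
    # Walk the day's entries in time order and keep the last one already in effect.
    for start, nodes in sorted(schedule.get(now.weekday(), [])):
        if now.time() >= start:
            count = nodes
    return count


if __name__ == "__main__":
    # 2019-05-20 was a Monday.
    print(scheduled_node_count(datetime(2019, 5, 20, 9, 0)))
    print(scheduled_node_count(datetime(2019, 5, 20, 19, 0)))
```

In this sketch, days without entries and times before the first entry of the day fall back to the default size. That is a simplification chosen for the example, not a statement about how the service treats gaps in a schedule.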

Monitor scaling history easily

The autoscale feature gives you full visibility into how the cluster has been scaled up or down. This enables you to further optimize the autoscale configuration for higher utilization and workload performance.

Using the Azure portal, you can zoom in and out to check the cluster size over the past 90 days.

All the scaling events are also available in Azure Log Analytics. You can run queries to get all the details, including when the scaling operation took place, how many resources were needed, and how many worker nodes the cluster was scaled to.

Support multiple workloads

HDInsight Autoscale is supported in Spark and Hadoop (Hive) clusters as a generally available feature. You can also enable autoscale for HBase and LLAP clusters, where it is currently in preview.

Get started