Apache Spark jobs run up to nine times faster with HDInsight IO Cache

Published date: October 31, 2018

HDInsight IO Cache is now available in preview on the latest Azure HDInsight Apache Spark clusters. Once enabled, it improves Spark job performance transparently, with no changes to the jobs themselves, and can speed up query run time by up to nine times. This gives cloud-based Spark deployments an excellent cost-to-performance ratio. HDInsight IO Cache is a new transparent data caching feature based on RubiX. It uses recent advances in SSD technology to make explicit memory management unnecessary and to use resources more efficiently, which improves performance.
