Apache Spark jobs run up to nine times faster with HDInsight IO Cache
Updated: October 31, 2018
HDInsight IO Cache is now available in preview on the latest Azure HDInsight Apache Spark clusters. Once enabled, it improves the performance of Spark jobs transparently, with no changes to the jobs required, and can deliver up to a ninefold improvement in query run time. This gives cloud-based Spark deployments an excellent cost-to-performance ratio. HDInsight IO Cache is a new transparent data caching feature based on RubiX. It uses recent advances in SSD technology to make explicit memory management unnecessary and to allow optimal resource utilization.