Accelerate analytics and AI workloads with Photon-powered Delta Engine on Azure Databricks
Published date: September 22, 2020
Today we are announcing the preview of the Photon-powered Delta Engine on Azure Databricks, a fast, easy, and collaborative analytics and AI service. Built from scratch in C++ and fully compatible with Spark APIs, Photon is a vectorized query engine that leverages modern CPU architecture along with Delta Lake to improve Apache Spark 3.0's performance by up to 20x.

As organizations worldwide embrace data-driven decision-making, it has become imperative for them to invest in a platform that can quickly analyze massive volumes and varieties of data. This has been a challenge: while storage and network performance have increased 10x, CPU processing speeds have increased only marginally. That raises a question: if the CPU has become the bottleneck, how can we reach the next level of performance? Photon's answer is greater parallelism in CPU processing, at both the data level and the instruction level.

The Photon-powered Delta Engine is a 100% Apache Spark-compatible vectorized query engine designed to take advantage of modern CPU architecture for extremely fast parallel processing of data. Written from the ground up in C++ to exploit modern hardware and capitalize on data-level and instruction-level parallelism, the engine also optimizes text processing and regular expressions to deliver fast performance on real-world data and applications. Because it is fully compatible with Apache Spark™ APIs, existing workloads run seamlessly without code changes.

Azure Databricks was already blazing fast compared to Apache Spark, and now the Photon-powered Delta Engine enables even faster performance for modern analytics and AI workloads on Azure. We ran the 30 TB TPC Benchmark DS (TPC-DS), an industry-standard benchmark for measuring processing speed, and found the Photon-powered Delta Engine to be 20x faster than Spark 2.4.
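To make "vectorized" and "data-level and instruction-level parallelism" more concrete, here is a minimal C++ sketch of the general vectorized-execution technique. It is not Photon's code: the `Batch` struct, its `qty` and `price` columns, and both functions are hypothetical. The sketch evaluates a query shaped like `SELECT SUM(price) FROM sales WHERE qty > threshold` over columnar data in two ways: a scalar loop that branches on every row, and a batch version that first computes a selection mask over the whole column and then aggregates with branch-free arithmetic. Tight, branch-free loops over contiguous arrays are generally easier for a compiler to turn into SIMD instructions (data-level parallelism), and avoiding data-dependent branches reduces mispredictions that stall the CPU pipeline (instruction-level parallelism).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical columnar batch: each column is stored contiguously,
// so one operator can sweep an entire column in a tight loop.
struct Batch {
    std::vector<int32_t> qty;
    std::vector<int64_t> price;
};

// Scalar style: evaluate the predicate and aggregate row by row,
// taking a data-dependent branch on every row.
int64_t sum_price_scalar(const Batch& b, int32_t threshold) {
    assert(b.qty.size() == b.price.size());
    int64_t total = 0;
    for (std::size_t i = 0; i < b.qty.size(); ++i) {
        if (b.qty[i] > threshold) {
            total += b.price[i];
        }
    }
    return total;
}

// Vectorized style: pass 1 evaluates the filter over the whole column
// into a 0/1 selection mask; pass 2 aggregates using the mask with no
// per-row branch. Each pass is a simple loop over contiguous arrays.
int64_t sum_price_vectorized(const Batch& b, int32_t threshold) {
    assert(b.qty.size() == b.price.size());
    const std::size_t n = b.qty.size();
    std::vector<int64_t> mask(n);
    for (std::size_t i = 0; i < n; ++i) {
        mask[i] = b.qty[i] > threshold;  // 1 if the row passes, else 0
    }
    int64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += b.price[i] * mask[i];  // rows that fail contribute 0
    }
    return total;
}
```

Both functions return the same result; for example, with quantities {5, 12, 7, 20}, prices {100, 200, 300, 400}, and a threshold of 6, each returns 900. Any actual speedup depends on the compiler, its optimization flags, and the hardware, so treat this as an illustration of the shape of vectorized code rather than a benchmark.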
Read the blog to learn more.