Azure Databricks, industry-leading analytics platform powered by Apache Spark™

Posted on March 22, 2018

Corporate Vice President, Azure Data

This blog post was co-authored by Ali Ghodsi, CEO, Databricks.

The confluence of cloud, data, and AI is driving unprecedented change. The ability to utilize data and turn it into breakthrough insights is foundational to innovation today. Our goal is to empower organizations to unleash the power of data and reimagine possibilities that will improve our world.

To enable this journey, we are excited to announce the general availability of Azure Databricks, a fast, easy, and collaborative Apache® Spark™-based analytics platform optimized for Azure.

Fast, easy, and collaborative

Over the past five years, Apache Spark has emerged as the open source standard for advanced analytics, machine learning, and AI on Big Data. With a massive community of over 1,000 contributors and rapid adoption by enterprises, we see Spark’s popularity continue to rise.

Azure Databricks is designed in collaboration with Databricks, whose founders started the Spark research project at UC Berkeley that later became Apache Spark. Our goal with Azure Databricks is to help customers accelerate innovation and simplify the process of building Big Data & AI solutions by combining the best of Databricks and Azure.

To meet this goal, we developed Azure Databricks with three design principles.

First, enhance user productivity in developing Big Data applications and analytics pipelines. Azure Databricks’ interactive notebooks enable data science teams to collaborate using popular languages such as R, Python, Scala, and SQL and create powerful machine learning models by working on all their data, not just a sample data set. Native integration with Azure services further simplifies the creation of end-to-end solutions. These capabilities have enabled companies such as renewables.AI to boost the productivity of their data science teams by over 50 percent.

“Instead of one data scientist writing AI code and being the only person who understands it, everybody uses Azure Databricks to share code and develop together.”

- Andy Cross, Director, renewables.AI

Second, enable our customers to scale globally without limits by working on big data with a fully managed, cloud-native service that automatically scales to meet their needs, without high cost or complexity. Azure Databricks not only provides an optimized Spark platform, which is much faster than vanilla Spark, but also simplifies the process of building batch and streaming data pipelines and deploying machine learning models at scale. This makes the analytics process faster for customers such as E.ON and Lennox International, enabling them to accelerate innovation.

“Every day, we analyze nearly a terabyte of wind turbine data to optimize our data models. Before, that took several hours. With Microsoft Azure Databricks, it takes a few minutes. This opens a whole range of possible new applications.”

- Sam Julian, Product Owner, Data Services, E.ON

“At Lennox International, we have 1000’s of devices streaming data back into our IoT environment. With Azure Databricks, we moved from 60% accuracy to 94% accuracy on detecting equipment failures. Using Azure Databricks has opened the flood gates to all kinds of new use cases and innovations. In our previous process, 15 devices, which created 2 million records, took 6 hours to process. With Azure Databricks, we are able to process 25,000 devices – 10 billion records – in under 14 minutes.”

- Sunil Bondalapati, Director of Information Technology, Lennox International

Third, ensure that we provide our customers with the enterprise security and compliance they have come to expect from Azure. Azure Databricks protects customer data with enterprise-grade SLAs, simplified security and identity, and role-based access controls with Azure Active Directory integration. As a result, organizations can safeguard their data without compromising the productivity of their users.

Azure is the best place for Big Data & AI

We are excited to add Azure Databricks to the Azure portfolio of data services and have taken great care to integrate it with other Azure services to unlock key customer scenarios.

High-performance connectivity to Azure SQL Data Warehouse, a petabyte-scale, elastic cloud data warehouse, allows organizations to build Modern Data Warehouses that load and process any type of data at scale for enterprise reporting and visualization with Power BI. It also enables data science teams working in Azure Databricks notebooks to easily access high-value data from the warehouse to develop models.

Integration with Azure IoT Hub, Azure Event Hubs, and Azure HDInsight Kafka clusters enables enterprises to build scalable streaming solutions for real-time analytics scenarios such as recommendation engines, fraud detection, predictive maintenance, and many others.

Integration with Azure Blob Storage, Azure Data Factory, Azure Data Lake Store, Azure SQL Data Warehouse, and Azure Cosmos DB allows organizations to use Azure Databricks to clean, join, and aggregate data no matter where it sits.
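
To make the "clean, join, and aggregate" pattern concrete, here is a minimal sketch in plain Python. It is not Azure Databricks or Spark code: the sample records, the field names (`device_id`, `temp_c`, `site`), and the helper functions are all hypothetical, chosen only to illustrate the three steps on data that arrives from two different sources. On Azure Databricks, the same steps would typically be written against Spark DataFrames in a notebook.

```python
from collections import defaultdict

# Hypothetical sample data standing in for two separate sources
# (for example, a storage export and a database table).
readings = [
    {"device_id": "d1", "temp_c": "21.5"},
    {"device_id": "d2", "temp_c": ""},      # missing value, dropped by clean()
    {"device_id": "d1", "temp_c": "22.5"},
    {"device_id": "d3", "temp_c": "19.0"},
]
devices = [
    {"device_id": "d1", "site": "north"},
    {"device_id": "d3", "site": "south"},
]


def clean(rows):
    """Drop rows with a missing reading and convert the rest to float."""
    return [
        {"device_id": r["device_id"], "temp_c": float(r["temp_c"])}
        for r in rows
        if r["temp_c"].strip()
    ]


def join(rows, lookup_rows):
    """Inner join on device_id, attaching each device's site."""
    site_by_device = {d["device_id"]: d["site"] for d in lookup_rows}
    return [
        {**r, "site": site_by_device[r["device_id"]]}
        for r in rows
        if r["device_id"] in site_by_device
    ]


def aggregate(rows):
    """Compute the average temperature per site."""
    totals = defaultdict(lambda: [0.0, 0])  # site -> [sum, count]
    for r in rows:
        totals[r["site"]][0] += r["temp_c"]
        totals[r["site"]][1] += 1
    return {site: total / count for site, (total, count) in totals.items()}


avg_temp_by_site = aggregate(join(clean(readings), devices))
print(avg_temp_by_site)
```

Keeping each step as its own function mirrors how such pipelines are usually staged, so that a step can be inspected or swapped without touching the others.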

We are committed to making Azure the best place for organizations to unlock the insights hidden in their data to accelerate innovation. With Azure Databricks and its native integration with other services, Azure is the one-stop destination to easily unlock powerful new analytics, machine learning, and AI scenarios.


Get started today!

We are excited for you to try Azure Databricks! Get started today and let us know your feedback.