Azure Databricks – Delta in preview, 9 regions added, and other exciting announcements

Posted on September 24, 2018

Principal PM Manager, Azure Data

Azure Databricks provides a fast, easy, and collaborative Apache® Spark™-based analytics platform to accelerate and simplify the process of building big data and AI solutions that drive the business forward, all backed by industry leading SLAs.

Since announcing general availability in March, we have been continuously listening to customers and adding functionality to the Azure Databricks service. Today, I am excited to announce several new updates to Azure Databricks.

General availability

Azure Databricks is now available in Japan, Canada, India, and Australia Central

We are excited to announce the general availability of Azure Databricks in additional regions – Japan, Canada, India, Australia Central, and Australia Central 2. These additional locations bring the product's worldwide availability count to 24 regions, backed by a 99.95 percent SLA.

We want to ensure that we build our cloud infrastructure to serve the needs of customers by driving innovation and making it accessible globally. Stay updated with the region availability for Azure Databricks.

Organizations also benefit from Azure Databricks' native integration with other services like Azure Blob Storage, Azure Data Factory, Azure SQL Data Warehouse, and Azure Cosmos DB. This enables new analytics solutions that support modern data warehousing, advanced analytics, and real-time analytics scenarios.

Azure Active Directory conditional access in Azure Databricks

Azure Databricks now supports Azure Active Directory (AD) conditional access, which allows administrators to control where and when users are permitted to sign in to Azure Databricks.

Security is a top concern for organizations using the cloud, and managing identity and access to your cloud resources is a key part of it. In a mobile-first, cloud-first world, users can reach your organization's resources from anywhere, using a variety of devices and apps. As a result, focusing only on who can access a resource is no longer sufficient. To balance security and productivity, an access control decision also needs to factor in how a resource is accessed. Conditional access, a capability of Azure Active Directory, addresses this requirement: it lets you implement automated access control decisions for your cloud apps based on conditions.

Customers can start taking advantage of Azure Active Directory conditional access in Azure Databricks today by creating a new conditional access policy in Azure AD through the portal. Read more about Azure AD conditional access in the Azure Databricks documentation.
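To make the idea of a condition-based decision concrete, here is a minimal toy sketch in Python. It is not how Azure AD evaluates policies, and every name in it (`SignIn`, `ToyPolicy`, `decide`, the location labels) is hypothetical. It only shows an access decision that looks at where and how a sign-in happens, in addition to who is signing in.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SignIn:
    """A sign-in attempt: who is signing in, from where, and on what device."""
    user: str
    location: str            # hypothetical label for a network location
    device_compliant: bool   # whether the device meets the organization's rules


@dataclass(frozen=True)
class ToyPolicy:
    """Hypothetical policy: all configured conditions must hold to allow access."""
    allowed_users: FrozenSet[str]
    trusted_locations: FrozenSet[str]
    require_compliant_device: bool = True

    def decide(self, attempt: SignIn) -> str:
        # "Who": the classic identity check.
        if attempt.user not in self.allowed_users:
            return "block"
        # "Where": the sign-in must come from a trusted location.
        if attempt.location not in self.trusted_locations:
            return "block"
        # "How": optionally require a compliant device.
        if self.require_compliant_device and not attempt.device_compliant:
            return "block"
        return "allow"


policy = ToyPolicy(
    allowed_users=frozenset({"alice"}),
    trusted_locations=frozenset({"corp-network"}),
)
print(policy.decide(SignIn("alice", "corp-network", True)))   # allow
print(policy.decide(SignIn("alice", "public-wifi", True)))    # block
```

The same user gets a different outcome depending on the conditions of the sign-in, which is the point the paragraph above makes about identity alone no longer being enough.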

Preview

Azure Databricks Delta

Azure Databricks Delta, available in preview today, is a powerful transactional storage layer built on Apache Spark to provide better consistency of data and faster read access.

As customers continue to build complex pipelines for both batch and streaming data, there is a need to simplify ETL pipelines. To build a consistent view, customers often create multiple stages in their pipeline to accommodate evolving schemas and to support lambda patterns, with different stages for batch and stream processing.

Azure Databricks Delta can be used with Spark tables to allow multiple users or jobs to simultaneously modify a dataset and see consistent views, without interfering with other jobs reading the same dataset from the table. Azure Databricks Delta stores data in Parquet files, but also maintains a transaction log, which allows for better file management by organizing data into large files that can be read much more efficiently. It also has built-in statistics that improve performance by using data skipping to avoid reading irrelevant data.
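The two ideas in this paragraph, a transaction log that gives readers a consistent view and per-file statistics that enable data skipping, can be illustrated with a small toy model. This is a plain-Python sketch and not Delta's actual implementation or file format. `FileEntry`, `ToyTableLog`, and `files_to_scan` are hypothetical names, and the sketch assumes each file records only the minimum and maximum of a single column.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class FileEntry:
    """One data file as recorded in the toy transaction log."""
    path: str
    min_value: int  # smallest value of the filtered column in this file
    max_value: int  # largest value of the filtered column in this file


class ToyTableLog:
    """Append-only list of versions; each version is the set of live files."""

    def __init__(self) -> None:
        self._versions: List[Tuple[FileEntry, ...]] = [()]

    def commit(self, added: Sequence[FileEntry], removed: Sequence[str] = ()) -> int:
        """Publish a new version built from the latest one and return its number."""
        removed_paths = set(removed)
        kept = tuple(f for f in self._versions[-1] if f.path not in removed_paths)
        self._versions.append(kept + tuple(added))
        return len(self._versions) - 1

    def snapshot(self, version: int = -1) -> Tuple[FileEntry, ...]:
        """Return the files of one version; later commits never change it."""
        return self._versions[version]


def files_to_scan(snapshot: Sequence[FileEntry], low: int, high: int) -> List[str]:
    """Data skipping: keep only files whose [min, max] range overlaps [low, high]."""
    return [f.path for f in snapshot if f.max_value >= low and f.min_value <= high]


log = ToyTableLog()
v1 = log.commit([FileEntry("part-0.parquet", 0, 99),
                 FileEntry("part-1.parquet", 100, 199)])
reader_view = log.snapshot(v1)  # a reader pins version 1

# A writer compacts both files into one larger file while the reader is active.
log.commit([FileEntry("part-2.parquet", 0, 199)],
           removed=["part-0.parquet", "part-1.parquet"])

print(files_to_scan(reader_view, 120, 130))     # ['part-1.parquet']
print(files_to_scan(log.snapshot(), 120, 130))  # ['part-2.parquet']
```

The reader that pinned version 1 still sees the original two files and skips `part-0.parquet` because its value range cannot match the filter, while a new reader sees only the compacted file.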

Azure Databricks Delta is available in preview today. You can read more about it in our documentation and import our quickstart notebook.

Azure Databricks supports Azure SQL Data Warehouse as Streaming Sink

We are happy to announce that Azure Databricks users can now stream data directly into Azure SQL Data Warehouse using Structured Streaming. This enables customers to visualize and report on near real-time data in SQL DW, backed by real-time streaming pipelines built with Structured Streaming, resulting in faster decision making across the enterprise.

With explosive growth in the volume of data being analyzed, the proliferation of different data types, and the need for real-time analytics, there is a need for a single hub to visualize all your data. Azure SQL Data Warehouse (SQL DW) is a cloud-based enterprise data warehouse that leverages massively parallel processing (MPP) to quickly run complex queries across petabytes of data. The data warehouse can act as the single version of truth your business can count on for visualizations and insights.
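As a rough sketch of what streaming into SQL DW can look like from a notebook, the Python below builds the options for a streaming write and applies them to a streaming DataFrame. The connector format name and the option keys are written from memory of the Databricks SQL DW connector and should be checked against the current documentation. The helper names (`sqldw_sink_options`, `start_sqldw_stream`) and the example values are hypothetical.

```python
from typing import Any, Dict

# Assumed format name of the Databricks SQL DW connector; verify in the docs.
SQLDW_FORMAT = "com.databricks.spark.sqldw"


def sqldw_sink_options(jdbc_url: str, temp_dir: str, table: str,
                       checkpoint_dir: str) -> Dict[str, str]:
    """Collect the options assumed to be needed for a SQL DW streaming write."""
    return {
        "url": jdbc_url,                                # JDBC URL of the warehouse
        "tempDir": temp_dir,                            # Blob Storage staging folder
        "forwardSparkAzureStorageCredentials": "true",  # reuse Spark's storage key
        "dbTable": table,                               # target table in SQL DW
        "checkpointLocation": checkpoint_dir,           # streaming checkpoint path
    }


def start_sqldw_stream(streaming_df: Any, options: Dict[str, str]) -> Any:
    """Start a Structured Streaming query that writes streaming_df to SQL DW.

    streaming_df is expected to be a streaming Spark DataFrame, so this call
    only works on a cluster where Spark and the connector are available.
    """
    writer = streaming_df.writeStream.format(SQLDW_FORMAT)
    for key, value in options.items():
        writer = writer.option(key, value)
    return writer.start()
```

In a notebook, the call would look like `start_sqldw_stream(events_df, sqldw_sink_options(...))`, where `events_df` is a hypothetical streaming DataFrame. Keeping the options in a plain dictionary makes them easy to review and to unit test without a cluster.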

Azure Key Vault-backed secret scopes in Azure Databricks

Azure Databricks comes with built-in connectivity to Azure Data Lake Storage, Cosmos DB, SQL DW, Event Hubs, IoT Hubs, and several other services. Customers can now store the connection strings and secrets for these services in Azure Key Vault.

Azure Key Vault helps you securely store and manage application secrets. By centralizing the storage of secrets, it reduces the chances of accidentally losing security information.

When using Key Vault with Azure Databricks to create secret scopes, data scientists and developers no longer need to store security information such as SAS tokens or connection strings in their notebooks. Access to a key vault requires proper authentication and authorization before a user can get access. Authentication establishes the identity of the user, while authorization determines the operations that they are allowed to perform.

With this, Azure Databricks now supports two types of secret scopes – Azure Key Vault-backed and Databricks-backed. Learn more about Azure Key Vault-backed secret scope.
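As a minimal sketch of how a notebook might read a secret instead of hard-coding it, the Python below uses the `dbutils.secrets.get(scope=..., key=...)` utility that Databricks notebooks provide. The function name `build_storage_conf`, the scope and key names, and the storage account are hypothetical. `dbutils` is passed in as a parameter because it only exists inside a Databricks notebook or job.

```python
from typing import Any, Dict


def build_storage_conf(dbutils: Any, scope: str, key: str, account: str) -> Dict[str, str]:
    """Read a storage account key from a secret scope and build Spark config.

    The notebook never contains the key itself, only the scope and key names.
    """
    account_key = dbutils.secrets.get(scope=scope, key=key)
    # Spark configuration entry for accessing Azure Blob Storage with an account key.
    return {f"fs.azure.account.key.{account}.blob.core.windows.net": account_key}
```

In a notebook, the returned entries could then be applied with `spark.conf.set(name, value)`. The scope can be either Azure Key Vault-backed or Databricks-backed. The calling code is the same in both cases.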

Spark + AI Summit, Europe

Microsoft will have a major presence at Spark + AI Summit Europe, 2018, the premier event for the Apache Spark community. Rohan Kumar, Corporate Vice President of Azure Data, will deliver a keynote on how Azure Databricks combines the best of the Apache® Spark™ analytics platform and Microsoft Azure Data services to help customers unleash the power of data and reimagine possibilities that make AI possible and improve our world. At Spark + AI Summit, we have a number of sessions showcasing the great work our customers and partners are doing and how Azure Databricks is helping them achieve productivity at scale.

Get started today!

We are excited for you to try Azure Databricks! Get started today and let us know your feedback.