Geoff Staneff joins Donovan Brown to show how Data Accelerator for Apache Spark simplifies everything from onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Apache Spark jobs on Azure HDInsight while enabling the full power of the Apache Spark engine. For more information: Microsoft open sources Data Accelerator for Apache Spark; Data Accelerator for Apache Spark (microsoft/data-accelerator) on GitHub; Azure HDInsight overview; Azure HDInsight docs; Create a free account (Azure)
Azure operates one of the largest public big data cluster services on the planet. Every day thousands of customers build and operate mission-critical big data analytics, business intelligence (BI), and machine learning (ML) solutions using Azure HDInsight. Come explore architectural best practices, recommended patterns, and tips and tricks for building successful production systems using Apache Spark, Hive, Kafka, and HBase. We will also showcase customer examples and reference architectures.
Democratizing data empowers customers by enabling more and more users to gain value from data through self-service analytics. Processing raw data for building apps and gaining deeper insights is one of the critical tasks when building your modern data warehouse architecture. In this session, we will show you how to build data pipelines with Spark and your favorite .NET programming language (C#, F#) using both Azure HDInsight and Azure Databricks, and connect them to Azure SQL Data Warehouse for reporting and consumption.
Kafka is one of the most widely used open-source technologies in large enterprises and a de facto messaging bus. In this session, we’ll walk through a Kafka use case with the Lambda architecture for downstream processing of messages at scale. We’ll share what we’ve learned working with customers and how they use Kafka on Azure to solve their business problems.
Maxim Lukiyanov and Scott Hanselman discuss the intricate ways in which Apache Spark jobs can fail in production and how new diagnostics tools, now available in Azure HDInsight, visualize these problems intuitively and help you discover and understand them at a glance. For more information: Spark Debugging and Diagnostics Toolset for Azure HDInsight (blog post); Apache Spark for Azure HDInsight overview; Create a free account (Azure)
Dhruv Goel and Scott Hanselman discuss why enterprise customers trust Apache Kafka on Azure HDInsight with their streaming ingestion needs. Get even more control over the security of your data at rest with Bring-Your-Own-Key encryption for Kafka. With Azure HDInsight, you get the best of open source and the security and reliability of a managed platform. For more information: Bring your own key for Apache Kafka on Azure HDInsight (Preview); Azure HDInsight - Hadoop, Spark, and Kafka Service; Azure HDInsight pricing; Create a free account (Azure)
Dhruv Goel and Scott Hanselman discuss why enterprise customers trust Apache Kafka on Azure HDInsight with their streaming ingestion needs. Integrate Kafka with Azure Active Directory for authentication and set up fine-grained access control with Apache Ranger to let multiple users access Kafka easily and securely. With Azure HDInsight, you get the best of open source on a managed platform. For more information, see: Tutorial: Configure Kafka policies in HDInsight with Enterprise Security Package (Preview); Azure HDInsight - Hadoop, Spark, and Kafka Service; Azure HDInsight pricing; Create a free account (Azure)
Kafka on Azure HDInsight is an enterprise-grade streaming ingestion service that allows you to quickly and easily set up, use, scale, and monitor your Kafka clusters in the cloud. Kafka provides a fault-tolerant, distributed publish-subscribe model to enable real-time solutions such as Internet of Things (IoT), fraud detection, clickstream analysis, financial alerts, and social analytics.
Before you can have Big Data, you must collect the data. There are two popular ways to do this: with batches and with live streams. Apache Kafka has changed the way we look at streaming and logging data, and Azure now provides tools and services for streaming data into your Big Data pipeline in Azure. This session will outline the different services in the Big Data streaming ecosystem in Azure, including HDInsight Kafka and Event Hubs, how they work together, and when to use each. We will also talk briefly about when traditional ETL tools are a better choice.
In this session, you will learn how technologies such as Low Latency Analytical Processing (LLAP) and Hive 2.x make it possible to analyze petabytes of data with sub-second latency in common file formats such as CSV and JSON, without converting them to columnar file formats like ORC or Parquet. We will go deep into LLAP’s performance and architecture benefits and how it compares with Spark and Presto. We will also look at how business analysts can use familiar tools such as Microsoft Excel and Power BI to run interactive queries over their data lake without moving data outside it.