This blog post was co-authored by Dinesh Chandnani, Principal Group Engineering Manager, Microsoft.
Standing up a data pipeline for the first time can be a challenge, and decisions you make at the start of a project can limit your choices long after the initial deployment has rolled out. Often what is needed is a playground in which to learn about and evaluate the available options and capabilities in the solution space. To that end, we are excited to announce that an internal Microsoft project known as Data Accelerator is now being open sourced.
Data Accelerator started in 2017 as a large-scale data processing project in Microsoft’s Developer Division that eventually settled on streaming with Apache Spark for reasons of scale and speed. The pipeline today operates at Microsoft scale.
Some of the reasons we think it will have value to the wider community:
- Fast dev-test loop: Events can be sampled to support local execution of queries, short-circuiting the wait and delay of submitting your job to the cluster only for it to fail seven minutes later due to a misplaced semicolon.
- One-box deployment for local testing and discovery: Learn before you commit to a prototype.
- Designer-based rules and query building: Stand up an end-to-end ETL pipeline without writing any code, or dive right into the details.
- Time-windowing, reference data, and output capabilities added to SQL-Spark syntax: Keyword extensions to SQL-Spark syntax avoid the complexity and error-prone management of these common tasks.
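For a sense of what those keyword extensions simplify, here is what a tumbling-window aggregation looks like in plain Spark SQL using the built-in `window` function. This is standard Spark SQL, not Data Accelerator's extended syntax, and the table and column names (`events`, `eventTime`, `deviceId`) are hypothetical:

```sql
-- Count events per device in one-minute tumbling windows.
-- Standard Spark SQL; Data Accelerator's extensions aim to reduce
-- the repetition and error-proneness of patterns like this.
SELECT
  window(eventTime, '1 minute') AS win,
  deviceId,
  COUNT(*) AS eventCount
FROM events
GROUP BY window(eventTime, '1 minute'), deviceId
```

Note that the window expression must be repeated in both the `SELECT` and `GROUP BY` clauses, and joining in reference data or fanning results out to multiple sinks adds further boilerplate on top of this.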
The Developer Division of Microsoft uses Data Accelerator in production every day and will continue to improve the toolchain over time, but we recognize the toolset could do many more things given the need. We hope that by open sourcing this project, some of you will find Data Accelerator even more helpful.