U-SQL, the new big data language for Azure Data Lake

Co-authored by Nishant Thacker.

Yesterday, we announced that Azure Data Lake Analytics and Azure Data Lake Store are in public preview. To help you use the Azure Data Lake as productively as possible, we are publishing a six-part series on different aspects of the Azure Data Lake over the next few days. This is the second post in the series, and it covers the details of U-SQL, the new language we introduced for Azure Data Lake Analytics.

U-SQL is a language that unifies the benefits of SQL with the expressive power of your own code. U-SQL’s scalable distributed query capability allows you to efficiently analyze data in the store and across relational stores such as Azure SQL Database. This blog post outlines the basics of U-SQL. The next post will go into more detail on how to develop using U-SQL.

As you approach U-SQL for the first time, you will notice it is a language you’ll be comfortable with from day one. The syntax is based on T-SQL, and it uses C# types by default. This lets you easily conceptualize how data will be processed while writing queries, without having to learn new frameworks or concepts. Essentially, it abstracts the deeper concepts of parallelism and distributed processing so you don’t need to worry about them while writing your queries. You don’t need special programming skills or months of training to be able to deliver; a good understanding of SQL and some knowledge of C# are enough.

U-SQL allows you to process any type of data. From analyzing botnet attack patterns and security logs to extracting features from images or videos for machine learning, the language enables you to work with any data.

U-SQL integrates custom code seamlessly so that you can express your complex, often proprietary business algorithms. Use cases such as handling different file types or encryption may require custom processing that is not easily expressed in standard query languages, ranging from user-defined functions to custom input and output formats. This is something that U-SQL excels at.

Finally, U-SQL was developed to scale efficiently to any size of data without you having to focus on scale-out topologies, plumbing code, or the limitations of a specific distributed infrastructure. There is no restriction on the total data size or on the individual units of data that can be processed, and it automatically scales to use the available resources. This lets developers concentrate on the business logic to be implemented rather than on the infrastructure that needs to be set up to run their queries on massive amounts of data.

Let’s take a sneak peek at what U-SQL looks like. A typical U-SQL query looks something like the one below:

@Result =
    SELECT country, city, COUNT(*) AS NumberOfDrivers
    FROM @Drivers
    GROUP BY country, city
    ORDER BY NumberOfDrivers DESC, country, city
    FETCH FIRST 10 ROWS;

As can be seen, U-SQL uses a syntax that any DBA would understand, with the typical SELECT, FROM, and GROUP BY clauses. Since it is meant to work on massive data, it provides the FETCH clause, which limits the result to the first rows of the ordered output and lets the developer preview some data when analyzing the query results. Here’s the rowset @Drivers being extracted:

@Drivers =
    EXTRACT driver_id int
          , name string
          , street string
          , city string
          , region string
          , zipcode string
          , country string
          , phone_numbers string
    FROM @INPUT_DRIVERS
    USING Extractors.Text(delimiter : '\t', quoting : true, encoding : Encoding.Unicode);

As we see, it extracts a set of typed fields from a file using a text extractor. The extractor can be customized through parameters such as the delimiter, quoting, and encoding, and it can be extended to handle other formats and types of data.
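To make the extractor's job more concrete, here is a rough Python analogue of what the EXTRACT statement above describes: reading tab-delimited text, honoring quoted fields, and turning each line into a typed row. It is a sketch only, not U-SQL and not how Extractors.Text is implemented. The Driver type, the extract_drivers helper, and the sample row are invented for illustration.

```python
import csv
import io
from typing import NamedTuple


class Driver(NamedTuple):
    # Mirrors the column list in the EXTRACT statement (hypothetical Python type).
    driver_id: int
    name: str
    street: str
    city: str
    region: str
    zipcode: str
    country: str
    phone_numbers: str


def extract_drivers(text: str) -> list[Driver]:
    """Parse tab-delimited text into typed rows, honoring double-quoted fields."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar='"')
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        driver_id, *rest = fields
        # Only driver_id needs conversion; the other columns stay as strings.
        rows.append(Driver(int(driver_id), *rest))
    return rows


# Made-up sample row; the street is quoted, as quoting: true would allow.
sample = '1\tAda\t"1 Main St"\tSeattle\tWA\t98101\tUSA\t555-0100\n'
drivers = extract_drivers(sample)
print(drivers[0].city)
```

The point of the analogue is the schema-on-read idea: the file itself carries no types, and the column list supplies them at query time.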

Once you have the result set from the first query, you can use Outputters to save it in the format of your choice:

OUTPUT @Result
TO @OUTPUT
USING Outputters.Csv(quoting : true);
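For readers who want to check their understanding of the query semantics on a small scale, the following Python sketch mimics the aggregation and output steps: count drivers per (country, city), order by count descending and then by country and city, keep the first ten rows, and write quoted CSV. It is a local analogue under stated assumptions, not U-SQL. The sample data and the top_cities and to_csv helpers are hypothetical, and it assumes that quoting applies to text fields only.

```python
import csv
import io
from collections import Counter

# Hypothetical (country, city) pairs standing in for the @Drivers rowset.
drivers = [
    ("USA", "Seattle"),
    ("USA", "Seattle"),
    ("USA", "Redmond"),
    ("Germany", "Berlin"),
]


def top_cities(rows, limit=10):
    """GROUP BY country, city; ORDER BY count DESC, country, city; keep `limit` rows."""
    counts = Counter(rows)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0][0], kv[0][1]))
    return [(country, city, n) for (country, city), n in ordered[:limit]]


def to_csv(rows):
    """Write rows as CSV, quoting the text fields (assumed analogue of quoting: true)."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


result = top_cities(drivers)
print(to_csv(result), end="")
```

In the service, the same logic runs in parallel over distributed data; the sketch only shows what the query computes, not how it is executed.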

This is probably the simplest example of U-SQL. However, U-SQL includes many more capabilities like the following:

Operating over sets of files with patterns
Using (Partitioned) Tables
Federated Queries against Azure SQL DB
Encapsulating your U-SQL code with Views, Table-Valued Functions and Procedures
SQL Windowing Functions
Programming with C# User-defined Operators (custom extractors, processors)
Complex Types (MAP, ARRAY)
Using U-SQL in data processing pipelines
U-SQL in a lambda architecture for IoT analytics

To learn more about U-SQL, watch the video below and stay tuned for the next blog post.

Where can I get more information?

Read the announcement post for more details.
Check out the Visual Studio U-SQL post to learn more about the new big data language.
Visit the Azure.com Data Lake solution page.
Watch a video about U-SQL.
Watch the Azure Data Lake Video Series.
