U-SQL, the new big data language for Azure Data Lake

By Oliver Chiu, Product Marketing, Hadoop/Big Data and Data Warehousing

Posted on October 29, 2015

Co-authored by Nishant Thacker.

Yesterday, we announced that Azure Data Lake Analytics and Azure Data Lake Store are in public preview. To help you use Azure Data Lake as productively as possible, we are publishing a six-part series on different aspects of Azure Data Lake over the next few days. This is the second post in the series, and it covers the details of U-SQL, the new language we introduced for Azure Data Lake Analytics.

U-SQL is a language that unifies the benefits of SQL with the expressive power of your own code. U-SQL’s scalable distributed query capability allows you to efficiently analyze data in the store and across relational stores such as Azure SQL Database. This blog post outlines the basics of U-SQL. The next post will go into more detail on how to develop with U-SQL.

As you approach U-SQL for the first time, you will notice it is a language you’ll be comfortable with from day one. The syntax is based on T-SQL, while the type system and expression language come from C#. This lets you easily conceptualize how data will be processed as you write queries, without having to learn new frameworks or concepts first. Essentially, U-SQL abstracts away the details of parallelism and distributed processing so you don’t need to worry about them while writing your queries. You don’t need special programming skills or months of training to be productive; a good understanding of SQL and some knowledge of C# are enough.

U-SQL allows you to process any type of data. From analyzing BotNet attack patterns and security logs to extracting features from images or videos for machine learning, the language enables you to work with any data.

U-SQL integrates custom code seamlessly so you can express your complex, often proprietary business algorithms. Use cases such as processing different file types or handling encryption may require custom processing that is not easily expressed in standard query languages, ranging from user-defined functions to custom input and output formats. This is where U-SQL excels.
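As a minimal sketch of what mixing SQL with your own code can look like, the script below calls ordinary C# string methods and the C# null-coalescing operator directly inside a SELECT. The rowset contents, the column aliases, and the output path are illustrative assumptions, not part of the original sample.

```usql
// Illustrative sketch: the sample rows and the output path are assumptions.
// A small in-memory rowset stands in for data read from a file.
@Drivers =
    SELECT *
    FROM (VALUES
            (1, " Ann Lee ", "Seattle"),
            (2, "bob ray", (string) null)
         ) AS D(driver_id, name, city);

// C# expressions are used inline: string methods and the ?? operator.
// name is never null in this sample; real data may need a null check first.
@Cleaned =
    SELECT driver_id,
           name.Trim().ToUpperInvariant() AS name,
           (city ?? "unknown") AS city
    FROM @Drivers;

OUTPUT @Cleaned
TO "/output/cleaned_drivers.csv"
USING Outputters.Csv();
```

Logic that outgrows an inline expression would typically move into a C# function in a referenced assembly, which is the direction the user-defined functions mentioned above point to.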

Finally, U-SQL was developed to scale efficiently to any size of data without requiring you to focus on scale-out topologies, plumbing code, or the limitations of a specific distributed infrastructure. There is no restriction on the total data size or on the individual units of data that can be processed, and U-SQL automatically scales to use the available resources. This lets developers concentrate on the business logic to be implemented rather than on the infrastructure that needs to be set up to process their queries on massive amounts of data.

Let’s take a sneak peek at what U-SQL looks like. A typical U-SQL query looks something like the one below:

@Result =
    SELECT country, city, COUNT(*) AS NumberOfDrivers
    FROM @Drivers
    GROUP BY country, city
    ORDER BY NumberOfDrivers DESC, country, city
    FETCH FIRST 10 ROWS;

As you can see, U-SQL uses a syntax a DBA would understand, with the typical SELECT, FROM, and GROUP BY clauses. Since it is meant to work on massive data, it provides the FETCH clause, which limits the result to the first rows of the ordered output (here, the top 10) so the developer can preview the query results. Here’s how the rowset @Drivers is extracted:

@Drivers =
    EXTRACT driver_id int
          , name string
          , street string
          , city string
          , region string
          , zipcode string
          , country string
          , phone_numbers string
    FROM @INPUT_DRIVERS
    USING Extractors.Text(delimiter : '\t', quoting : true, encoding : Encoding.Unicode);

As we can see, it extracts a set of fields from a file using a text extractor, which can be customized to suit any format and extended to handle other types of data, delimiters, and so on.

Once you have the result set from the first query, you can use Outputters to save it in the format of your choice:

OUTPUT @Result
TO @OUTPUT
USING Outputters.Csv(quoting : true);
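The three statements above are shown out of execution order and rely on two variables, @INPUT_DRIVERS and @OUTPUT, that the post does not define. The sketch below assembles them into one script in the order they would run. The two DECLARE statements and both file paths are assumptions added for illustration; point them at locations in your own store.

```usql
// Hypothetical paths (assumptions): replace with files in your own store.
DECLARE @INPUT_DRIVERS string = "/input/drivers.tsv";
DECLARE @OUTPUT string = "/output/top_driver_locations.csv";

// 1. Read the tab-delimited file into a typed rowset.
//    The tab delimiter is written as the C# character literal '\t'.
@Drivers =
    EXTRACT driver_id int,
            name string,
            street string,
            city string,
            region string,
            zipcode string,
            country string,
            phone_numbers string
    FROM @INPUT_DRIVERS
    USING Extractors.Text(delimiter: '\t', quoting: true, encoding: Encoding.Unicode);

// 2. Count drivers per country and city, keeping the ten largest groups.
@Result =
    SELECT country, city, COUNT(*) AS NumberOfDrivers
    FROM @Drivers
    GROUP BY country, city
    ORDER BY NumberOfDrivers DESC, country, city
    FETCH FIRST 10 ROWS;

// 3. Write the result as a quoted CSV file.
OUTPUT @Result
TO @OUTPUT
USING Outputters.Csv(quoting: true);
```

Keeping the paths in DECLARE statements at the top means the same script can be pointed at a different input or output without touching the query logic.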

This is probably the simplest example of U-SQL. However, U-SQL includes many more capabilities like the following:

Operating over a set of files with patterns
Using (Partitioned) Tables
Federated Queries against Azure SQL DB
Encapsulating your U-SQL code with Views, Table-Valued Functions and Procedures
SQL Windowing Functions
Programming with C# User-defined Operators (custom extractors, processors)
Complex Types (MAP, ARRAY)
Using U-SQL in data processing pipelines
U-SQL in a lambda architecture for IoT analytics

To learn more about U-SQL, stay tuned for the next blog post.

Where can I get more information?

Read the announcement post for more details.
Check out Visual Studio’s U-SQL post to learn more about the new big data language.
Visit Azure.com Data Lake solution page.
Watch a video about U-SQL.
Watch the Azure Data Lake Video Series.
