
U-SQL, the new big data language for Azure Data Lake

Posted on 29 October, 2015

Product Marketing, Hadoop/Big Data and Data Warehousing

Co-authored by Nishant Thacker.

 

Yesterday, we announced that Azure Data Lake Analytics and Azure Data Lake Store are in public preview. To help you use Azure Data Lake as productively as possible, we are publishing a six-part series on different aspects of Azure Data Lake over the next few days. This is the second post in the series, covering U-SQL, the new language we introduced for Azure Data Lake Analytics.

U-SQL is a language that unifies the benefits of SQL with the expressive power of your own code. U-SQL’s scalable distributed query capability allows you to efficiently analyze data in the store and across relational stores such as Azure SQL Database. This post outlines the basics of U-SQL. The next post will go into more detail on how to develop with U-SQL.


As you approach U-SQL for the first time, you will notice it's a language you’ll be comfortable with from day one. The syntax is based on T-SQL, while it uses C# types by default. This allows you to easily conceptualize how data will be processed while writing queries, and doesn’t confront you with new frameworks or concepts. Essentially, it abstracts away the deeper concepts of parallelism and distributed processing so you don’t need to worry about them while writing your queries. You don’t need special programming skills or months of training to be productive; a good understanding of SQL and some knowledge of C# are enough.

U-SQL allows you to process any type of data. From analyzing botnet attack patterns in security logs to extracting features from images or videos for machine learning, the language enables you to work with any data.

U-SQL integrates custom code seamlessly, so you can express your complex, often proprietary business algorithms. Use cases such as processing unusual file types or handling encryption may require custom processing that is not easily expressed in standard query languages. That custom code can range from user-defined functions to custom input and output formats, and this is something that U-SQL excels at.

Finally, U-SQL was designed to scale efficiently to any size of data without requiring you to focus on scale-out topologies, plumbing code, or the limitations of a specific distributed infrastructure. There is no restriction on the total data size or on the individual units of data that can be processed, and execution automatically scales to use the available resources. This lets developers concentrate on the business logic to be implemented rather than on the infrastructure that needs to be set up to run their queries over massive amounts of data.

Let’s take a sneak peek at what U-SQL looks like. A typical U-SQL query would be something like the one below:

@Result =
    SELECT country, city, COUNT(*) AS NumberOfDrivers
    FROM @Drivers
    GROUP BY country, city
    ORDER BY NumberOfDrivers DESC, country, city
    FETCH FIRST 10 ROWS;

As can be seen, U-SQL uses a syntax that a DBA would recognize, with the typical SELECT, FROM, and GROUP BY clauses. Because the language is meant to work on massive data, it also provides the FETCH clause, which here returns only the first 10 rows of the ordered result so the developer can preview the query results. Here’s the rowset @Drivers being extracted:

@Drivers =
    EXTRACT driver_id int
          , name string
          , street string
          , city string
          , region string
          , zipcode string
          , country string
          , phone_numbers string
    FROM @INPUT_DRIVERS
    USING Extractors.Text(delimiter : '\t', quoting : true, encoding : Encoding.Unicode);

As we see, this statement extracts a set of typed fields from a file (referenced here through the variable @INPUT_DRIVERS) using a text extractor. The extractor can be configured for different delimiters, quoting rules and encodings, and extractors can be extended to handle other types of data.

Once you have the result set from the first query, you can use Outputters to save it in the format of your choice:
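To make the behavior of the EXTRACT statement more concrete, here is a rough Python sketch of the same idea: read tab-delimited, quoted text and convert each column to the type named in the schema. This is not how U-SQL is implemented; the function name `extract_drivers`, the `SCHEMA` list and the sample rows are made up for illustration.

```python
import csv
import io

# Column names and Python types mirroring the EXTRACT schema above
# (int -> int, string -> str).
SCHEMA = [
    ("driver_id", int),
    ("name", str),
    ("street", str),
    ("city", str),
    ("region", str),
    ("zipcode", str),
    ("country", str),
    ("phone_numbers", str),
]


def extract_drivers(text):
    """Parse tab-delimited, optionally quoted text into typed row dictionaries."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar='"')
    rows = []
    for fields in reader:
        if len(fields) != len(SCHEMA):
            raise ValueError(f"expected {len(SCHEMA)} columns, got {len(fields)}")
        rows.append({name: cast(value) for (name, cast), value in zip(SCHEMA, fields)})
    return rows


# Made-up sample rows, for illustration only. When reading a real file,
# open(path, encoding="utf-16") would be the rough counterpart of
# Encoding.Unicode, which is UTF-16 in .NET.
SAMPLE = (
    '1\t"Ann Lee"\t"1 Main St"\t"Oslo"\t"Oslo"\t"0150"\t"Norway"\t"555-0100"\n'
    '2\t"Bo Chan"\t"2 High St"\t"Bergen"\t"Vestland"\t"5003"\t"Norway"\t"555-0101"\n'
)

drivers = extract_drivers(SAMPLE)
print(drivers[0]["driver_id"], drivers[0]["city"])
```

The schema is kept as data rather than hard-coded in the parsing loop, which loosely mirrors how the U-SQL statement declares column names and types separately from the extractor that reads the bytes.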

OUTPUT @Result
TO @OUTPUT
USING Outputters.Csv(quoting : true);
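For readers who want to check their understanding of the query and output steps, the following Python sketch reproduces their logic on a handful of rows: count drivers per country and city, order by count descending and then by country and city, keep at most 10 rows, and write the result as CSV. The helper names `top_locations` and `to_csv` and the sample data are invented for this sketch, and mapping `quoting : true` to Python's `csv.QUOTE_NONNUMERIC` is an assumption about the output style, not a statement about the exact bytes U-SQL produces.

```python
import csv
import io
from collections import Counter


def top_locations(drivers, limit=10):
    """Count drivers per (country, city), mirroring GROUP BY / ORDER BY / FETCH."""
    counts = Counter((d["country"], d["city"]) for d in drivers)
    # ORDER BY NumberOfDrivers DESC, country, city
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0][0], item[0][1]))
    # FETCH FIRST 10 ROWS
    return [(country, city, n) for (country, city), n in ordered[:limit]]


def to_csv(rows):
    """Serialize rows as CSV, quoting non-numeric values (assumed quoting style)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows for illustration; only the two grouping columns are needed here.
drivers = [
    {"country": "Norway", "city": "Oslo"},
    {"country": "Norway", "city": "Oslo"},
    {"country": "Norway", "city": "Bergen"},
    {"country": "Denmark", "city": "Aarhus"},
]

result = top_locations(drivers)
print(to_csv(result), end="")
```

Unlike this single-process sketch, the U-SQL version leaves it to the service to decide how the grouping and sorting are distributed across machines.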

The EXTRACT, SELECT and OUTPUT statements above make up probably the simplest example of U-SQL. However, U-SQL includes many more capabilities, such as the following:

  • Operating over sets of files with patterns
  • Using (Partitioned) Tables
  • Federated Queries against Azure SQL DB
  • Encapsulating your U-SQL code with Views, Table-Valued Functions and Procedures
  • SQL Windowing Functions
  • Programming with C# User-defined Operators (custom extractors, processors)
  • Complex Types (MAP, ARRAY)
  • Using U-SQL in data processing pipelines
  • U-SQL in a lambda architecture for IoT analytics

To learn more about U-SQL, watch the video below and stay tuned for the next blog post.

[Embedded video]

Where can I get more information?

Documentation and How-To's