Azure Data Factory February new features update

Posted on March 8, 2017

Senior Program Manager

Azure Data Factory allows you to bring data from a rich variety of locations and in diverse formats into Azure for advanced analytics and predictive modeling on top of massive amounts of data. We have been listening to your feedback, and we strive to continuously introduce new features and fixes that support more data ingestion and transformation scenarios. With the new year under way, we are starting a monthly feature summary blog series so that you can easily keep track of new features and start using them right away.

Here is a complete list of the Azure Data Factory updates for February. We will go through them one by one in this blog post.

  • New Oracle driver bundled with Data Management Gateway with performance enhancements
  • Service Principal authentication support for Azure Data Lake Store
  • Automatic table schema creation when loading into SQL Data Warehouse
  • Zip compression/decompression support
  • Support extracting data from arrays in JSON files
  • Ability to explicitly specify cloud copy execution location
  • Support updating the new Azure Resource Manager Machine Learning web service

New Oracle driver bundled with Data Management Gateway with performance enhancements

Introduction: Previously, to connect to an Oracle data source through the Data Management Gateway, users had to install the Oracle provider separately, which led to a variety of installation issues. Now, with the Data Management Gateway version 2.7 update, a new Microsoft driver for Oracle is installed with the gateway, so no separate Oracle driver installation is required. The new bundled driver provides better load throughput, with some customers observing a 5x-8x performance increase. Refer to the Oracle connector documentation page for details.

Configuration: The Data Management Gateway periodically checks for updates. You can check its version on the Help page, as shown below. If you are running a version lower than 2.7, you can get the update directly from the Download Center. With Data Management Gateway version 2.7, the Copy Wizard automatically uses the new driver when Oracle is the source. Learn more about Oracle linked service properties.

[Image: Data Management Gateway version shown on the Help page]
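If you author the linked service JSON by hand instead of using the Copy Wizard, the driver choice is expressed in the linked service definition. The following Python sketch builds such a definition as a dictionary and prints it as JSON. The property names (`OnPremisesOracle`, `driverType`, `connectionString`, `gatewayName`) are assumptions based on the Azure Data Factory JSON schema of this period, and the function name and all values are hypothetical placeholders, so check them against the Oracle connector documentation before use.

```python
import json


def oracle_linked_service(name, gateway_name, host, port, sid, user, password):
    """Build an on-premises Oracle linked service definition as a dict.

    Property names are assumptions; verify them in the Oracle connector docs.
    """
    connection_string = (
        f"Host={host};Port={port};Sid={sid};User Id={user};Password={password};"
    )
    return {
        "name": name,
        "properties": {
            "type": "OnPremisesOracle",
            "typeProperties": {
                # Assumed value that selects the driver bundled with the gateway.
                "driverType": "Microsoft",
                "connectionString": connection_string,
                "gatewayName": gateway_name,
            },
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    definition = oracle_linked_service(
        name="OracleLinkedService",
        gateway_name="MyGateway",
        host="oracle.example.com",
        port=1521,
        sid="ORCL",
        user="<user>",
        password="<password>",
    )
    print(json.dumps(definition, indent=2))
```

Keeping the definition in code like this makes it easy to generate one linked service per Oracle instance, but the printed JSON is the only part that Azure Data Factory would consume.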

Service Principal authentication support for Azure Data Lake Store

Introduction: In addition to the existing user credential authentication, Azure Data Factory now supports Service Principal authentication for accessing Azure Data Lake Store. The token used in user credential authentication can expire after anywhere from 12 hours to 90 days, so scheduled pipelines require the token to be reauthorized periodically, either manually or programmatically. Learn more about token expiration when moving data from Azure Data Lake Store using Azure Data Factory. With Service Principal authentication, the key expiration threshold is much longer, so we recommend using this mechanism going forward, especially for scheduled pipelines. Learn more about Azure Data Lake Store and Service Principal.

Configuration: In the Copy Wizard, you will see a new Authentication type option, with Service Principal as the default, as shown below.

[Image: Authentication type option in the Copy Wizard, with Service Principal selected]
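Outside the Copy Wizard, Service Principal authentication is configured in the Azure Data Lake Store linked service JSON. The Python sketch below assembles such a definition. The property names (`dataLakeStoreUri`, `servicePrincipalId`, `servicePrincipalKey`, `tenant`) are assumptions based on the connector schema of this period, and the function name and placeholder values are hypothetical, so treat this as an illustration rather than a definitive template.

```python
import json


def adls_linked_service(name, account_name, sp_id, sp_key, tenant):
    """Build an Azure Data Lake Store linked service that uses a service principal.

    Property names are assumptions; verify them in the connector documentation.
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureDataLakeStore",
            "typeProperties": {
                "dataLakeStoreUri": (
                    f"https://{account_name}.azuredatalakestore.net/webhdfs/v1"
                ),
                # Application (client) ID and key of the service principal.
                "servicePrincipalId": sp_id,
                "servicePrincipalKey": sp_key,
                # Tenant domain name or ID in which the application resides.
                "tenant": tenant,
            },
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    definition = adls_linked_service(
        name="AzureDataLakeStoreLinkedService",
        account_name="<accountname>",
        sp_id="<service principal id>",
        sp_key="<service principal key>",
        tenant="<tenant>",
    )
    print(json.dumps(definition, indent=2))
```

Unlike the user credential mode, nothing in this definition is a short-lived token, which is why it suits pipelines that run on a schedule.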

Automatic table schema creation when loading into SQL Data Warehouse

Introduction: When you copy data from an on-premises SQL Server or Azure SQL Database to Azure SQL Data Warehouse using the Copy Wizard, and the table does not exist in the destination SQL Data Warehouse, Azure Data Factory can now automatically create the destination table using the schema from the source.

Configuration: On the Table mapping page of the Copy Wizard, you now have the option to map to existing sink tables or to create new ones using the source tables’ schema. Where the source and destination stores have incompatible data types, the types are converted as needed. The Schema mapping page warns you about potential incompatibility issues, as shown in the second image below. Learn more about Auto table creation.

[Image: Table mapping page with the option to create new sink tables]

[Image: Schema mapping page with incompatibility warnings]

Zip compression/decompression support

Introduction: The Azure Data Factory Copy Activity can now zip and unzip your files using the ZipDeflate compression type, in addition to the existing GZip, BZip2, and Deflate compression support. This applies to all file-based stores, including Azure Blob, Azure Data Lake Store, Amazon S3, FTP/s, File System, and HDFS.

Configuration: You can find the option in the Copy Wizard, as shown below. Learn more from the specifying compression section in each corresponding connector topic.

[Image: Compression type option in the Copy Wizard]
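In dataset JSON, compression is typically declared in a `compression` section of the dataset's type properties. The Python sketch below builds that fragment and rejects codecs other than the four named in this post. The `compression`, `folderPath`, and `fileName` property names are assumptions based on the file-based connector schema, and the helper names and paths are hypothetical.

```python
import json

# The four compression types named in this post.
SUPPORTED_CODECS = {"GZip", "BZip2", "Deflate", "ZipDeflate"}


def compression_section(codec):
    """Return the (assumed) dataset 'compression' fragment for a codec."""
    if codec not in SUPPORTED_CODECS:
        raise ValueError(f"Unsupported compression type: {codec!r}")
    return {"type": codec}


def file_dataset_type_properties(folder_path, file_name, codec):
    """Build type properties for a file-based dataset with compression set."""
    return {
        "folderPath": folder_path,
        "fileName": file_name,
        "compression": compression_section(codec),
    }


if __name__ == "__main__":
    # Hypothetical container and file names.
    props = file_dataset_type_properties(
        "mycontainer/input/", "data.zip", "ZipDeflate"
    )
    print(json.dumps(props, indent=2))
```

The same fragment would apply on the source side to unzip files and on the sink side to zip them, subject to what each connector topic documents.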

Support extracting data from arrays in JSON files

Introduction: The Copy Activity now supports parsing arrays in JSON files. This addresses the feedback that, previously, an entire array could only be converted to a string or skipped. You can now extract data from an array, or cross apply the objects in an array with the data under the root object.

Configuration: The Copy Wizard lets you choose how a JSON array is parsed, as shown below. In this example, the elements in the “orderlines” array are parsed into “prod” and “price” columns. For more details on configuration and examples, check the specifying JSON format section in each file-based data store topic.

[Image: JSON array parsing options in the Copy Wizard]
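To make the cross apply behavior concrete, here is a small Python sketch of the idea, not of how the Copy Activity implements it. Each element of the array becomes one output row, and the chosen root-level fields are repeated on every row. The “orderlines”, “prod”, and “price” names come from the example above; the `ordernumber` and `city` fields, the sample values, and the `cross_apply` function are hypothetical.

```python
def cross_apply(document, array_key, root_columns, element_columns):
    """Flatten one JSON document into rows, one row per array element.

    Root-level columns are copied onto every row; element columns are
    read from each object inside the array.
    """
    rows = []
    for element in document.get(array_key, []):
        row = {col: document.get(col) for col in root_columns}
        row.update({col: element.get(col) for col in element_columns})
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical sample document.
    order = {
        "ordernumber": "01",
        "city": "Seattle",
        "orderlines": [
            {"prod": "p1", "price": 23},
            {"prod": "p2", "price": 13},
        ],
    }
    for row in cross_apply(
        order, "orderlines", ["ordernumber", "city"], ["prod", "price"]
    ):
        print(row)
```

With the sample document, this yields two rows that share the same `ordernumber` and `city` but carry different `prod` and `price` values, which is the tabular shape a relational sink expects.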

Ability to explicitly specify cloud copy execution location

Introduction: When copying data between cloud data stores, Azure Data Factory by default detects the region of your sink data store and picks the geographically closest service to perform the copy. If the region is not detectable, or the service that powers the Copy Activity doesn’t have a deployment available in that region, you can now explicitly set the Execution Location option to specify the region of the service that performs the copy. Learn more about the globally available data movement.

Note: Your data will go through that region over the wire during copy.

Configuration: The Copy Wizard prompts for the Execution location option on the Summary page if either of the cases mentioned above applies.

[Image: Execution location option on the Copy Wizard Summary page]
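In pipeline JSON, the equivalent setting would sit in the Copy Activity's type properties. The Python sketch below adds it only when a region is supplied, so the default region detection stays in effect otherwise. The `executionLocation` property name is an assumption based on the Copy Activity schema of this period, and the function name, dataset names, source and sink types, and region value are hypothetical illustrations.

```python
import json


def copy_activity(name, input_dataset, output_dataset,
                  source_type, sink_type, execution_location=None):
    """Build a Copy Activity definition, optionally pinning the copy region."""
    type_properties = {
        "source": {"type": source_type},
        "sink": {"type": sink_type},
    }
    if execution_location is not None:
        # Assumed property name; when omitted, the service picks the region.
        type_properties["executionLocation"] = execution_location
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"name": input_dataset}],
        "outputs": [{"name": output_dataset}],
        "typeProperties": type_properties,
    }


if __name__ == "__main__":
    # Hypothetical names and region.
    activity = copy_activity(
        name="CopyBetweenCloudStores",
        input_dataset="InputDataset",
        output_dataset="OutputDataset",
        source_type="BlobSource",
        sink_type="BlobSink",
        execution_location="Japan East",
    )
    print(json.dumps(activity, indent=2))
```

Because the data passes through the chosen region during the copy, the region is worth treating as a deliberate, reviewed setting rather than a default.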

Support updating the new Azure Resource Manager Machine Learning web service

Introduction: You can use the Machine Learning Update Resource Activity to update an Azure Machine Learning scoring web service with a retrained model, as a way to operationalize model retraining and keep scoring accurate. In addition to supporting the classic web service, Azure Data Factory now supports the new Azure Resource Manager based Azure Machine Learning scoring web service, using Service Principal authentication.

Configuration: The Azure Machine Learning linked service JSON now supports Service Principal, so you can access the new web service endpoint. Learn more in the documentation section on the case where the scoring web service is an Azure Resource Manager web service.
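As a rough illustration of what that linked service might contain, the Python sketch below assembles a definition with both the scoring endpoint and the service principal fields. The property names (`mlEndpoint`, `apiKey`, `updateResourceEndpoint`, `servicePrincipalId`, `servicePrincipalKey`, `tenant`) are assumptions based on the Azure Machine Learning linked service schema of this period, and the function name and every value are hypothetical placeholders.

```python
import json


def azure_ml_linked_service(name, ml_endpoint, api_key,
                            update_resource_endpoint, sp_id, sp_key, tenant):
    """Build an Azure ML linked service that authenticates updates with a
    service principal. Property names are assumptions; verify in the docs."""
    return {
        "name": name,
        "properties": {
            "type": "AzureML",
            "typeProperties": {
                # Scoring endpoint and its API key.
                "mlEndpoint": ml_endpoint,
                "apiKey": api_key,
                # Endpoint that the Update Resource Activity calls.
                "updateResourceEndpoint": update_resource_endpoint,
                # Service principal used against the Resource Manager endpoint.
                "servicePrincipalId": sp_id,
                "servicePrincipalKey": sp_key,
                "tenant": tenant,
            },
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    definition = azure_ml_linked_service(
        name="AzureMLLinkedService",
        ml_endpoint="<scoring endpoint URL>",
        api_key="<api key>",
        update_resource_endpoint="<update resource endpoint URL>",
        sp_id="<service principal id>",
        sp_key="<service principal key>",
        tenant="<tenant>",
    )
    print(json.dumps(definition, indent=2))
```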

 

These are the new features we introduced in February. Have more feedback or questions? Share your thoughts with us on the Azure Data Factory forum or feedback site. We’d love to hear more from you.