Hello, everyone! In March, we added a lot of great new capabilities to Azure Data Factory, including highly demanded features like loading data from SAP HANA, SAP Business Warehouse (BW), and SFTP; a performance enhancement for loading data directly from Data Lake Store into SQL Data Warehouse; data movement support for the first region in the UK (UK South); and a new Spark activity for rich data transformation. We can’t wait to share more details with you. Here is the complete list of Azure Data Factory’s new features for March:
- Support data loading from SAP HANA and SAP Business Warehouse
- Support data loading from SFTP
- Performance enhancement of direct loading from Data Lake Store to Azure SQL Data Warehouse via PolyBase
- Spark activity for rich data transformation
- Max allowed cloud Data Movement Units increase
- UK data center now available for data movement
Support data loading from SAP HANA and SAP Business Warehouse
SAP is one of the most widely used enterprise software systems in the world. We have heard from you that it’s crucial for Microsoft to empower customers to integrate their existing SAP systems with Azure to unlock business insights. We are happy to announce that we have enabled loading data from SAP HANA and SAP Business Warehouse (BW) into various Azure data stores for advanced analytics and reporting, including Azure Blob storage, Azure Data Lake Store, and Azure SQL Data Warehouse.
- The SAP HANA connector supports copying data from HANA information models (such as Analytic and Calculation views) as well as Row and Column tables using SQL queries. To establish the connectivity, you need to install the latest Data Management Gateway (version 2.8) and the SAP HANA ODBC driver. Refer to SAP HANA supported versions and installation for more details.
- The SAP BW connector supports copying data from SAP Business Warehouse version 7.x InfoCubes and QueryCubes (including BEx queries) using MDX queries. To establish the connectivity, you need to install the latest Data Management Gateway (version 2.8) and the SAP NetWeaver library. Refer to SAP BW supported versions and installation for more details.
For more information about connecting to SAP HANA and SAP BW, refer to Azure Data Factory offers SAP HANA and Business Warehouse data integration.
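To make this concrete, here is a minimal sketch of what a SAP HANA linked service definition could look like in Data Factory JSON; the name, server address, and gateway name are placeholders you would replace with your own values:

```json
{
    "name": "SapHanaLinkedService",
    "properties": {
        "type": "SapHana",
        "typeProperties": {
            "server": "<server>:<port>",
            "authenticationType": "Basic",
            "username": "<username>",
            "password": "<password>",
            "gatewayName": "<your Data Management Gateway name>"
        }
    }
}
```

A copy activity would then read from a RelationalTable dataset bound to this linked service, using a RelationalSource that carries your SQL query (or, for a SapBw linked service, an MDX query).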
Support data loading from SFTP
You can now use Azure Data Factory to copy data from SFTP servers into various data stores in Azure or in on-premises environments, including Azure Blob storage, Azure Data Lake Store, Azure SQL Data Warehouse, and more. The full support matrix can be found in Supported data stores and formats. You can author the copy activity using the intuitive Copy Wizard or through JSON scripting. Refer to the SFTP connector documentation for more details.
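As a sketch, an SFTP linked service using basic authentication might look like the following; the host, username, and password are placeholders:

```json
{
    "name": "SftpLinkedService",
    "properties": {
        "type": "Sftp",
        "typeProperties": {
            "host": "<sftp server host>",
            "port": 22,
            "authenticationType": "Basic",
            "username": "<username>",
            "password": "<password>"
        }
    }
}
```

The copy activity then reads files through a FileShare dataset defined on top of this linked service.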
Performance enhancement of direct data loading from Data Lake Store to Azure SQL Data Warehouse via PolyBase
Data Factory Copy Activity now supports loading data from Data Lake Store into Azure SQL Data Warehouse directly via PolyBase. When you use the Copy Wizard, PolyBase is turned on by default and the compatibility of your source files is checked automatically. You can monitor whether PolyBase was used in the activity run details.
If you are currently copying data from Data Lake Store to Azure SQL Data Warehouse without PolyBase, or with staged copy plus PolyBase, we suggest checking your source data format and updating the pipeline to enable PolyBase and remove the staging settings, which improves performance. For more detailed information, refer to Use PolyBase to load data into Azure SQL Data Warehouse and Azure Data Factory makes it even easier and convenient to uncover insights from data when using Data Lake Store with SQL Data Warehouse.
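For illustration, the relevant part of a copy activity that loads directly via PolyBase could look roughly like this; note the SqlDWSink with allowPolyBase turned on and no staging settings:

```json
"typeProperties": {
    "source": {
        "type": "AzureDataLakeStoreSource"
    },
    "sink": {
        "type": "SqlDWSink",
        "allowPolyBase": true
    }
}
```

If your existing pipeline sets "enableStaging": true for this scenario, removing it (together with the stagingSettings) lets the copy run as a single direct PolyBase load.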
Spark activity for rich data transformation
Apache Spark for Azure HDInsight is built on an in-memory compute engine, which enables high-performance querying on big data. Azure Data Factory now supports a Spark Activity that runs against bring-your-own HDInsight clusters, so you can operationalize Spark job executions through the Spark Activity in Azure Data Factory.
Since a Spark job may have multiple dependencies, such as jar packages (placed on the Java CLASSPATH) and Python files (placed on the PYTHONPATH), you will need to follow a predefined folder structure for your Spark script files. For more detailed information about JSON scripting of the Spark Activity, refer to Invoke Spark programs from Azure Data Factory pipelines.
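As an example, a minimal Spark Activity definition inside a pipeline might look like the sketch below; the linked service names, rootPath (a Blob container and folder), and entry file are placeholders:

```json
{
    "name": "MySparkActivity",
    "type": "HDInsightSpark",
    "linkedServiceName": "HDInsightLinkedService",
    "outputs": [ { "name": "SparkOutputDataset" } ],
    "typeProperties": {
        "rootPath": "adfspark\\pyFiles",
        "entryFilePath": "test.py",
        "getDebugInfo": "Failure",
        "sparkJobLinkedService": "StorageLinkedService"
    }
}
```

Here sparkJobLinkedService points at the storage account that holds the Spark files and their dependencies, laid out in the predefined folder structure mentioned above.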
Max allowed cloud Data Movement Units increase
A cloud Data Movement Unit (DMU) is a measure of the power of a single unit of the copy executor that runs your cloud-to-cloud copy. When copying a large volume of files from Blob storage, Data Lake Store, Amazon S3, or cloud FTP/SFTP into Blob storage, Data Lake Store, or Azure SQL Database, higher DMU counts usually give you better throughput. You can now specify up to 32 DMUs for large copy runs. Learn more from cloud data movement units and parallel copy.
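For example, to request the new maximum for a large cloud-to-cloud copy, set cloudDataMovementUnits in the copy activity’s typeProperties; the source and sink shown here are just one possible combination:

```json
"typeProperties": {
    "source": { "type": "BlobSource" },
    "sink": { "type": "AzureDataLakeStoreSink" },
    "cloudDataMovementUnits": 32
}
```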
UK data center now available for data movement
The Azure Data Factory data movement service is now available in the UK, in addition to the existing 16 data centers. With that, you can leverage Data Factory to copy data from cloud and on-premises data sources into the various supported Azure data stores located in the UK. Learn more about globally available data movement and how it works from Globally available data movement, and from the Azure Data Factory’s Data Movement is now available in the UK blog post.
Those are the new features we introduced in March. Have more feedback or questions? Share your thoughts with us on the Azure Data Factory forum or feedback site; we’d love to hear more from you.