Azure SQL hybrid data movement

Gepost op 23 maart, 2017

Senior Program Manager

As Cloud computing is getting more and more popular today, many companies are choosing to deploy a hybrid environment using a mix of on-premises data center and public cloud. It gives the businesses greater flexibility and more data deployment options. For example, a company can host business critical or sensitive data in on-premises data centers and deploy less-critical data or test and development environment in the public cloud. A hybrid cloud environment will also help large companies to migrate on-premises data center to cloud in multiple stages without interfering with the business.

Moving data around efficiently in a hybrid cloud environment is critical and challenging. In this blog, we are going to introduce options in different data movement scenarios built on top of on-premises SQL Server, Azure SQL VMs and Azure SQL Databases:

  • Migrate data from on-premises SQL Server to Azure
  • Replicate data for business continuity
  • Replicate data to scale out read-only workload
  • Replicate data to refresh dev-test environment
  • Distribute referencing data or multi-master
  • Backup and restore data
  • Migrate cold data from on-premises SQL Server to Azure
  • Move data into data warehouse
  • Move data into big data platform
  • Move data from other data platforms

We are going to mention the following technologies and tools in this blog:

  • Export and import .bacpac files
  • bcp
  • Transactional replication, including peer-to-peer transaction replication
  • Merge replication
  • SQL Server backup and restore, including managed backup and file snapshot backup
  • Always On availability groups
  • Data Migration Assistant (DMA)
  • Azure SQL Data Sync
  • SQL Server Integration Services (SSIS)
  • Azure SQL Database copy
  • Azure Data Factory (ADF)
  • SQL Server Migration Assistant (SSMA)
  • Attunity CDC for SSIS
  • SQL Server Stretch Database

The goal of this blog is to help you to choose the right technologies and tools to implement different scenarios. Implementation details and step by step instructions will not be covered in this blog, however we will provide links to related resources.

Migrate data from on-premises SQL Server to Azure

When you are migrating existing data from on-premises SQL Server databases to Azure, there are a few key facts you should measure and consider:

  1. Azure SQL Databases (PaaS) or Azure SQL VM (IaaS), which is the better option? This is out of scope of our topic today. Please see “Choose a cloud SQL Server option: Azure SQL (PaaS) Database or SQL Server on Azure VMs (IaaS)” for more details.
  2. How many databases are you going to migrate? How large are they?
  3. How much downtime can your service or application afford without significant business impact?

Azure SQL Databases

If you can afford some downtime, or if you are performing a test migration, you can use bacpac to migrate your databases to Azure SQL Databases. See the blog of “Migrating from SQL Server to Azure SQL Database using Bacpac Files” for detailed instructions.

When you migrate databases, especially large databases using bacpac, plan for a long enough application downtime. Depending on the database size, the downtime can be hours.

When you cannot afford to remove your databases from production during the migration, you can consider using transaction replication as the migration solution (SQL Azure Database as push subscriber). See the “Migration from SQL Server to Azure SQL Database Using Transactional Replication” and “Replication to SQL Database” for details, including the limitations of transaction replication.

See “SQL Server database migration to SQL Database in the cloud” for more about migration to Azure SQL Databases.

Azure SQL VM

If you decide to migrate and host your data in Azure SQL VM, you will have several more options, including creating a Always On replica, backup/restore, etc. See “Migrate a SQL Server database to SQL Server in an Azure VM” for more details.

You can also use DMA (Data Migration Assistant) to migrate on-premises SQL Server databases to Azure SQL VM. DMA can migrate not only data, but also other server objects like logins, users and roles. DMA can also be used to detect compatibility issues before the migration. See the document of “Data Migration Assistant (DMA)” for more details.

Replicate data for business continuity

Disruptive events can happen in any data platform and cause data loss or your databases and application to become unavailable. Capability of fast recovery from data loss or database downtime is important for business continuity, especially for business-critical databases and applications.

Azure SQL Databases

Azure SQL Databases automatically have more than one copy created to ensure the high availability (99.99% availability SLA). To prevent or mitigate business discontinuity in event of a data center outage, you can either create Active Geo-replication or restore database from geo-redundant backup.

Active geo-replication will provide minimum downtime and data loss during the data center outage. It can also be used to scale out read-only workloads (will discuss in next scenario); however, this will introduce extra cost to have active geo-replication replicas. Consider this option for business-critical databases. See “Overview: SQL Database Active Geo-Replication” for more details about active geo-replication.

You can also recover your database from geo-redundant backup only when necessary. It will introduce longer recovery time and more data loss. It provides business continuity during data center outage with lower cost. See “Recover an Azure SQL database using automated database backups” for more details:

You can find more details about SQL Azure Databases business continuity in “Overview of business continuity with Azure SQL Database”.

Azure SQL VM

For SQL VM, you can setup Always On availability group or failover cluster instance to manage the downtime during VM reboot or outage.

See “High availability and disaster recovery for SQL Server in Azure Virtual Machines” for more details.

SQL VM as DR solution for on-premises SQL Server

To create a DR site without building a data center in another region, you can extend on-premises Availability Groups to Azure by provisioning one or more Azure SQL VMs and then adding them as replicas to your on-premises Availability Group. See section “Hybrid IT: Disaster recovery solution” in “High availability and disaster recovery for SQL Server in Azure Virtual Machines” for more details.

Replicate data to scale out for read-only workload

In many systems, certain applications only need to read the information from the databases. For example, in the information publishing service, only the publisher need to update the data, and all subscribers only need to read the data. To offload the primary database, you can replicate data and redirect read-only workload to other replicas.

Azure SQL Databases

In addition of providing business continuity in event of disaster, Active Geo-replication also can be used to offload read-only workloads such as reporting jobs to the secondary databases. If you only intend to use the secondary databases for load balancing, you can create the secondary databases in the same region.

See “Overview: SQL Database Active Geo-Replication” for more details about Active Geo-replication.

SQL VMs and on-premises SQL Server

To scale out SQL VM or on-premises SQL Server, you can build readable Always On replicas. Consider to build the replica in the same region unless the read traffic is going to happen in a different region.

Replicate data to refresh dev-test environment

Before system upgrade or deployment of a new system, you may want to test it using a copy of the production data. Building a separate dev-test environment will help you to run the test without impact on your production environment.

Azure SQL Databases

To create a replication of the live production environment in Azure DB for dev-test environment, you can use database copy.

If you want to replicate a snapshot of production environment with old data within retention period (35 days for Standard and Premium; 7 days of Basic), you can restore the database to the point in time you want.

Azure SQL VM and on-premises SQL Server

To replicate data from Azure Databases to on-premises or Azure SQL VM, you can export the database into bacpac and import into SQL Server running in Azure VM or on-premises. If you only want to replicate specific tables instead of the whole database, you can run SqlPackage in Azure VM. See “Export an Azure SQL database or a SQL Server database to a BACPAC file” for more details.

Distributing referencing data/Multi-master

International ISVs and corporates usually have clients or branches in different countries or regions. To reduce the performance impact from network latency, they want some business referencing data, like SKUs, user information, etc., distributed to Azure SQL or SQL Server databases. In a typical scenario, a central database will host all reference data and distribute it to different clients or branches. The clients or branches can also update the reference data locally and push the change back to the central database.

Azure SQL Data Sync can be used to implement the data distribution between on-premises SQL Server, Azure SQL VM and Azure SQL databases, in uni-direction or bi-direction. See “Getting Started with Azure SQL Data Sync (Preview)” to learn more about Azure Data Sync.

Azure Data Sync is now only available in old Azure portal. It will be available in new Azure portal very soon. See the blog of “Azure Data Sync Update” for more details.

On-premises or Azure SQL VMs to Azure SQL Databases

When you are design and implement such a distributed multi-master system, especially cross internet, you should consider to shard the database and only sync data when necessary to reduce the latency.

If the central database is hosted on-premises or in Azure SQL VM, you can use transactional replication to distribute data. Azure SQL databases can be configured as push subscribers and replicate data from the publisher, the central database. Transaction replication can only replicate data in one-way.

Between on-premises or Azure SQL VMs

If all your data in hosted in on-premises SQL Server or Azure SQL VM, and you need to sync data in bi-direction, except using Azure SQL Data Sync, you can setup either Peer-to-peer replication or Merge replication.

If you can design your application or service to ensure certain rows will be modified only in one node, peer-to-peer replication is recommended. If the application requires sophisticated conflict detection and resolution capacities, use merge replication. Merge replication introduces more complex topology and higher maintenance cost comparing to other sync and replication technologies. Only use Merge replication when other technologies can’t solve your problem.

Peer-to-peer replication and Merge replication will not work on Azure SQL Database, as neither publisher nor subscriber.

See Peer to peer transaction replication and Merge replication for more details.

Backup and restore data

Backing up your database is essential for protecting your data. It allows you to recover data from accidental corruption or deletion. We always recommend to store backup files separately from the databases. The hybrid environment will allow you to implement this easily.

Azure SQL Databases

Azure SQL Databases automatically backup the databases at no additional charge. You can restore the database to any point in time during the retention (7 days for Basic and 35 days for Standard and Premium). All backup files are replicated to multiple copies also to a different region. See “Learn about SQL Database backups” for more details.

If your business requires longer retention for backup, you can configure long term retention backup to keep the backup up to 10 years. See “Storing Azure SQL Database Backups for up to 10 years” for more details.

If you want to restore the database to on-premises SQL Server or store the backup in your own local file system or other cloud platforms, you can export the database into a bacpac file. You can use Azure automation to schedule the backup periodically. The sample script is provided in the following document: “Export an Azure SQL database or a SQL Server database to a BACPAC file”.

Azure SQL VMs

If you are running SQL Server on Azure VM, we recommend you to backup your databases to Azure storage (Backup to URL). This feature is supported in SQL Server 2012 SP1 CU2 or later version.

You can also configure the Managed Backup to enable automated backup management. The feature is available in SQL Server 2014 or later version. In SQL Server 2016, it supports custom schedule. See “Backup and Restore for SQL Server in Azure Virtual Machines” for more details.

To ensure the availability of backup files in event of disaster or data center outage, we recommend you use GRS or RA-GRS storage to store the backup files. It will automatically replicate your backup files to a different region. See “Azure Storage replication” for more details.

If you are running SQL Server 2016 and host data files in Azure storage, another option is to use file snapshot backup. It provides near instantaneous backup and rapid restores for database files stored in Azure storage. See “File-Snapshot Backups for Database Files in Azure” for more details.

On-premises SQL Server

We always recommend users to store backup files in a different location from database files. If you are running SQL Server 2014 or later version on-premises, you can use the Backup to URL feature to back up your database or transaction log directly to Azure storage. However, due to the network latency, backing up to and restoring from Azure storage may introduce lower throughput than the same operation against local disk. See “SQL Server Backup and Restore with Windows Azure Blob Storage Service” for more details.

Migrate cold data from on-premises SQL Server to Azure

As the database size is fast growing, managing and storing historical or cold data efficiently becomes a big problem. You can use Stretch Database to migrate the cold data from on-premises SQL Server to Azure and keep the data online. See “Introduction to Stretch Database” for more details about Stretch Database.

Move data into data warehouse

Companies periodically move data from OLTP system to OLAP system/data warehouse for data analysis and reporting. The process usually includes extracting data from data source, transformation and loading data into the target data warehouse (ETL).

The data warehouse and OLAP system can be built on top of Azure SQL Databases, Azure SQL VMs or Azure SQL Data Warehouse.

Most on-premises SQL Server customers use SSIS (SQL Server Integration Services) to load data into data warehouse. It extracts data from OLTP system, transforms it, and loads it into data warehouses. SSIS can also be used in the hybrid environment. For exiting SSIS customers, it will reduce the cost to build a new ETL process. If you are loading data into SQL VM or SQL Databases, we recommend running SSIS on Azure VM. See the blog of “Running SSIS on Azure VM (IaaS) – Do more with less money” for more information.

If you are moving data into Azure Data Warehouse, you can also use ADF (Azure Data Factory) or bcp as the loading tools. See “SQL Data Warehouse Migrate Your Data” and “Use Azure Data Factory with SQL Data Warehouse” for details.

Move data into Azure Big Data Stores for Advanced Analytics

To build big data advanced analytics solution in Azure, users need to load online transactional data and other reference data from various data sources into a modernized multi-platform data warehouse, which usually consists of Azure Blob/Azure Data Lake as the staging area, perform transformation activities such as HIVE/PIG/Spark, and finally load “cooked” data into Azure Data Warehouse for BI and reporting.

If you want to move your data into Azure and build advanced analytics solution on top of it, you can use ADF (Azure Data Factory), which is a fully managed data integration service that orchestrates the movement and transformation of data.  Please see “Introduction to Azure Data Factory Service, a data integration service in the cloud” for more details about ADF.

If you are an existing SSIS user, SSIS is another option you can use to load data into big data stores such as Azure Storage Blob and Azure Data Lake Store.  You will need the Azure feature pack to load data into Azure.

Move data from other data platforms

If you want to migrate databases from other DBMSs (Oracle, MySQL, etc…) to Azure SQL Databases or Azure SQL VM, you can use SSMA (SQL Server Migration Assistant) as a helpful migration tool. See “SQL Server Migration Assistant” for more details.

If you want to continuously move data from other DBMSs or other data format like flat files to Azure SQL Databases or Azure SQL VM, or you need to do some transformation before loading data into Azure, you can use SSIS. Attunity CDC for SSIS or SQL Server CDC for Oracle by Attunity provides end to end operational data replication solution. For more information, see “Attunity CDC for SSIS” and “SQL Server 2012 CDC for Oracle – a Review of One Implementation”.

Summary

In this blog, we discussed how to choose the right technologies and tools for different hybrid data movement scenarios. It is just a starting point and general guidance for these use cases. You will still need to evaluate different solutions based on your business needs.

If you have any further question, please post it in MSDN forum.

If you have any feedback to Azure SQL Databases or Azure SQL VMs, please submit it in https://feedback.azure.com.