Migrating Data to Microsoft Azure Files


About Microsoft Azure Files

Microsoft Azure Files is a cloud-based storage service that exposes SMB 2.1 file shares in the cloud. Applications running in Azure can now easily share files between VMs using standard and familiar file system APIs like ReadFile and WriteFile. In addition, the files can also be accessed at the same time via a REST interface, which opens up a variety of hybrid scenarios. Finally, Azure Files is built on the same technology as the Blob, Table, and Queue services, which means Azure Files is able to leverage the availability, durability, scalability, and geo-redundancy already built into our platform. The service is currently in preview. To read more details about the service, please refer to our blog post on Files.

 

Migrating Data

As you start to use Azure Files, you may need to move a large amount of existing data into the service. There are many options for moving data to Azure Files efficiently, and which one you choose will depend on where the data is originally located. The rest of this post discusses these options and how to achieve the best performance with each one.

1.  On-premises to Azure Files over the internet

To copy files, you can use the AzCopy tool provided by Microsoft Azure. AzCopy implements a number of optimizations to ensure the best throughput for the copy job (e.g., parallel uploads on multiple threads and correct handling of throttling).

The format of the AzCopy command line is:

AzCopy <source path> <destination URL> [optional filespec] /S /DestKey:<YourKey>

Where:
source path is the path to the directory you want to migrate
destination URL is the HTTPS URL of the directory you want to copy to
filespec specifies a filter for the files you want to move (default is *.*)
YourKey is the storage account key for the destination storage account
/S is an optional switch that copies all directories and subdirectories under the source directory

AzCopy has many other command-line options, and you should use any others that make sense for your case. For more information, refer to this blog post on AzCopy.

Here is an example command to copy files in the c:\data directory to Azure Files:

AzCopy c:\data https://myaccount.file.core.windows.net/myshare *.* /S /DestKey:myStorageAccountKey
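If you run this copy repeatedly, a small wrapper script keeps the account key out of your command history. Here is a minimal sketch, reusing the placeholder account, share, and key names from the example above (the echo makes it a dry run; remove the echo to execute on a machine with AzCopy installed):

```shell
# Hypothetical wrapper around the AzCopy upload shown above.
# All names are placeholders; substitute your own values.
SOURCE='c:\data'
DEST_URL='https://myaccount.file.core.windows.net/myshare'
# Read the key from the environment so it never appears in the script itself.
DEST_KEY="${AZURE_STORAGE_KEY:-myStorageAccountKey}"

CMD="AzCopy $SOURCE $DEST_URL *.* /S /DestKey:$DEST_KEY"
echo "$CMD"   # dry run: print the command instead of running it
```

Keeping the key in an environment variable (here the assumed name AZURE_STORAGE_KEY) also makes it easy to rotate the key without editing the script.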

 

2.  Azure disk on an IaaS VM to Azure Files

Customers who run a file server role on IaaS VMs in Azure are finding the Azure Files service very attractive, since it frees them from having to manage a file server themselves.

To migrate data from an IaaS disk to a share, you need to:

  1. Attach the disk to the VM
  2. Mount the share on your VM
  3. Use robocopy to copy the data into the Azure Files share

Robocopy is a free, robust file copy utility included in Windows that is well suited to large copy jobs.

The command-line format to use is:

Robocopy <source path> <dest path> [optional filespec] /MIR /MT:16

Where:
source path is the path to the directory you want to copy
dest path is the path to the destination directory
filespec specifies a filter for the files you want to copy (default is *.*)
/MIR mirrors the source directory tree, including subdirectories, to the destination
/MT:n sets the number of threads to use (see discussion below)

When using robocopy, you should choose the “/MT” parameter to maximize throughput. This lets you control how many parallel threads perform the copy, essentially controlling the queue depth of the IO requests to storage. A very low thread count does not queue enough requests on the server to take advantage of the inherent parallelism of our cloud architecture. A very high thread count risks server-side throttling, which ends up reducing throughput. In our testing, we have found queue depths between 16 and 32 to be best for maximizing throughput.
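Putting the steps above together, the mount and copy might look like the following sketch. The account, share, key, and paths are placeholders, not values from this post, and the commands are Windows commands, so the script only prints them (dry run) for you to run on the Azure VM itself:

```shell
# Hypothetical sketch of the mount-and-copy steps for an IaaS VM.
# All names below are placeholders; substitute your own values.
ACCOUNT=myaccount
SHARE=myshare
KEY=myStorageAccountKey

# Mount the share as Z: (username is the storage account name,
# password is the storage account key), then mirror the disk data.
MOUNT_CMD="net use Z: \\\\$ACCOUNT.file.core.windows.net\\$SHARE /u:$ACCOUNT $KEY"
COPY_CMD="robocopy F:\\data Z:\\data *.* /MIR /MT:32"
echo "$MOUNT_CMD"
echo "$COPY_CMD"
```

Here /MT:32 sits at the top of the 16-32 queue-depth range found best in our testing; start there and tune for your workload.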

 

Methods to avoid:

We have found it suboptimal to use xcopy or Windows Explorer for large file copies to Azure Files. Those tools work great for copies to NTFS file systems, but they do not provide sufficient parallelism when copying to Azure Files. Azure Files supports highly parallel IO, so having many threads perform concurrent copies results in significantly better performance. Using robocopy with the right thread count provides much higher throughput for the copy, resulting in a shorter total time to transfer the data.

 

3. Azure Blobs to Azure Files

The fastest way to move data from Azure Blobs to Azure Files is to use AzCopy. You should run AzCopy from a VM in the same datacenter as the destination storage account.

An example AzCopy command for doing this is below:

AzCopy https://myaccount1.blob.core.windows.net/mycontainer Z:\mydirectory *.* /SourceKey:myStorageAccountKey

(This assumes that the File share is mapped to drive Z)

 

In this case, the data is downloaded to the VM and then copied to Azure Files.

For details on how to use AzCopy, see the discussion in section 1 above. To see blob-specific command-line options, such as SAS support, run “AzCopy /?”.
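End to end, the share has to be mounted before AzCopy can write to Z:. A hypothetical sketch of both steps, with placeholder account, share, and key names (echo keeps it a dry run; run the printed commands on the VM itself):

```shell
# Hypothetical end-to-end sketch for copying a blob container into a share.
# "myaccount1" is the assumed blob account and "myaccount2" the assumed
# file-share account; all names and keys are placeholders.
BLOB_URL='https://myaccount1.blob.core.windows.net/mycontainer'
MOUNT_CMD='net use Z: \\myaccount2.file.core.windows.net\myshare /u:myaccount2 myFileAccountKey'
COPY_CMD="AzCopy $BLOB_URL Z:\\mydirectory *.* /SourceKey:myBlobAccountKey"
echo "$MOUNT_CMD"
echo "$COPY_CMD"
```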

 

4. Cloud Drive to Azure Files

Cloud Drive was released as a preview in 2010. It enabled customers using Azure Cloud Services to mount a page blob as a drive on Web and Worker Roles in Azure. With the release of Azure Files, all the scenarios supported by Cloud Drive can now be better served by Azure Files. Cloud Drive will be deprecated in 2015, so we recommend that any customers still using Cloud Drive migrate their data to Azure Files. The way to move the data is very similar to moving data from VHDs on Azure VMs:

  1. Mount the blob as a disk using Cloud Drive (most customers do this as part of the Web or Worker roles’ setup)
  2. Mount the share on your VM. See this blog post on how to create and mount a share.
  3. Use Robocopy to copy the data. See the discussion in section 2 on using Robocopy and the “/MT” parameter for maximum throughput.
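Steps 2 and 3 can be sketched much like the VM migration. Step 1 (mounting the page blob as a drive) happens in the role's own startup code, so it is not shown here; the sketch assumes the Cloud Drive is already mounted as F:, and all names are placeholders (echo keeps it a dry run):

```shell
# Hypothetical sketch of steps 2-3 of the Cloud Drive migration. The
# Cloud Drive is assumed to be mounted as F: by the role's code; the
# account, share, key, and target path are placeholders.
SHARE_PATH='\\myaccount.file.core.windows.net\myshare'
echo "net use Z: $SHARE_PATH /u:myaccount myStorageAccountKey"
echo "robocopy F:\\ Z:\\clouddrive-data *.* /MIR /MT:16"
```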

 

We hope these options help you move your data to Azure Files efficiently, and help you use the service to optimize your existing scenarios while enabling new scenarios for your applications and business.