Customer Overview
The customer is a developer of a geosocial mobile game. The game is aware of towns and cities across the world, with a presence in all but two African countries. Players participate primarily on the basis of their physical geographical location.
Planning
From a planning perspective, we considered multiple options. The discussions were not particularly long, but the ideas are worth noting because they are typical of such conversations. The questions raised were:
- How big is the database?
- What is the current SKU used for the database?
- How much downtime can you tolerate?
- How much data loss can you tolerate, if any, during the migration?
- How much money are you willing to spend?
- Can you tolerate the use of recently published services and/or services that are still in preview?
- What are the current concerns related to moving the data?
- What are you not worried about?
The options considered were:
- Leverage Active Geo-Replication functionality
- Stop all connections, export data, copy files, import into US East region
- Create DBCopy, export data, copy files, restore in US East region
Active Geo-Replication was chosen for the following reasons:
- Lowest downtime
- No data loss during a planned migration
- Rollback option is available
- If execution fails, recovery is a simple matter of re-enabling the resources at the US South Central data center
- Lowest TCO
- Aligned with future plans (Upgrade to Premium SKU)
Execution
The customer had to sign up for the Premium SKU preview. This must be done through the portal, and the process completes within seconds. Once signed up, the database can be upgraded to Premium. Refer to “Changing Database Service Tiers and Performance Levels” (https://msdn.microsoft.com/en-us/library/dn369872.aspx) for the details that were used for planning. The formula below was quite accurate, predicting about 12 hours of upgrade time.
3 * (5 minutes + database size / 150 MB/minute)
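As a worked example of the formula above, here is a minimal Python sketch. The function name is hypothetical, and the input assumes the 32 GB database size reported by the customer, with 1 GB taken as 1024 MB. Under those assumptions the estimate comes to roughly 11 hours, which is in line with the "about 12 hours" observed.

```python
def estimate_upgrade_minutes(database_size_mb: float) -> float:
    """Apply the quoted formula: 3 * (5 minutes + database size / 150 MB per minute)."""
    if database_size_mb < 0:
        raise ValueError("database size must be non-negative")
    return 3 * (5 + database_size_mb / 150)


if __name__ == "__main__":
    # Assumption: 32 GB database, 1 GB = 1024 MB.
    size_mb = 32 * 1024
    minutes = estimate_upgrade_minutes(size_mb)
    print(f"Estimated upgrade time: {minutes:.0f} minutes ({minutes / 60:.1f} hours)")
```

The estimate is only as good as the formula's constants, so treat it as a planning figure rather than a guarantee.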
The upgrade went smoothly. At one point several connections failed, but only for a brief moment, and this was expected, as mentioned in the MSDN article.
The next step was to create a replica in the desired region, namely the US East region. There was uncertainty about how long this would take and about the resources required to create the replica. The steps are well documented in “Failover in an Active Geo-Replication Configuration” (https://msdn.microsoft.com/en-us/library/azure/dn741337.aspx). The customer said it best when asked how long the initial synchronization of the replica took:
“It [Initial Synchronization] was no more than 40 minutes*. I'm not sure on actual copy time. This is quite amazing considering making a local bacpac file of this database takes 4 hours.”
*Completely online experience
It should be noted that this execution plan was carried out in between code deployments. Much of it could have been completed within a 24-hour period if the process had been purely technical. The customer chose to wait a couple of days before moving to the next action of the execution plan, because they preferred to make a significant code change on the existing topology rather than in a new data center. They were concerned about whether the DDL changes would replicate. All changes are replicated except those that rely on updating the master database, such as logins. Logins must be created in each environment, whereas a user can be created at the source. Refer to “Security Configuration for Active Geo-Replication” (https://msdn.microsoft.com/en-us/library/azure/dn741335.aspx), which deals specifically with logins.
The last step was to switch to the new region. For terminating the Active Geo-Replication topology, the instructions are well described in “Terminate a Continuous Copy Relationship” (https://msdn.microsoft.com/en-us/library/azure/dn741323.aspx). Below are the steps that the customer shared for the benefit of others who undertake such a task.
- A week in advance, upgrade the database to Premium and set up geo-replication to the East data center. This could have been done on the same day. It took less than 40 minutes for a 32 GB database to fully replicate from US South Central to East.
- [Day of move] Create blob containers in East to match the names of the containers in US South Central. (We did not elect to move blob data at this time, but you would do that here.)
- Deploy new Web and Worker roles to East. Point the new deployment at the DB and blob storage in East.
- Set up any needed SSL certs in the new (East) environment.
- (Downtime begins) Take the old and new Web roles offline. We do this with a configuration switch in the application that shows users a friendly message that the site is under maintenance. Only admins can get into the site/API normally.
- Change DNS to point to the new Web roles' IP addresses.
- Deploy new code to the old data center (US South Central) with connection strings for DB and blob pointing at East.
- Configure the East DB to allow connections from the US South Central IP address (may not be required).
- Set the new deployment (in the staging area) in US South Central to offline mode, then swap it with the production area.
- Stop DB Active Geo-Replication (Planned Termination of a Continuous Copy Relationship).
  - Steps carried out using the portal.
- Set the US South Central DB to read-only mode.
- Set the US East region DB to read/write mode (after replication is completed) (may not be required).
- (Downtime ended) Re-enable the web sites/API in both the US South Central and US East region data centers.
- Set up automatic AzureDB backups on the new DB.
  - Use the automated export under the "configure" section on the DB in the management portal.
- After the DNS change has fully propagated around the world, delete the application servers in US South Central.
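The maintenance switch used for the downtime window in the steps above could look something like the following. This is a minimal Python sketch, not the customer's actual implementation: the `Request` type, the `handle` function, the status codes, and the message text are all hypothetical and exist only to illustrate the idea of a configuration flag that blocks everyone except admins.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical message shown to regular users during the maintenance window.
MAINTENANCE_MESSAGE = "The site is under maintenance. Please check back soon."


@dataclass
class Request:
    """Hypothetical, simplified request: a path plus an admin flag."""
    path: str
    is_admin: bool = False


def handle(request: Request, maintenance_mode: bool) -> Tuple[int, str]:
    """Return (status code, body) for a request.

    When maintenance_mode is on, non-admin requests get a friendly
    maintenance message; admins are served normally.
    """
    if maintenance_mode and not request.is_admin:
        return 503, MAINTENANCE_MESSAGE
    return 200, f"OK: {request.path}"
```

In a real deployment the `maintenance_mode` value would be read from application configuration, so that it can be flipped at the start and end of the downtime window without a redeploy.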
“The move went very smoothly. We were down for about 1 hour, and most of that time was waiting for our new code to deploy, configuration updates and prod/staging swaps. The actual process of shutting down the Azure DB replication and changing read/write modes was simple.”
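For the final cleanup step (deleting the old application servers once the DNS change has propagated), a quick check that the host name resolves only to the new deployment can help. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, and any host name or IP addresses passed to it are placeholders supplied by the caller. It reflects only the view of the resolver on the machine where it runs, so it does not prove propagation "around the world"; running it from several locations gives a better picture.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Signature of socket.gethostbyname_ex: (hostname, aliases, ip_addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def resolves_only_to(
    hostname: str,
    expected_ips: Iterable[str],
    resolver: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Return True if hostname resolves to at least one address and
    every resolved address is in expected_ips (the new deployment)."""
    _, _, addresses = resolver(hostname)
    return bool(addresses) and set(addresses) <= set(expected_ips)
```

The `resolver` parameter exists so the check can be exercised without network access; by default it uses the standard library's `socket.gethostbyname_ex`.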