Editor’s Note: Today’s post comes from Patrick McClory, Solutions Architect at RightScale. RightScale® Inc. provides cloud management that enables organizations to easily deploy and manage business-critical applications across public, private, and hybrid clouds, with efficient configuration, monitoring, automation, and governance of cloud computing infrastructure and applications.
Separation anxiety is common when it comes to making the move to cloud computing. But perhaps it shouldn’t be -- the IT industry has been practicing the underlying elements of a successful cloud deployment for years: redundancy, scalability, and automation. None of these ideas is necessarily new, but they can now be applied collectively to the new Windows Azure Virtual Machines (Infrastructure as a Service) to achieve a geographically distributed, highly redundant architecture in substantially less time than with a traditional datacenter approach. This is an attractive proposition for companies of all sizes, and here at RightScale we’ve been helping companies big and small deploy applications in the cloud since 2006. We pioneered the category of cloud management, providing a platform that enables organizations to deploy and manage applications in cloud environments, and our customers have launched literally millions of servers in the cloud.
RightScale is proud to be a Windows Azure strategic partner, and as a way of introducing ourselves to the Windows Azure community we want to share some of the best practices we have developed from our unique experience powering the largest cloud deployments in the world. Let’s get started.
Redundancy is king
Load-balanced web application tiers and mirrored databases have long been common configurations for managing failure in system deployments. The advantage in the cloud is that a geographically distributed deployment can not only serve content closer to your clients, but also provide disaster failover capabilities beyond the datacenter boundary. There are a few things to consider as you design your system redundancy:
- Don't just plan for server redundancy; running servers are of little use if clients can't reach them -- remember that your public DNS infrastructure matters too.
- Plan for failure and exercise your plan often. Testing disaster recovery and failover is critical to ensuring long-term uptime.
- Think about what it means to failover automatically. For some systems, automatic failover has the potential to cause data corruption -- test and evaluate your failover plan not just for server uptime but service uptime and data integrity.
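The last point -- that automatic failover can threaten data integrity -- can be sketched in code. Below is a minimal, hypothetical health-probe monitor (the `FailoverMonitor` name and threshold logic are illustrative assumptions, not part of any Windows Azure or RightScale API) that requires several consecutive failed probes before recommending a failover, so a brief network blip doesn't trigger a premature switch:

```python
class FailoverMonitor:
    """Decide when to fail over from a primary to a secondary region.

    Illustrative sketch: requires `threshold` consecutive failed health
    probes before recommending failover, to avoid flapping and the
    data-corruption risk of switching on a transient outage.
    """

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0  # consecutive failed probes so far

    def record_probe(self, healthy):
        """Record one health-probe result; return True when it is time to fail over."""
        if healthy:
            self.failures = 0  # any success resets the streak
        else:
            self.failures += 1
        return self.failures >= self.threshold


monitor = FailoverMonitor(threshold=3)
assert monitor.record_probe(False) is False  # 1 failure: wait
assert monitor.record_probe(False) is False  # 2 failures: wait
assert monitor.record_probe(True) is False   # recovered: counter resets
assert monitor.record_probe(False) is False
assert monitor.record_probe(False) is False
assert monitor.record_probe(False) is True   # 3 in a row: recommend failover
```

In a real deployment the probe would be an HTTP check against your service endpoint, and "failing over" would mean updating DNS or promoting the mirrored database -- exactly the steps worth rehearsing in your disaster recovery exercises.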
Scalability opens new doors
Being able to scale up and down on demand allows you to optimize your total cost of ownership over the long term. It’s possible to go from a small number of servers to hundreds of servers in the cloud in hours or even minutes when your application requires more resources. When that demand subsides, you can just as quickly scale down so that you are not paying for idle resources. When you are designing for scale:
- Remember to distribute your deployment to plan for failure.
- Evaluate the cost of data transfer between regions or zones.
- Think about the performance metrics and system triggers that will indicate when you should scale up as well as when you should scale down.
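The scale-up/scale-down triggers in the last bullet can be sketched as a simple threshold rule. This is an illustrative assumption, not a Windows Azure or RightScale API: the function name, metric, and thresholds are placeholders. The gap between the two thresholds is a dead band that keeps the fleet from thrashing between scaling events:

```python
def desired_action(avg_cpu_percent, scale_up_at=80.0, scale_down_at=30.0):
    """Map an averaged CPU metric to a scaling decision.

    The dead band between `scale_down_at` and `scale_up_at` prevents
    rapid alternation between growing and shrinking the fleet.
    """
    if avg_cpu_percent >= scale_up_at:
        return "scale_up"
    if avg_cpu_percent <= scale_down_at:
        return "scale_down"
    return "hold"


# Average the last few samples so a single spike doesn't trigger scaling.
samples = [85, 90, 88]
assert desired_action(sum(samples) / len(samples)) == "scale_up"
assert desired_action(20) == "scale_down"
assert desired_action(55) == "hold"
```

The same shape works for other metrics -- request latency, queue depth -- and remember (per the second bullet) that the cost of moving data between regions should factor into where the new capacity is launched.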
Automation sets you free
All of those menial tasks that you manage yourself -- it's time to let go. Building out your processes in an automated manner allows your technology teams to focus less on production support and more on the updates, products, features, and solutions that will drive your company forward. If you're going to take advantage of rapid, on-demand scaling, automation is the key to minimizing the time from when a server boots to when it is fully operational. For web servers, this could include everything from installing your web site's code to connecting to your load balancing setup. For mirrored database servers, you may be able to scale instance sizes up and down fairly easily, but automating the process of reinitializing the mirroring session and bringing up the new secondary node will make it that much easier to increase or decrease the horsepower (and cost) of your data layer. Some things to think about:
- Automate the process of provisioning and decommissioning -- both sets of tasks are detail oriented and require 100% accuracy to ensure application availability and uptime.
- Learn PowerShell and get to know the Cmdlets that are available for the products you use. Get to know MSDeploy and MSBuild. Take advantage of the tools already created for the technologies you're deploying to minimize effort.
- Use this as a time to improve your process. Check out the best practices for the products you are using and work toward implementing them in your newly automated process.
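The "100% accuracy" point about provisioning can be sketched as a fail-fast boot pipeline. Everything here is a hypothetical placeholder (the step names such as `install_site_code` and `join_load_balancer` are illustrative, not real tooling); in practice each step would shell out to PowerShell cmdlets, MSDeploy, or MSBuild as mentioned above:

```python
def provision(steps):
    """Run boot-time configuration steps in order, failing fast so a
    half-configured server is never put into rotation.

    `steps` is a list of (name, callable) pairs; each callable returns
    True on success. Returns (completed_step_names, failed_step_name).
    """
    completed = []
    for name, step in steps:
        if not step():
            return completed, name  # stop at the first failed step
        completed.append(name)
    return completed, None  # all steps succeeded


# Illustrative boot sequence; real steps would invoke your deployment tools.
boot_steps = [
    ("install_site_code", lambda: True),
    ("configure_monitoring", lambda: True),
    ("join_load_balancer", lambda: True),
]
done, failed = provision(boot_steps)
assert failed is None and done == ["install_site_code", "configure_monitoring", "join_load_balancer"]
```

Note that joining the load balancer comes last: a server should only start receiving traffic once every earlier step has verifiably succeeded. Decommissioning is the same pipeline in reverse, starting by draining the server from the load balancer.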
Taken individually these ideas aren't new. But by implementing them correctly, you can free your team from having to focus on physical hardware, and instead focus on best practices for highly available solutions even if your system (or your team) is on the smaller side. Deployments large and small all benefit from being redundant, scalable, and automated. The more you refine your process, the better your team will be equipped to manage growth over the short and long term. If you would like to try some of these basic techniques in Windows Azure, the RightScale free edition is a fast and easy way to get started.