This article was written by Piyush Ranjan (MSFT) from the Azure CAT team.
With the recent general availability of Infrastructure Services (Virtual Machines and Virtual Network) on Windows Azure, many more enterprise workloads are moving to the public cloud to take advantage of cloud economics, scale and speed. I’ve recently been involved in one such enterprise workload: big data in the cloud. Here are some tips and best practices I want to share with you.
This project required that I deploy a multi-node Hadoop Cluster in Windows Azure, using a prebuilt Linux image. I used a CentOS 6.3 image from the Windows Azure Image Gallery to provision a medium sized virtual machine (VM) and proceeded to deploy a single node core Hadoop. All worked fine except that as I started testing slightly heavier workloads, I noticed that the VM would often freeze up or become unresponsive.
It is not difficult to guess that this had something to do with the resources of a medium sized VM – after all it has 2 CPU cores and 3.5 GB of memory. Yet, I was not expecting the whole VM to become unresponsive or start dropping connections. After discussing this issue with my friends and colleagues, we determined that the VM did not have Swap (i.e. what is called a page file on Windows) configured at all. Thus, its virtual memory system could not swap to disk when the memory pressure increased.
You can check how the system is doing memory-wise by running the “free” command from the Linux shell prompt. In particular, you can use “cat /proc/swaps” to see the state of the swap space: how much is configured and how much is in use. See the screenshot, below.
If swap space is not configured at all, which is the default for Linux VMs provisioned in Windows Azure Virtual Machines, “cat /proc/swaps” lists no swap areas (at most a header line), and the “free” command shows zero total and zero used swap.
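As a rough illustration of that check, the sketch below sums the Size column of “/proc/swaps” (reported in KiB) and prints whether any swap is configured. The function names swap_total_kb and swap_status are hypothetical helpers, not part of any tool, and the optional file argument is an assumption added only so the parsing can be tried against a sample file.

```shell
#!/bin/sh
# Sketch only: swap_total_kb and swap_status are hypothetical helper names.

# Sum the Size column (KiB) of a /proc/swaps-style file.
# The optional argument defaults to the live /proc/swaps.
swap_total_kb() {
    # Line 1 is the header; column 3 of every other line is the size in KiB.
    awk 'NR > 1 { total += $3 } END { print total + 0 }' "${1:-/proc/swaps}"
}

# Print a one-line summary of the swap configuration.
swap_status() {
    total=$(swap_total_kb "$1")
    if [ "$total" -eq 0 ]; then
        echo "no swap configured"
    else
        echo "swap configured: $total KiB"
    fi
}
```

Calling swap_status with no argument reads the live “/proc/swaps”; on a freshly provisioned VM as described above, it would be expected to report that no swap is configured.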
An interesting question is why provisioning a VM from a Linux library image (i.e. from the Windows Azure Image Gallery) does not automatically configure swap space. The thinking here is that the user should decide on the size and location of the swap and set it up after provisioning. However, it is quite possible to keep using the VM without swap ever being configured, until processes begin to crash or the VM freezes up.
That said, once we realized that all we needed to do was configure swap space, we followed a simple set of steps to set up a file-based swap on the resource disk; a medium sized virtual machine in Windows Azure comes with 135 GB of resource disk mounted as “/mnt/resource”. Given below is a walkthrough of the steps for configuring a file-based swap space on the VM.
- Use the “fallocate” command to allocate a swap file of suitable size, say, 5GB on the resource disk. The syntax is: “fallocate -l 5g /mnt/resource/swap5g” where “swap5g” is the name of the file
- Change the permissions on the file using “chmod” command so that only the root user has read/write permissions on the swap file. The syntax is: “chmod 600 /mnt/resource/swap5g”
- Use the “mkswap” command to set up the file as swap area. The syntax is: “mkswap /mnt/resource/swap5g”
- Enable the use of the swap file using “swapon” command. The syntax is: “swapon /mnt/resource/swap5g”
- The swap is now ready for use, and the “cat /proc/swaps” command should confirm it. Finally, add an entry to the “/etc/fstab” file so that the swap settings are retained even if the VM recycles in Azure. The syntax is: echo "/mnt/resource/swap5g none swap sw 0 0" >> /etc/fstab
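The steps above could also be collected into a small script, sketched here under a few assumptions: it must be run as root, the SWAPFILE and SWAPSIZE variables simply mirror the values used in the walkthrough, and the function names add_fstab_entry and create_swap are hypothetical. Unlike the plain “echo ... >> /etc/fstab” step, this version skips the append when the same line is already present, so running it twice does not duplicate the entry.

```shell
#!/bin/sh
# Sketch only: run as root. Function names are hypothetical;
# the defaults mirror the file name and size used in the walkthrough.
SWAPFILE="${SWAPFILE:-/mnt/resource/swap5g}"
SWAPSIZE="${SWAPSIZE:-5g}"

# Append the swap entry to an fstab file unless the exact line already exists.
add_fstab_entry() {
    fstab="${1:-/etc/fstab}"
    entry="$SWAPFILE none swap sw 0 0"
    grep -qxF "$entry" "$fstab" || echo "$entry" >> "$fstab"
}

# Allocate, protect, format and enable the swap file, then persist it.
# Each step runs only if the previous one succeeded.
create_swap() {
    fallocate -l "$SWAPSIZE" "$SWAPFILE" &&
    chmod 600 "$SWAPFILE" &&
    mkswap "$SWAPFILE" &&
    swapon "$SWAPFILE" &&
    add_fstab_entry /etc/fstab &&
    cat /proc/swaps
}

# Uncomment to run the whole sequence as root:
# create_swap
```

Chaining the commands with “&&” is a design choice for this sketch: if any step fails (for example, not enough free space for “fallocate”), the later steps and the “/etc/fstab” change are skipped.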
Here is a transcript of the above commands executed in my VM.
Acknowledgement: Thanks to my colleague Amit Srivastava for help with troubleshooting and resolving the swap issue.