This article was written by Rama Ramani from the AzureCAT team.
The “Cloud Service Fundamentals” application, referred to as “CSFundamentals,” demonstrates how to build database-backed Azure services. In the previous DAL – Sharding of RDBMS blog post, we discussed sharding, a technique for implementing horizontal scalability in the database tier. In this post, we discuss the need for caching, the considerations to take into account, and how to configure and implement it in Windows Azure.
The distributed cache architecture is built on scale-out, where several machines (physical or virtual) participate as part of the cluster ring with inherent partitioning capabilities to spread the workload. The cache is a
Understanding what to cache
You first need to analyze the workload and decide which data sets are suitable candidates for caching. Any time data is cached, the “staleness” between the cache and the “source of truth” has to stay within limits the application can tolerate. Overall, the cache can be used for reference data (read-only data shared across all users, such as user profiles), user session data (single-user read-write), or, in some cases, resource data (read-write across all users, using the lock API). Some data sets are not well suited for caching at all: for example, data that changes rapidly, data for which the application cannot tolerate staleness, or data that must be updated within transactions.
Capacity Planning
A natural next step is to estimate the caching needs of your application. This involves looking at a set of metrics, beyond just the cache size, to come up with a starting sizing guide.
- Cache Size: Amount of memory needed can be roughly estimated using the average object size and number of objects.
- Access Pattern & Throughput requirements: The read-write mix provides an indication of new objects being created, rewrite of existing objects or reads of objects.
- Policy Settings: Settings for Time-To-Live (TTL), High Availability (HA), Expiration Type, Eviction policy.
- Physical resources: Besides memory, network bandwidth and CPU utilization are also key. Network bandwidth can be estimated from specific inputs, but in most cases it has to be monitored and the measurements then used to recalculate the sizing.
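As a rough illustration of the cache size estimate described above, the following sketch multiplies object count by average object size plus a per-object overhead, and doubles the result when High Availability keeps a secondary copy of each item. The class name, method name, and overhead figure are assumptions made for this example; they do not come from the capacity planning spreadsheet.

```csharp
using System;

public static class CacheSizing
{
    // Rough memory estimate in bytes:
    //   objectCount * (averageObjectBytes + perObjectOverheadBytes)
    // doubled when High Availability stores a secondary copy of every object.
    // The overhead value passed in is an illustrative assumption, not a measurement.
    public static long EstimateBytes(
        long objectCount,
        long averageObjectBytes,
        long perObjectOverheadBytes,
        bool highAvailability)
    {
        long perCopy = objectCount * (averageObjectBytes + perObjectOverheadBytes);
        return highAvailability ? perCopy * 2 : perCopy;
    }
}
```

For example, `CacheSizing.EstimateBytes(1_000_000, 2048, 512, true)` models one million 2 KB objects with 512 bytes of assumed overhead and HA enabled, which comes to 5,120,000,000 bytes (about 4.8 GB). A figure like this is only a starting point and should be checked against monitored memory usage.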
A more detailed capacity planning spreadsheet is available at https://msdn.microsoft.com/en-us/library/hh914129
Azure Caching Topology
The table below lists the PaaS caching options available on Azure, with a brief description of each.
| Type | Description |
| --- | --- |
| In-Role dedicated | In the dedicated topology, you define a worker role that is dedicated to Cache. This means that all of the worker role’s available memory is used for the Cache and operating overhead. https://msdn.microsoft.com/en-us/library/windowsazure/hh914140.aspx |
| In-Role co-located | In a co-located topology, you use a percentage of available memory on application roles for Cache. For example, you could assign 20% of the physical memory for Cache on each web role instance. https://msdn.microsoft.com/en-us/library/windowsazure/hh914128.aspx |
| Windows Azure Cache Service | The Windows Azure Cache Service is currently (as of Sep 2013) in Preview. Useful links: https://msdn.microsoft.com/en-us/library/windowsazure/dn386094.aspx |
| Windows Azure Shared Caching | Multi-tenant caching (with throttling and quotas), which will be retired no later than September 2014. More details are available at https://azure.microsoft.com/en-us/pricing/details/cache/. It is recommended that customers use one of the above options for caching. |
Implementation details
The CSFundamentals application uses In-Role dedicated Azure Caching to speed up reads of frequently accessed information, such as user profile information and user comments. The In-Role dedicated deployment was preferred because it isolates the cache-related workload. That workload can then be monitored via performance counters (CPU usage, network bandwidth, memory, etc.) and the cache role instances scaled accordingly.
NOTE: The new Windows Azure Cache Service was not available during implementation of CSFundamentals. It would have been the preferred choice if there had been a requirement to make the cached data available outside of the CSFundamentals application.
The ICacheFactory interface defines the GetCache method signature. ICacheClient interface defines the GET
public interface ICacheClient
AzureCacheClient is the implementation of this interface and has the references to the Windows Azure Caching client assemblies, which were added via the Windows Azure Caching NuGet package.
Because creating a DataCacheFactory object establishes a costly connection to the cache role instances, the factory is defined as static and lazily instantiated using Lazy<T>.
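The lazy, static pattern described above can be sketched as follows. This is a minimal illustration rather than the code from AzureCacheClient.cs: ExpensiveFactory is a hypothetical stand-in for DataCacheFactory so that the example compiles without the Windows Azure Caching assemblies, and the CacheConnection class name is also an assumption.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for an object that is expensive to create.
// In CSFundamentals the corresponding type is DataCacheFactory.
public sealed class ExpensiveFactory
{
    private static int created;

    // Number of instances constructed so far, exposed to show that
    // the lazy wrapper creates only one.
    public static int CreatedCount => created;

    public ExpensiveFactory()
    {
        Interlocked.Increment(ref created);
    }
}

public static class CacheConnection
{
    // Lazy<T> defers construction until Value is first read. With the default
    // thread-safety mode (ExecutionAndPublication), the delegate runs only once
    // even if several threads read Value at the same time.
    private static readonly Lazy<ExpensiveFactory> LazyFactory =
        new Lazy<ExpensiveFactory>(() => new ExpensiveFactory());

    // Every caller shares the same instance.
    public static ExpensiveFactory Factory => LazyFactory.Value;
}
```

Every read of `CacheConnection.Factory` returns the same instance, so the connection cost is paid once per process and only if the cache is actually used.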
The app.config has auto discovery enabled and the identifier is used to correctly point to the cache worker role:
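With auto discovery enabled, the cache client section of app.config might look roughly like the fragment below. The role name CacheWorkerRole is a placeholder, not necessarily the name used in the solution, and the fragment leaves out the configSections declaration that the NuGet package adds.

```xml
<dataCacheClients>
  <dataCacheClient name="default">
    <!-- identifier is the name of the role that hosts the cache (placeholder value) -->
    <autoDiscover isEnabled="true" identifier="CacheWorkerRole" />
  </dataCacheClient>
</dataCacheClients>
```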
NOTE: To modify the solution to use the new Windows Azure Cache Service, replace the identifier attribute with the cache service endpoint created from the Windows Azure Portal. In addition, the API key (retrievable via the Manage Keys option on the portal) must be copied into the ‘messageSecurity authorizationInfo’ field in app.config.
The implementation of the GET
Serialization produces a byte array (byte[]) from the value of type T that is passed in, and that array is then stored in the Windows Azure Cache cluster. To return the object requested for a specific key, the GET method uses the Deserialize method.
This blog post provides an overview of caching basics. For more details, please refer to ICacheClient.cs, AzureCacheFactory.cs, AzureCacheClient.cs and BinarySerializer.cs in the CloudServiceFundamentals Visual Studio solution.
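The serialize, store, and deserialize round trip can be illustrated with a small self-contained sketch. Several things here are assumptions made for the example: the Put method and the exact Get signature on ICacheClient, the in-memory dictionary standing in for the cache cluster, the behavior on a cache miss, and the use of System.Text.Json in place of the solution's BinarySerializer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical shape of the cache client interface; the real one is in ICacheClient.cs.
public interface ICacheClient
{
    T Get<T>(string key);
    void Put<T>(string key, T value);
}

// In-memory stand-in for the cache cluster so the sketch runs without Azure.
public sealed class InMemoryCacheClient : ICacheClient
{
    private readonly Dictionary<string, byte[]> store = new Dictionary<string, byte[]>();

    public void Put<T>(string key, T value)
    {
        // Serialize the value to byte[] before storing it.
        // JSON is swapped in here for the binary serializer used in the solution.
        store[key] = JsonSerializer.SerializeToUtf8Bytes(value);
    }

    public T Get<T>(string key)
    {
        // On a miss, return default(T) so the caller can fall back to the database.
        // This miss behavior is an assumption of the sketch.
        if (!store.TryGetValue(key, out byte[] bytes))
        {
            return default(T);
        }

        // Deserialize the stored bytes back into the requested type.
        return JsonSerializer.Deserialize<T>(bytes);
    }
}

// Example payload type, also hypothetical.
public sealed class UserProfile
{
    public int Id { get; set; }
    public string Name { get; set; }
}
```

A caller would store a profile with `cache.Put("user:42", profile)` and read it back with `cache.Get<UserProfile>("user:42")`. Because the value passes through a byte array in both directions, the object returned is a copy, not the instance that was stored.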