Cloud Service Fundamentals – Caching Basics

This article was written by Rama Ramani from the AzureCAT team.

The “Cloud Service Fundamentals” application, referred to as “CSFundamentals,” demonstrates how to build database-backed Azure services. In the previous post, DAL – Sharding of RDBMS, we discussed sharding, a technique for implementing horizontal scalability in the database tier. In this post, we discuss the need for caching, the considerations to take into account, and how to configure and implement it in Windows Azure.

The distributed cache architecture is built on scale-out: several machines (physical or virtual) participate in a cluster ring with inherent partitioning capabilities to spread the workload. The cache follows a <key, value> lookup paradigm, where the value is a serialized object. That object could be the result set of a far more complex data store operation, such as a JOIN across several tables in your database. Instead of performing the operation several times against the data store, the application does a quick key lookup against the cache.
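The lookup-before-load idea can be sketched in a few lines. This is only an illustration: an in-process Dictionary stands in for the distributed cache, and LoadProfileFromStore, the key format, and the call counter are hypothetical names that do not come from CSFundamentals.

```csharp
using System;
using System.Collections.Generic;

public static class CacheAsideSketch
{
    // Stand-in for the distributed cache: a plain in-process dictionary.
    private static readonly Dictionary<string, string> Cache =
        new Dictionary<string, string>();

    // Counts how often the simulated data store is hit.
    public static int StoreCalls;

    // Hypothetical expensive operation, for example a JOIN across several tables.
    private static string LoadProfileFromStore(int userId)
    {
        StoreCalls++;
        return "profile-for-user-" + userId;
    }

    public static string GetProfile(int userId)
    {
        string key = "profile:" + userId;
        string cached;
        if (Cache.TryGetValue(key, out cached))
        {
            return cached; // cache hit: no data store round trip
        }

        string loaded = LoadProfileFromStore(userId); // cache miss: load once
        Cache[key] = loaded;                          // keep it for later lookups
        return loaded;
    }

    public static void Main()
    {
        GetProfile(42);
        GetProfile(42);
        Console.WriteLine(StoreCalls); // prints 1: the second call was served from the cache
    }
}
```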

Understanding what to cache

You first need to analyze the workload and identify suitable candidates for caching. Any time data is cached, the “staleness” between the cache and the “source of truth” has to stay within limits that the application can tolerate. Broadly, the cache can hold reference data (read-only data shared across all users, such as user profiles), user session data (read-write data for a single user), or in some cases resource data (read-write data shared across all users, coordinated through the lock API). Some datasets are not well suited to caching at all: for example, data that changes rapidly, data for which the application cannot tolerate staleness, or data that must be updated within transactions.
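One common way to bound staleness is a time-to-live (TTL): an entry older than the TTL is treated as a miss and reloaded from the source of truth. The sketch below is an in-process illustration under that assumption; TtlCacheSketch, its members, and the injected clock are hypothetical and are not the Windows Azure Caching API.

```csharp
using System;
using System.Collections.Generic;

public sealed class TtlCacheSketch
{
    private sealed class Entry
    {
        public string Value;
        public DateTime ExpiresAtUtc;
    }

    private readonly Dictionary<string, Entry> entries = new Dictionary<string, Entry>();
    private readonly TimeSpan timeToLive;
    private readonly Func<DateTime> utcNow; // injected clock, so the demo is deterministic

    public TtlCacheSketch(TimeSpan timeToLive, Func<DateTime> utcNow)
    {
        this.timeToLive = timeToLive;
        this.utcNow = utcNow;
    }

    public void Put(string key, string value)
    {
        entries[key] = new Entry { Value = value, ExpiresAtUtc = utcNow() + timeToLive };
    }

    // Returns false when the key is absent or its entry has outlived the TTL.
    public bool TryGet(string key, out string value)
    {
        Entry entry;
        if (entries.TryGetValue(key, out entry) && utcNow() < entry.ExpiresAtUtc)
        {
            value = entry.Value;
            return true;
        }

        entries.Remove(key); // drop the expired entry, if any
        value = null;
        return false;
    }
}

public static class TtlDemo
{
    public static void Main()
    {
        DateTime now = new DateTime(2013, 9, 1, 0, 0, 0, DateTimeKind.Utc);
        var cache = new TtlCacheSketch(TimeSpan.FromMinutes(5), () => now);
        cache.Put("comments:42", "cached comments");

        string value;
        Console.WriteLine(cache.TryGet("comments:42", out value)); // prints True
        now = now.AddMinutes(6); // simulate time passing beyond the TTL
        Console.WriteLine(cache.TryGet("comments:42", out value)); // prints False
    }
}
```

The TTL is the knob that encodes the application's tolerance: a five-minute TTL means readers may see data that is up to five minutes old.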

Capacity Planning

A natural next step is to estimate the caching needs of your application. This involves looking at a set of metrics, beyond just the cache size, to arrive at an initial sizing estimate.

  • Cache Size: Amount of memory needed can be roughly estimated using the average object size and number of objects.
  • Access Pattern & Throughput requirements: The read-write mix provides an indication of new objects being created, rewrite of existing objects or reads of objects.
  • Policy Settings: Settings for Time-To-Live (TTL), High Availability (HA), Expiration Type, Eviction policy.
  • Physical resources: Outside of memory, the Network bandwidth and CPU utilization are also key. Network bandwidth may be estimated based on specific inputs, but mostly this has to be monitored and then used as a basis in re-calculation.

A more detailed capacity planning spreadsheet is available at http://msdn.microsoft.com/en-us/library/hh914129
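As a back-of-the-envelope illustration of the cache size bullet, the sketch below multiplies object count by average object size. The per-object overhead parameter and the doubling for High Availability (one extra copy of each object) are assumptions made for this example, and the workload numbers are invented; the spreadsheet linked above remains the place for real planning.

```csharp
using System;

public static class CacheSizingSketch
{
    // Rough memory estimate in bytes.
    // perObjectOverheadBytes is a placeholder for key and metadata cost; measure it.
    public static long EstimateBytes(long objectCount, long avgObjectBytes,
                                     long perObjectOverheadBytes, bool highAvailability)
    {
        long primary = objectCount * (avgObjectBytes + perObjectOverheadBytes);

        // Assumption: HA keeps one additional copy of every object.
        return highAvailability ? primary * 2 : primary;
    }

    public static void Main()
    {
        // Hypothetical workload: 1,000,000 objects of about 2 KB, 512 bytes overhead each.
        long bytes = EstimateBytes(1000000, 2048, 512, true);
        double gigabytes = bytes / (1024.0 * 1024 * 1024);
        Console.WriteLine(gigabytes); // roughly 4.77 (GB)
    }
}
```

An estimate like this is only a starting point; network bandwidth and CPU still have to be monitored under load, as noted in the list above.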

Azure Caching Topology

The following PaaS caching options are available on Azure:

  • In-Role dedicated: In the dedicated topology, you define a worker role that is dedicated to Cache. This means that all of the worker role’s available memory is used for the Cache and operating overhead. See http://msdn.microsoft.com/en-us/library/windowsazure/hh914140.aspx
  • In-Role co-located: In a co-located topology, you use a percentage of the available memory on application roles for Cache. For example, you could assign 20% of the physical memory on each web role instance to Cache. See http://msdn.microsoft.com/en-us/library/windowsazure/hh914128.aspx
  • Windows Azure Cache Service: Currently (as of September 2013) in Preview. Useful links: http://blogs.msdn.com/b/windowsazure/archive/2013/09/03/announcing-new-windows-azure-cache-preview.aspx and http://msdn.microsoft.com/en-us/library/windowsazure/dn386094.aspx
  • Windows Azure Shared Caching: Multi-tenant caching (with throttling and quotas), which will be retired no later than September 2014. More details are available at http://www.windowsazure.com/en-us/pricing/details/cache/. Customers are advised to use one of the options above instead.

Implementation details

The CSFundamentals application uses In-Role dedicated Azure Caching to streamline reads of frequently accessed information, such as user profile information and user comments. The In-Role dedicated deployment was preferred because it isolates the cache-related workload. The dedicated role can then be monitored via performance counters (CPU usage, network bandwidth, memory, and so on), and the cache role instances scaled accordingly.

NOTE: The new Windows Azure Cache Service was not available when CSFundamentals was implemented. It would have been the preferred choice had there been a requirement to make the cached data available outside of the CSFundamentals application.

The ICacheFactory interface defines the GetCache method signature. The ICacheClient interface defines the GET<T> and PUT<T> method signatures.

public interface ICacheClient
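For orientation, here is one shape such a pair of interfaces could take. The member signatures (string keys, a parameterless GetCache, a Get that returns default(T) on a miss) are assumptions made for illustration and are not copied from ICacheClient.cs; the in-memory classes are hypothetical stand-ins so that the sketch runs without the Azure assemblies.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape only; see ICacheClient.cs in the solution for the real definition.
public interface ICacheClient
{
    T Get<T>(string key);
    void Put<T>(string key, T value);
}

public interface ICacheFactory
{
    ICacheClient GetCache();
}

// Hypothetical in-memory implementation, useful for unit tests.
public sealed class InMemoryCacheClient : ICacheClient
{
    private readonly Dictionary<string, object> items = new Dictionary<string, object>();

    public T Get<T>(string key)
    {
        object value;
        // Returns default(T) on a miss; a mismatched T would throw InvalidCastException.
        return items.TryGetValue(key, out value) ? (T)value : default(T);
    }

    public void Put<T>(string key, T value)
    {
        items[key] = value;
    }
}

public sealed class InMemoryCacheFactory : ICacheFactory
{
    private readonly InMemoryCacheClient client = new InMemoryCacheClient();

    public ICacheClient GetCache()
    {
        return client;
    }
}

public static class InterfaceDemo
{
    public static void Main()
    {
        ICacheFactory factory = new InMemoryCacheFactory();
        ICacheClient cache = factory.GetCache();
        cache.Put("greeting", "hello");
        Console.WriteLine(cache.Get<string>("greeting")); // prints hello
    }
}
```

Coding against the interface rather than the Azure client directly is what allows a substitute like this to be swapped in during testing.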

 

AzureCacheClient is the implementation of this interface and has the references to the Windows Azure Caching client assemblies, which were added via the Windows Azure Caching NuGet package.

 

Because the DataCacheFactory object creation establishes a costly connection to the cache role instances, it is defined as static and lazily instantiated using Lazy<T>.
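A minimal sketch of that static, lazily instantiated pattern follows. To keep it runnable without the Azure Caching assemblies, a stand-in class (ExpensiveFactory, a hypothetical name) takes the place of DataCacheFactory; the counter exists only to show that construction happens once.

```csharp
using System;
using System.Threading;

// Stand-in for DataCacheFactory: construction is assumed to be expensive.
public sealed class ExpensiveFactory
{
    public static int Created;

    public ExpensiveFactory()
    {
        Interlocked.Increment(ref Created);
    }
}

public static class CacheConnection
{
    // Lazy<T> defaults to LazyThreadSafetyMode.ExecutionAndPublication,
    // so the delegate runs at most once even under concurrent access.
    private static readonly Lazy<ExpensiveFactory> LazyFactory =
        new Lazy<ExpensiveFactory>(() => new ExpensiveFactory());

    public static ExpensiveFactory Factory
    {
        get { return LazyFactory.Value; }
    }
}

public static class LazyDemo
{
    public static void Main()
    {
        Console.WriteLine(ExpensiveFactory.Created); // prints 0: nothing built yet
        ExpensiveFactory first = CacheConnection.Factory;
        ExpensiveFactory second = CacheConnection.Factory;
        Console.WriteLine(object.ReferenceEquals(first, second)); // prints True
        Console.WriteLine(ExpensiveFactory.Created); // prints 1
    }
}
```

The cost of connecting is paid on first use rather than at startup, and every caller afterwards shares the same instance.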

The app.config has auto discovery enabled and the identifier is used to correctly point to the cache worker role:

      <autoDiscover isEnabled="true" identifier="CSFundamentalsCaching.WorkerRole" />

NOTE: To modify the solution to use the new Windows Azure Cache Service, replace the identifier attribute with the cache service endpoint created from the Windows Azure Portal. In addition, the API key (retrievable via the Manage Keys option on the portal) must be copied into the ‘messageSecurity authorizationInfo’ field in app.config.

The implementation of the GET<T> and PUT<T> methods uses the BinarySerializer class, which in turn uses the Protobuf class for serialization and deserialization. protobuf-net is a .NET implementation of Protocol Buffers that lets you serialize your .NET objects efficiently. It was added via the protobuf-net NuGet package.

Serialization produces a byte array (byte[]) from the value of type T that is passed in, and that array is then stored in the Windows Azure Cache cluster. To return the object requested for a specific key, the GET method reads the stored bytes and calls the Deserialize method.
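The round trip can be sketched with protobuf-net's static Serializer class. This is not the BinarySerializer.cs from the solution: the class name BinarySerializerSketch and the UserProfile type are hypothetical, and the code assumes the protobuf-net NuGet package is referenced.

```csharp
using System;
using System.IO;
using ProtoBuf; // from the protobuf-net NuGet package

// Hypothetical cached type; protobuf-net needs the contract attributes.
[ProtoContract]
public class UserProfile
{
    [ProtoMember(1)]
    public int Id { get; set; }

    [ProtoMember(2)]
    public string DisplayName { get; set; }
}

public static class BinarySerializerSketch
{
    // Turns a value into the byte[] that would be stored under a cache key.
    public static byte[] Serialize<T>(T value)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize<T>(stream, value);
            return stream.ToArray();
        }
    }

    // Rebuilds the object from the bytes read back from the cache.
    public static T Deserialize<T>(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        {
            return Serializer.Deserialize<T>(stream);
        }
    }

    public static void Main()
    {
        var original = new UserProfile { Id = 7, DisplayName = "Ada" };
        byte[] payload = Serialize(original);
        UserProfile copy = Deserialize<UserProfile>(payload);
        Console.WriteLine(copy.Id + " " + copy.DisplayName); // prints "7 Ada"
    }
}
```

Storing a byte[] keeps the cache client independent of the cached types: only the serializer needs to know how a given T is laid out.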

This post provides an overview of caching basics. For more details, please refer to ICacheClient.cs, AzureCacheFactory.cs, AzureCacheClient.cs and BinarySerializer.cs in the CloudServiceFundamentals Visual Studio solution.