Caching meaning
Caching keeps reusable copies of frequently accessed data in memory so applications avoid repeated database calls and return results more quickly.
Caching is the practice of storing key-value data in temporary memory (such as a NoSQL, or non-relational, database) so applications can retrieve it faster than they could from conventional storage. In cloud storage architectures, where requests often cross networks and touch shared services, caching can help keep response times low and reduce repeated work.
Most systems keep a durable “source of truth” (a database) for complete datasets, then keep a cache for transient subsets that are read often. When a request comes in, the app checks the cache first; if the data is there, it returns quickly without querying the back end again. Developers also cache processed data and reuse it to serve requests faster than standard queries to relational SQL databases such as open-source PostgreSQL.
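As a minimal sketch of that check-the-cache-first flow, with a plain dictionary standing in for the cache service and a hypothetical db_query helper standing in for the back end:

import json

cache = {}  # stands in for an in-memory cache service

def db_query(product_id):
    # Placeholder for a real (comparatively slow) database call.
    return {"id": product_id, "name": "example", "price": 9.99}

def get_product(product_id):
    key = "product:" + str(product_id)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)    # cache hit: skip the database
    product = db_query(product_id)   # cache miss: read the source of truth
    cache[key] = json.dumps(product) # populate the cache for the next request
    return product

print(get_product(42))  # miss: hits the "database", then fills the cache
print(get_product(42))  # hit: served from the cache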
Good candidates include data read repeatedly and data that changes infrequently (for example, product and pricing information or shared static resources that are costly to construct).
If an operation transforms data or performs a complicated calculation, caching the result can avoid recomputing it for subsequent requests.
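A minimal sketch of that idea, assuming a hypothetical summarize_orders aggregation whose result is stored under a key derived from its input:

results = {}  # computed results, keyed by input

def summarize_orders(orders):
    key = tuple(sorted(orders))
    if key not in results:
        # Hypothetical expensive aggregation over (product, amount) pairs.
        results[key] = {
            "count": len(orders),
            "total": sum(amount for _, amount in orders),
        }
    return results[key]  # later calls with the same input reuse the stored result

orders = [("widget", 9.99), ("gadget", 24.50)]
print(summarize_orders(orders))  # computed once
print(summarize_orders(orders))  # reused from the cache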
In-memory stores are often used to hold high volumes of session data such as user input, shopping cart entries, or personalization preferences for short periods. For stateful apps, teams also store session state in the cache so the app can be stateless.
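A sketch of externalized session state, with a plain dictionary standing in for the shared store and an illustrative cart structure:

import uuid

session_cache = {}  # stands in for a shared in-memory store

def create_session():
    session_id = str(uuid.uuid4())
    session_cache[session_id] = {"cart": [], "preferences": {}}
    return session_id

def add_to_cart(session_id, item):
    # Any app instance can load, update, and write back the session.
    session = session_cache[session_id]
    session["cart"].append(item)
    session_cache[session_id] = session

sid = create_session()
add_to_cart(sid, "widget")
print(session_cache[sid]["cart"])

Because the state lives in the cache rather than in any one process, any instance can handle the next request for that session.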
Caching can improve performance in a few concrete ways: reads from memory are faster than reads from disk, fewer requests reach the back-end database, and results that are expensive to compute don't have to be recomputed.
In many cloud designs, a cache sits next to a primary data store (such as a database). The primary store holds the complete, durable dataset on the cloud server, while the cache keeps a smaller, temporary subset that’s faster to read.
A common setup is a standalone cache layer, or a cache that lives within the app or database tier—chosen based on where you need fast reads.
Some systems use multi-level caching (“cache layers”) so different kinds of data live in different caches based on demand. Adding one or more cache layers can improve throughput and latency for the data layer and reduce overall cost by cutting back-end load.
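A sketch of that layering, assuming a small per-process dictionary in front of a larger shared one; a miss falls through one layer at a time, and each layer is filled on the way back:

local_cache = {}   # L1: in-process, fastest, smallest
shared_cache = {}  # L2: stands in for a shared cache service

def load_from_source(key):
    return "value-for-" + key  # placeholder for the back-end store

def get(key):
    if key in local_cache:
        return local_cache[key]           # L1 hit
    if key in shared_cache:
        local_cache[key] = shared_cache[key]  # L2 hit: copy down to L1
        return local_cache[key]
    value = load_from_source(key)         # miss at every layer
    shared_cache[key] = value
    local_cache[key] = value
    return value

print(get("report:today"))  # filled from the source
print(get("report:today"))  # served from the local layer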
Teams typically cache data that falls into a few buckets: data that is read often, data that changes infrequently, short-lived session state, and results that are expensive to compute.
There are several standard ways apps read from and write to a cache. Here are the most common patterns and what they mean in practice.
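The read path sketched earlier is commonly called cache-aside: the app checks the cache and, on a miss, loads from the database and fills the cache itself. Another widespread pattern is write-through, where every write goes to the durable store and the cache in the same operation so reads don't see stale entries. A minimal write-through sketch, using dictionaries as stand-ins for the database and cache:

database = {}  # stands in for the durable store
cache = {}     # stands in for the cache service

def write_through(key, value):
    database[key] = value  # durable write to the source of truth
    cache[key] = value     # keep the cache consistent in the same operation

def read(key):
    if key in cache:
        return cache[key]
    value = database.get(key)
    if value is not None:
        cache[key] = value  # backfill on a miss
    return value

write_through("price:42", 9.99)
print(read("price:42"))  # served from the cache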
Using cache layers can improve throughput and latency by serving common requests from the cache instead of repeatedly querying the back-end store. This can lower the need to scale database infrastructure because fewer requests reach the database in the first place.
For apps with spikes in usage, in-memory caches can help mitigate latency by keeping frequently requested data close to where it’s used.
Use these patterns as a starting point when you're choosing a workflow.
Caching improves application performance because reading from an in-memory cache is faster than reading from a disk-based data store. When more requests are served from cache, systems send fewer queries to backend databases, which can reduce the need to scale database infrastructure and lower related costs.
Caching helps cut repeated work. Data that is read repeatedly—or that’s costly to construct—can be stored once and reused. If an operation performs a complicated calculation or transforms data, caching the result reduces repeated computation for subsequent requests.
Many sites cache page output (such as HTML and client scripts) so the server can return the cached output instead of rerunning page code each time. Caching also supports web multimedia scenarios through web caches and network caching, such as content delivery networks (CDNs).
Apps often store transient subsets of data in a cache for quick retrieval, while the primary database retains the complete durable dataset. Caching processed data and reusing it can serve requests faster than standard database queries.
Teams often add one or more cache layers to improve throughput and latency, serving common queries from cache and reducing database load. In-memory caches can help when usage spikes and throughput demand rises, mitigating latency during those periods.
Caching is often used to store high volumes of short-lived session data (such as user input or personalization preferences) in an in-memory store. Some teams also store session state in cache so stateful apps can keep the app tier stateless. For operational systems, data that changes infrequently—such as product and pricing information—is a common caching target.
A browser cache stores copies of static resources on a user’s device so repeat visits can reuse those files instead of downloading them again.
Server-side caching happens in the processes that run business services, away from the end user's device. Server-side caches are often either private (local to one app instance) or shared (used by many app instances).
A CDN caches content on edge servers closer to end users, so requests don’t always travel back to the origin. Unlike a browser cache (one user), a CDN cache is shared—one user’s request can populate content another user later receives.
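To illustrate, a server can hint both kinds of caches with Cache-Control directives: max-age applies to a single user's browser cache, while s-maxage overrides it for shared caches such as a CDN. A small sketch using Python's standard library, with illustrative values:

from http.server import BaseHTTPRequestHandler, HTTPServer

class CachedPage(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body>cached page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        # Browsers may reuse this for 60s; shared caches (CDNs) for 1 hour.
        self.send_header("Cache-Control", "public, max-age=60, s-maxage=3600")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), CachedPage).serve_forever()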
Caching isn’t only a cloud computing pattern—hardware and memory layers use it too.
CPU cache is a small, fast memory area near (or on) the processor that stores copies of frequently used data and instructions to reduce the time spent waiting on main memory.
Memory (in-memory) cache is the simplest software cache type: an in-memory store held in the address space of a single process and accessed directly by that process.
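In Python, for example, an in-process memory cache can be as small as the standard library's functools.lru_cache decorator, which keeps results in the process's own address space:

from functools import lru_cache

@lru_cache(maxsize=256)
def fib(n):
    # Results live in this process's memory; repeat calls become lookups.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(80))  # fast, because intermediate results are cached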
A distributed cache spans multiple servers so the cache can grow in size and transactional capacity beyond one machine. Many shared cache services use a cluster of servers and distribute cached data across the cluster; scaling the cache can be as simple as adding more servers. Some distributed designs also layer caches so a miss at one layer pulls from an upstream provider and then stores the result locally for the next request.
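A simplified sketch of how a client might map keys onto cluster nodes; the node names are hypothetical, and production services typically use consistent hashing rather than this modulo scheme so that adding a server moves only a fraction of the keys:

import hashlib

servers = ["cache-0:6379", "cache-1:6379", "cache-2:6379"]  # hypothetical nodes

def server_for(key):
    # Hash the key so every client maps it to the same node.
    digest = hashlib.md5(key.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(server_for("product:42"))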
Caching keeps frequently accessed data closer to where it’s used, which can improve response times and help a system handle more concurrent requests. It can also reduce contention in the original data store, such as when a database has limited connections.
In distributed applications, caching often happens in more than one place—client-side (such as a browser) and server-side (in an application or shared cache service).
Start small and focus on the data that gives you the clearest return: entries that are read often, change rarely, or are expensive to reconstruct.
A cache usually holds copies of data from a primary store, so freshness needs attention: give entries an expiration time and invalidate them when the source data changes, so readers don't act on stale copies.
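A sketch of those two freshness tools, a time-to-live on every entry plus explicit invalidation when the source changes; the 30-second TTL is illustrative:

import time

cache = {}
TTL_SECONDS = 30

def put(key, value):
    cache[key] = (value, time.monotonic() + TTL_SECONDS)

def get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() > expires_at:
        del cache[key]  # expired: force a fresh read from the primary store
        return None
    return value

def invalidate(key):
    cache.pop(key, None)  # call this whenever the source data is updated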
Caching is a core building block for modern systems because it fits many architectures, from single services to distributed apps and edge delivery. For a managed in-memory cache in the cloud, you can explore Azure.