
What is caching?

Learn how caching improves system performance and efficiency.

Caching meaning

Caching keeps reusable copies of frequently accessed data in memory so applications avoid repeated database calls and return results more quickly.

Key takeaways

  • Caching stores frequently requested data in fast memory so apps can respond quicker without hitting the primary database every time.
  • A typical request checks the cache first, then pulls from the source store on a miss and updates the cache using common read/write patterns.
  • Done well, caching cuts latency and backend load, helps handle traffic bursts, and supports common scenarios like websites, APIs, and session state.
  • Caching appears at multiple layers, and it works best when you choose stable read-heavy data and keep a durable source of truth with a fallback plan.

Understanding caching

What is a cache?

Caching is the practice of storing key-value data in fast, temporary memory (such as a nonrelational (NoSQL) store held in memory) so applications can retrieve it faster than they could from conventional storage. In cloud storage architectures, where requests often cross networks and touch shared services, caching can help keep response times low and reduce repeated work.

How does caching work?

Most systems keep a durable “source of truth” (a database) for complete datasets, and a cache for transient subsets that are read often. When a request comes in, the app checks the cache first; if the data is there, it returns quickly without querying the back end again. Developers also cache processed data and reuse it to serve requests faster than standard queries against a relational store such as a SQL or PostgreSQL database.
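
As a rough illustration, here is what that check-the-cache-first flow can look like in application code. This is a minimal sketch: the dictionary stands in for a real cache service, and query_database is a hypothetical placeholder for the primary store.

    cache = {}  # stand-in for a real cache service

    def query_database(key):
        """Hypothetical stand-in for a query against the durable source of truth."""
        raise NotImplementedError

    def get_value(key):
        value = cache.get(key)
        if value is not None:
            return value             # cache hit: no database query needed
        value = query_database(key)  # cache miss: read from the source of truth
        cache[key] = value           # keep a copy so the next read is fast
        return value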

Key principles behind caching

1. Cache what you read a lot

Good candidates include data read repeatedly and data that changes infrequently (for example, product and pricing information or shared static resources that are costly to construct).

2. Cache results of repeated work

If an operation transforms data or performs a complicated calculation, caching the result can avoid recomputing it for subsequent requests.
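
One concrete way to do this in Python is the standard library's functools.lru_cache, which memoizes a function's results in process memory; this sketch assumes the calculation depends only on its arguments (the formula and cache size are illustrative).

    from functools import lru_cache

    @lru_cache(maxsize=1024)  # keep the 1,024 most recently used results
    def shipping_cost(weight_kg: float, zone: int) -> float:
        # Stand-in for a complicated calculation: the first call computes,
        # and repeat calls with the same arguments return the cached result.
        return round(2.5 + weight_kg * (1.1 ** zone), 2)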

3. Use cache layers when needed

Developers use multi-level caches (“cache layers”) to store different types of data in separate caches based on demand. Adding one or more cache layers can improve throughput and reduce latency in the data layer.

4. Keep session state close for responsive apps

In-memory stores are often used to hold high volumes of session data such as user input, shopping cart entries, or personalization preferences for short periods. For stateful apps, teams also store session state in the cache so the app can be stateless.
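
As an illustration, a stateless app tier can keep session data in an external cache keyed by session ID. This sketch uses the redis-py client; the endpoint details and the 30-minute lifetime are assumptions, not a prescribed setup.

    import json
    from typing import Optional

    import redis  # third-party client: pip install redis

    r = redis.Redis(host="localhost", port=6379)  # assumed cache endpoint

    def save_session(session_id: str, data: dict) -> None:
        # Store session state with a 30-minute expiry so stale sessions vanish.
        r.setex(f"session:{session_id}", 1800, json.dumps(data))

    def load_session(session_id: str) -> Optional[dict]:
        raw = r.get(f"session:{session_id}")
        return json.loads(raw) if raw is not None else None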

Impact on system performance

Caching can improve performance in a few concrete ways:

  • Reading data from an in-memory cache is faster than accessing data from a disk-driven store.
  • Fewer queries can reduce load and limit the need to scale database infrastructure, which can also reduce costs.
  • Cache layers can improve throughput and reduce latency, and in-memory caches can help mitigate latency during spikes in usage.
  • A cache instance can handle millions of requests per second, offering throughput that many databases can’t match.

How caching works in practice

The basic flow: Cache + source data store

In many cloud designs, a cache sits next to a primary data store (such as a database). The primary store holds the complete, durable dataset, while the cache keeps a smaller, temporary subset that’s faster to read.

A common setup is a standalone cache layer, or a cache that lives within the app or database tier—chosen based on where you need fast reads.

Cache layers: More than one “fast path”

Some systems use multi-level caching (“cache layers”) so different kinds of data live in different caches based on demand. Adding one or more cache layers can improve throughput and reduce latency for the data layer, and it can cut overall cost by reducing back-end load.

What gets cached (and why)

Teams typically cache data that falls into a few buckets:

  • Frequently read data (especially if it changes infrequently), such as product or pricing information, and shared static resources that are costly to construct.
  • Repeated computations, where an operation transforms data or performs a complicated calculation—caching the result avoids doing the same work again for later requests.
  • Session state for stateful apps, where storing session state in cache can help keep the app tier stateless.

Common caching patterns (read/write workflows)

There are several standard ways apps read from and write to a cache. Here are the most common patterns and what they mean in practice; a sketch of the write-side patterns follows the list.

  • Cache-aside: Load data into the cache on demand from the data store.
  • Read-through: Read from the cache, and the cache fetches from the data store on a miss.
  • Write-through: Write into the cache, and the cache writes to the data store synchronously.
  • Write-back (write-behind): Write into the cache, and the cache writes to the data store asynchronously, often in batches.
  • Write-around: Write into the data store and read from the cache; the cache is populated on demand.
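
To make the write-side patterns concrete, here is a rough sketch of write-through versus write-back. The in-process dictionary and the write_to_database function are hypothetical stand-ins; a real write-back implementation would also need durability and failure handling for the pending batch.

    cache = {}
    pending = []  # write-back buffer of (key, value) pairs awaiting persistence

    def write_to_database(key, value):
        """Hypothetical stand-in for a write against the durable data store."""
        raise NotImplementedError

    def write_through(key, value):
        cache[key] = value             # update the cache...
        write_to_database(key, value)  # ...and the data store, synchronously

    def write_back(key, value):
        cache[key] = value             # update the cache immediately
        pending.append((key, value))   # persist later, in a batch

    def flush_pending():
        # Called periodically (or when the buffer fills) to drain the batch.
        while pending:
            key, value = pending.pop(0)
            write_to_database(key, value)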

Why this reduces load and speeds things up

Serving common requests from cache layers instead of repeatedly querying the back-end store improves throughput and reduces latency. It can also lower the need to scale database infrastructure, because fewer requests reach the database in the first place.

For apps with spikes in usage, in-memory caches can help mitigate latency by keeping frequently requested data close to where it’s used.

Quick guide: Picking a pattern

Use these as a starting point when you’re choosing a workflow:

  • Prefer cache-aside when your app can decide what to store and when to refresh it.
  • Prefer read-through when you want the cache itself to handle fetching from the data store on a miss.
  • Consider write-through or write-back when write behavior matters and you want a defined resync approach.

Benefits and applications of caching

Faster responses with less backend work

Caching improves application performance because reading from an in-memory cache is faster than reading from a disk-driven data store. When more requests are served from cache, systems send fewer queries to backend databases, which can reduce the need to scale database infrastructure and lower related costs.

What you often get from caching
  • Lower latency for common reads, since frequently requested data comes from a faster layer.
  • Reduced database load and cost, because caching leads to fewer database queries and less pressure to overprovision database instances.
  • More predictable throughput, since a cache can handle very high request volume compared to many databases.
  • Smoother handling of traffic spikes, because in-memory caches can mitigate latency during high-throughput periods.

Better use of compute and storage resources

Caching helps cut repeated work. Data that is read repeatedly—or that’s costly to construct—can be stored once and reused. If an operation performs a complicated calculation or transforms data, caching the result reduces repeated computation for subsequent requests.

Common “good cache” candidates
  • Data that changes infrequently (for example, product and pricing information).
  • Shared static resources that are costly to construct.
  • Results of operations that are computed repeatedly.

Real-world applications (where caching shows up)

Websites: Quicker page loads and fewer round trips

Many sites cache page output (such as HTML and client scripts) so the server can return the cached output instead of rerunning page code each time; a minimal sketch follows the list below. Caching also supports web multimedia scenarios through web caches and network caching, such as content delivery networks (CDNs).

Typical uses on websites
  • Cache full page output for repeat views.
  • Cache static assets through network/CDN caching.
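
A minimal version of output caching keys the rendered output by request path and reuses it for repeat views. Here, render_page is a hypothetical stand-in for the site's page-rendering code; a production setup would also expire entries when content changes.

    page_cache = {}  # rendered output, keyed by request path

    def render_page(path: str) -> str:
        """Hypothetical stand-in for running the page code (templates, queries)."""
        raise NotImplementedError

    def handle_request(path: str) -> str:
        html = page_cache.get(path)
        if html is None:
            html = render_page(path)  # first view: run the page code
            page_cache[path] = html   # cache the rendered output
        return html                   # repeat views skip the rendering work
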
Apps and APIs: Faster reads for shared data

Apps often store transient subsets of data in a cache for quick retrieval, while the primary database retains the complete durable dataset. Caching processed data and reusing it can serve requests faster than standard database queries.

Common app patterns
  • Cache frequently read reference data (such as pricing) to reduce repeated database calls.
  • Cache computed results to avoid repeating expensive work.

Servers and services: Scaling reads and smoothing spikes

Teams often add one or more cache layers to improve throughput and reduce latency, serving common queries from cache and reducing database load. In-memory caches can help when usage spikes and throughput demand rises, mitigating latency during those periods.

Where this helps most
  • High-read endpoints where the same keys are requested often.
  • Systems that see periodic bursts of traffic.

Business systems: Session state and shared operational data

Caching is often used to store high volumes of short-lived session data (such as user input or personalization preferences) in an in-memory store. Some teams also store session state in cache so stateful apps can keep the app tier stateless. For operational systems, data that changes infrequently—such as product and pricing information—is a common caching target.

Examples you can map to business workloads
  • Session data for web and mobile apps (short-lived, high volume).
  • Frequently read operational reference data (pricing, shared resources).

Types of cache

Caching shows up in multiple layers between an app and its data, including client-side and server-side approaches.

Browser cache (client-side)

A browser cache stores copies of static resources on a user’s device so repeat visits can reuse those files instead of downloading them again. The origin server controls this behavior through response headers, as sketched after the list below.

Common use cases
  • Static site assets such as images, CSS files, and JavaScript files.
  • Reducing round-trips to the origin server for previously fetched resources.
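
Browsers cache what the origin tells them to cache. As a minimal illustration using Python's standard library, this handler serves a stylesheet with a Cache-Control header so repeat visits within a day can reuse the local copy; the one-day lifetime is an assumed example, not a recommendation.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = b"body { color: black; }"  # example static asset (CSS)
            self.send_response(200)
            self.send_header("Content-Type", "text/css")
            # Tell the browser it may reuse this response for 24 hours.
            self.send_header("Cache-Control", "public, max-age=86400")
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("localhost", 8000), Handler).serve_forever()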

Server-side cache

Server-side caching happens in the processes that run an application’s services, rather than on the end user’s device. Server-side caches are often either private (local to one app instance) or shared (used by many app instances).

Common use cases
  • Private, in-memory cache inside a single process for modest amounts of static data.
  • Shared cache service so multiple app instances read the same cached values and avoid “different versions” across instances.
  • Data that’s read frequently but modified infrequently (for example, product and pricing reference data).

CDN cache

A CDN caches content on edge servers closer to end users, so requests don’t always travel back to the origin. Unlike a browser cache (one user), a CDN cache is shared—one user’s request can populate content another user later receives.

Common use cases
  • Shared delivery of cacheable content from edge locations to reduce origin requests.
  • Static resources that benefit from being served near users (the same kinds of assets browsers cache, but that are shared across users).

CPU/memory cache

Caching isn’t only a cloud computing pattern—hardware and memory layers use it too.

CPU cache is a small, fast memory area near (or on) the processor that stores copies of frequently used data and instructions to reduce the time spent waiting on main memory.

Memory (in-memory) cache is the simplest software cache type: An in-memory store held in the address space of a single process and accessed directly by that process. A minimal bounded version appears after the list below.

Common use cases
  • CPU cache: Repeated access to the same instructions/data without frequent trips to main memory.
  • In-memory cache: Quick reads of modest amounts of relatively static data inside one running service.
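
To make the in-memory case concrete, here is a minimal bounded cache that evicts the least-recently-used entry when it fills up, the default eviction behavior mentioned later in this article. It is a sketch, not a production implementation.

    from collections import OrderedDict

    class LRUCache:
        """Minimal in-process cache that evicts least-recently-used entries."""

        def __init__(self, capacity: int):
            self.capacity = capacity
            self.items = OrderedDict()

        def get(self, key):
            if key not in self.items:
                return None              # miss
            self.items.move_to_end(key)  # mark as recently used
            return self.items[key]

        def put(self, key, value):
            self.items[key] = value
            self.items.move_to_end(key)
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict the oldest entry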

Distributed caching

A distributed cache spans multiple servers so the cache can grow in size and transactional capacity beyond one machine. Many shared cache services use a cluster of servers and distribute cached data across the cluster; scaling the cache can be as simple as adding more servers. Some distributed designs also layer caches so a miss at one layer pulls from an upstream provider and then stores the result locally for the next request, as sketched after the list below.

Common use cases
  • A shared cache tier for multiple app instances and machines.
  • Workloads that need cache capacity and throughput that exceed a single host.
  • Layered cache setups where misses fetch from upstream and then keep a local copy for subsequent requests.
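
The layered setup can be sketched as a per-instance local cache backed by a shared upstream cache: a local miss falls through to the shared tier, and the result is kept locally for the next request. The redis-py client and the cache.internal hostname are assumptions for illustration.

    import redis  # third-party client: pip install redis

    local = {}                                   # per-instance cache layer
    shared = redis.Redis(host="cache.internal")  # assumed shared cache tier

    def get(key: str):
        value = local.get(key)
        if value is not None:
            return value         # hit in the local layer
        value = shared.get(key)  # miss: pull from the upstream provider
        if value is not None:
            local[key] = value   # keep a local copy for the next request
        return value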

Get started with caching

Why caching still matters

Caching keeps frequently accessed data closer to where it’s used, which can improve response times and help a system handle more concurrent requests. It can also reduce contention in the original data store, such as when a database has limited connections.

In distributed applications, caching often happens in more than one place—client-side (such as a browser) and server-side (in an application or shared cache service).

A practical way to begin

Start small and focus on the data that gives you the clearest return:

  • Cache read-heavy, slow-to-fetch data that changes infrequently (for example, reference data).
  • Decide when data enters the cache:
    • On demand after the first request, so you only store what’s actually used.
    • Prepopulate (seed) some items at startup if you know they’ll be requested early—while watching for startup load on the original store.
  • Pick a pattern you can manage. The cache-aside pattern is a common starting point; whichever pattern you choose, expiration, eviction, and consistency policies shape the results.

Keep cached data fresh (and your system resilient)

A cache usually holds copies of data from a primary store, so freshness needs attention (a minimal expiration-and-fallback sketch follows this list):

  • Use expiration policies to limit how long data can stay in cache before it’s refreshed.
  • Watch eviction behavior when caches fill up (many systems evict least-recently-used items by default).
  • Don’t treat the cache as the only home for critical data. Keep the source of truth in persistent storage so the system can continue if the cache is unavailable.
  • Plan a fallback path to the original data store if the cache can’t be reached and repopulate as reads occur.
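
Putting a few of these ideas together, this sketch stores each entry with an expiry time, treats expired entries as misses, and falls back to the source store if the cache can’t be reached. read_from_source and the 60-second lifetime are hypothetical; with an in-process dictionary the exception paths never trigger, but a networked cache client could fail there.

    import time

    cache = {}        # key -> (value, expires_at); stand-in for a real cache
    TTL_SECONDS = 60  # assumed expiration policy

    def read_from_source(key):
        """Hypothetical stand-in for the durable source of truth."""
        raise NotImplementedError

    def get(key):
        entry = None
        try:
            entry = cache.get(key)
        except Exception:
            pass  # cache unreachable: fall through to the source store
        if entry is not None:
            value, expires_at = entry
            if time.time() < expires_at:
                return value      # fresh hit
            cache.pop(key, None)  # expired: treat as a miss
        value = read_from_source(key)  # fallback path to the original store
        try:
            cache[key] = (value, time.time() + TTL_SECONDS)
        except Exception:
            pass  # repopulate as reads occur once the cache is back
        return value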

Caching is a core building block for modern systems because it fits many architectures, from single services to distributed apps and edge delivery. For a managed, in-memory cache in the cloud, you can explore Azure.

Frequently asked questions

  • What is caching? Caching keeps frequently accessed data in fast memory so apps can return results quickly without hitting the primary database every time. This lowers latency and back-end load, which helps systems handle more concurrent requests.
  • How does caching improve performance? Caching serves frequently requested data from fast memory, which lowers latency, reduces database load and cost, supports high request volume, and helps during traffic spikes.
  • What is an example of caching? One example is output caching: A web server stores a page’s rendered output (HTML and client scripts) in memory, then serves that cached output on repeat visits instead of rerunning page code.
  • What is the difference between a cache and a database? A cache stores a smaller, temporary subset of data for fast access, while a database or storage holds the complete, durable dataset for long-term retention. If cached data is lost, the permanent copy still lives in the database; caches aren’t meant to be the authoritative store for critical data.