Deciphering the AEM Cache Puzzle: Browser, CDN (Revalidation), and Dispatcher

Imagine users waiting for a slow website. Not ideal, right? Caching solves this by storing frequently used content closer to visitors. This reduces strain on your server and cuts down on load times, leading to a responsive website.

In AEM, caching happens at multiple levels: browsers, CDNs (content delivery networks), Dispatcher, and even the AEM publish servers themselves. But here’s the catch: caching isn’t a magic bullet. Striking a balance between keeping content fresh and delivering it fast is key.

Understanding the Players:

Browser Caching: At the forefront of content delivery is the user’s web browser. Browser caching stores previously downloaded resources, such as images, scripts, and stylesheets, on the user’s device. When a user revisits a website, the browser checks its cache for these resources before requesting them from the server again. Serving content from the browser cache whenever possible greatly improves website performance, because it avoids repeated downloads and server round trips.
Content Delivery Network (CDN): A CDN is a globally distributed network of servers designed to deliver web content efficiently. By caching static and dynamic content closer to end-users, it reduces latency and minimizes the distance data needs to travel, resulting in faster load times and improved performance. In the context of AEM, a CDN can cache both the HTML output and static resources, such as images and CSS files, ensuring that content is served from the nearest edge location to the user.
Dispatcher: Dispatcher sits between the AEM Publish instance and the CDN. It handles AEM-specific concerns such as security filtering and the first layer of caching (rendered HTML pages and their associated resources). The CDN focuses on geographically distributed caching for fast global delivery.
AEM Publish: At the heart of the AEM content delivery chain lies the Publish instance. This is where web pages and components are actually rendered and assembled, and it is the source of truth for all published content.

These players work together to ensure efficient content delivery and optimal performance. The browser cache serves as the first line of caching, followed by the CDN, which caches content closer to the end-user. The Dispatcher acts as an intermediary, caching the rendered HTML pages and static resources, while the AEM publish instance handles the actual rendering and assembly of content.

At the heart of caching control are the Cache-Control and related headers, which act as instructions for caching mechanisms and intermediaries. These headers dictate how long a resource should be considered fresh, when it should be revalidated or fetched from the origin server, and under what circumstances it can be served from the cache.

Important Cache Control Headers:

Here’s a table that explains the important cache control headers and whether each one is honored by CDNs, Dispatcher, and browsers:

Header	Description	CDN	Dispatcher	Browser
Cache-Control: max-age	Specifies the maximum time (in seconds) that a cached resource should be considered fresh.	✔️	✔️	✔️
Cache-Control: s-maxage	Similar to max-age, but specifically for shared caches like CDNs. Overrides max-age for shared caches.	✔️	❌	❌
Cache-Control: no-cache	Instructs caches not to serve the resource from the cache without first revalidating with the origin server.	✔️	✔️	✔️
Cache-Control: no-store	Instructs caches not to store the resource at all.	✔️	✔️	✔️
Expires	Specifies the date and time after which the cached resource should be considered stale.	✔️	✔️	✔️

Quick Notes:

Cache-Control directives take priority over Expires.
If both max-age and s-maxage are present, shared caches such as the CDN use s-maxage, while browsers use max-age.
For “AEM as a Cloud Service”, use the Surrogate-Control header to control CDN caching independently of browser caching.

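<LocationMatch "^/content/.*\.(html)$">
    # Requires mod_headers. Browsers cache HTML pages for 200 seconds.
    Header set Cache-Control "max-age=200"
    # The CDN caches HTML pages for 3600 seconds.
    Header set Surrogate-Control "max-age=3600"
    Header set Age 0
</LocationMatch>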
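To make the precedence rules from the Quick Notes concrete, here is a minimal Python sketch of how a cache might pick a freshness lifetime.

It makes a few assumptions:

- Headers arrive as a plain dictionary.
- `parse_cache_control`, `freshness_lifetime` and the `honors_surrogate_control` flag are hypothetical names.
- Real CDNs differ in how, and whether, they honor Surrogate-Control, so check your CDN’s documentation.

```python
def parse_cache_control(value):
    """Split a Cache-Control-style header into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers, shared_cache, honors_surrogate_control=False):
    """Return the freshness lifetime in seconds, or None if no explicit lifetime is set.

    Simplified precedence (Expires is ignored here):
      1. Surrogate-Control max-age, only for a CDN that honors it (assumption)
      2. Cache-Control s-maxage, only for shared caches such as CDNs
      3. Cache-Control max-age, for every cache
    """
    if shared_cache and honors_surrogate_control and "Surrogate-Control" in headers:
        surrogate = parse_cache_control(headers["Surrogate-Control"])
        if surrogate.get("max-age") is not None:
            return int(surrogate["max-age"])
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if shared_cache and cc.get("s-maxage") is not None:
        return int(cc["s-maxage"])
    if cc.get("max-age") is not None:
        return int(cc["max-age"])
    return None


# Headers set by the LocationMatch block above
headers = {"Cache-Control": "max-age=200", "Surrogate-Control": "max-age=3600"}
print(freshness_lifetime(headers, shared_cache=False))  # browser: 200
print(freshness_lifetime(headers, shared_cache=True, honors_surrogate_control=True))  # CDN: 3600
```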

For details on debugging Headers, please refer to: Exploring AEM Request and Response Headers: Analysis of Browser, CDN and Dispatcher

Revalidation with stale-while-revalidate and stale-if-error

In addition to the traditional cache control headers, many HTTP caches support two extension directives, defined in RFC 5861, that can greatly enhance the user experience: stale-while-revalidate and stale-if-error. Many CDNs honor both, while browser support varies.

stale-while-revalidate

The stale-while-revalidate directive allows the cache to serve a stale (expired) response while simultaneously revalidating the resource in the background. This means that instead of showing a loading spinner or blank page while fetching a fresh copy of the resource, the browser can immediately display the stale content, providing an improved perceived performance. Once the revalidation is complete, the cache is updated with the fresh resource for subsequent requests.

Example: Cache-Control: max-age=600, stale-while-revalidate=30

In this example, the resource is considered fresh for 10 minutes (600 seconds). After that, the cache can serve the stale resource for up to an additional 30 seconds while revalidating it in the background.
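The decision a cache makes for this header can be sketched as a function of the cached response’s age. This illustrates the RFC 5861 semantics under simplifying assumptions: request directives are ignored, and the exact boundaries are chosen for clarity. `classify` and its return labels are hypothetical.

```python
def classify(age, max_age, stale_while_revalidate):
    """Classify a cached response by its age in seconds (RFC 5861 semantics, simplified).

    "fresh": serve from cache without contacting the origin
    "stale-while-revalidate": serve the stale copy now and refresh in the background
    "revalidate": the window has passed, so the client waits for the origin
    """
    if age < max_age:
        return "fresh"
    if age < max_age + stale_while_revalidate:
        return "stale-while-revalidate"
    return "revalidate"


# Cache-Control: max-age=600, stale-while-revalidate=30
for age in (100, 615, 640):
    print(age, classify(age, max_age=600, stale_while_revalidate=30))
```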

stale-if-error

The stale-if-error directive allows the cache to serve a stale (expired) response when the origin server is unavailable or returns an error. RFC 5861 counts 500, 502, 503 and 504 responses as errors. This helps keep the site available during intermittent network or server issues, because users see the stale content rather than an error.

Example: Cache-Control: max-age=600, stale-if-error=3600

In this example, the resource is considered fresh for 10 minutes (600 seconds). If the origin server is unavailable or encounters an error after that, the cache can serve the stale resource for up to an additional hour (3600 seconds).
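A similar sketch for stale-if-error, assuming the cache has just tried to refresh an expired response. RFC 5861 treats 500, 502, 503 and 504 responses as errors. Two parts of this example are assumptions: modeling a timeout as `origin_status=None`, and the hypothetical name `on_refresh_result`.

```python
ERROR_STATUSES = {500, 502, 503, 504}  # the error statuses listed in RFC 5861


def on_refresh_result(age, max_age, stale_if_error, origin_status):
    """Decide what a cache sends after trying to refresh an expired response (sketch).

    origin_status=None models a connection failure or timeout (assumption).
    """
    failed = origin_status is None or origin_status in ERROR_STATUSES
    if not failed:
        return "pass origin response through"
    if age < max_age + stale_if_error:
        return "serve stale copy"
    return "return error to client"


# Cache-Control: max-age=600, stale-if-error=3600
print(on_refresh_result(1800, 600, 3600, 503))   # serve stale copy
print(on_refresh_result(5000, 600, 3600, 503))   # return error to client
```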

Importance of stale-while-revalidate and stale-if-error

These directives are particularly important for improving perceived performance and maintaining availability in modern web applications:

Improved Perceived Performance: By serving stale content immediately while revalidating in the background, users experience faster load times and a smoother browsing experience, as they don’t have to wait for the entire resource to be fetched from the origin server.
Resilience and Availability: The stale-if-error directive ensures that users can still access cached content, even when the origin server is unavailable or encountering errors. This can be particularly useful in scenarios such as intermittent network issues or during maintenance windows.
Bandwidth Savings: By serving stale content from the cached copies and sending only a single revalidation request to the origin server, CDNs can significantly reduce bandwidth consumption, especially in scenarios with high concurrent traffic for stale resources. This efficient handling of stale requests minimizes redundant data transfers, resulting in substantial bandwidth savings.

Before we dive into the specifics, let’s get a quick snapshot of how caching and revalidation work together. Don’t worry, we’ll break it down step-by-step in the next sections for different scenarios!

Let’s understand this with an example of a page that has the following Cache-Control header:

Cache-Control: max-age=600, s-maxage=1200, stale-while-revalidate=60, stale-if-error=3600

max-age=600: This tells caches (browser, Dispatcher) to consider the content fresh for 10 minutes (600 seconds).
s-maxage=1200: This specifically tells the CDN to keep its copy fresh for 20 minutes (1200 seconds).
stale-while-revalidate=60: Once the CDN copy expires (after 1200 seconds), the CDN can keep serving it for up to 60 more seconds while it fetches a fresh version in the background.
stale-if-error=3600: If fetching fresh content fails with an error, the cache can serve the outdated copy for up to 1 hour (3600 seconds) after it expires, instead of returning an error page.
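The windows implied by this header are simple arithmetic, shown below as a quick Python check. The sketch makes two assumptions: ages are measured from the moment the response left the origin, and any Age header is ignored. `parse_cache_control` is a hypothetical helper, not an AEM or CDN API.

```python
def parse_cache_control(value):
    """Map each directive to its integer argument, or None when it has none."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = int(arg) if arg.isdigit() else None
    return directives


cc = parse_cache_control(
    "max-age=600, s-maxage=1200, stale-while-revalidate=60, stale-if-error=3600"
)

# Ages in seconds, measured from when the response left the origin
browser_fresh_until = cc["max-age"]                                     # 600
browser_swr_until = browser_fresh_until + cc["stale-while-revalidate"]  # 660, only if the browser supports SWR
cdn_fresh_until = cc["s-maxage"]                                        # 1200
cdn_swr_until = cdn_fresh_until + cc["stale-while-revalidate"]          # 1260
cdn_error_until = cdn_fresh_until + cc["stale-if-error"]                # 4800
print(browser_fresh_until, cdn_fresh_until, cdn_swr_until, cdn_error_until)
```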

Scenario-1: A new request comes from the browser (No cache available on Browser, CDN, and Dispatcher)

When a user requests a resource for the first time, and the resource is not cached anywhere in the delivery chain (Browser, CDN, or Dispatcher), the following flow occurs:

Browser Cache Check: The browser checks its local cache for the requested resource. Since it’s a new request, there won’t be a cached copy available.
CDN Cache Check: The request is forwarded to the nearest CDN edge location, where the CDN checks its cache for the resource. Again, since it’s a new request, the resource won’t be cached in the CDN.
Dispatcher Cache Check: The CDN forwards the request to the Dispatcher. The Dispatcher checks its cache for the requested resource, but it’s not present.
AEM Publish Instance Request: Since the resource is not cached anywhere, the Dispatcher sends the request to the AEM Publish instance.
Resource Generation and Rendering: The AEM Publish instance generates and renders the requested resource (HTML page, image, CSS, etc.).
Response Caching: As the rendered resource travels back through the delivery chain, it is cached at each level.

In this scenario, the resource had to be fetched from the AEM Publish instance and propagated through the delivery chain, caching it at each level for subsequent requests.
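The miss-then-fill flow of this scenario can be sketched as a chain of cache layers. This is a toy model: expiry, headers and Dispatcher invalidation are deliberately left out. `CacheLayer`, `render_on_publish` and the page path are hypothetical.

```python
class CacheLayer:
    """One caching hop (browser, CDN or Dispatcher); expiry is ignored in this sketch."""

    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream   # the next CacheLayer, or a function standing in for AEM Publish
        self.store = {}

    def get(self, path, trace):
        if path in self.store:
            trace.append(f"{self.name}: hit")
            return self.store[path]
        trace.append(f"{self.name}: miss")
        if isinstance(self.upstream, CacheLayer):
            body = self.upstream.get(path, trace)
        else:
            body = self.upstream(path)
            trace.append("publish: rendered")
        self.store[path] = body    # cache the response on its way back to the client
        return body


def render_on_publish(path):
    """Hypothetical stand-in for AEM Publish rendering a page."""
    return f"<html>{path}</html>"


dispatcher = CacheLayer("dispatcher", render_on_publish)
cdn = CacheLayer("cdn", dispatcher)
browser = CacheLayer("browser", cdn)

first = []
browser.get("/content/site/en.html", first)
print(first)   # every layer misses, then Publish renders

second = []
browser.get("/content/site/en.html", second)
print(second)  # ['browser: hit']
```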

Scenario-2: Cached request is available in the browser

This scenario occurs when a fresh copy of the resource is available in the browser cache (within the first 600 seconds, or 10 minutes, in this example).

The browser checks its cache to determine if it has a stored copy of the requested resource.
If the requested resource is found in the browser cache and it has not expired, the browser retrieves the resource directly from its cache.
The browser then displays the cached resource to the user, providing a faster response time and reducing the need to fetch the resource from the server.
Since the resource is served from the browser cache, it minimizes network latency and server load, contributing to an enhanced user experience.

In this scenario, the request was fulfilled entirely by the browser cache, without any network request to the CDN, Dispatcher, or AEM Publish instance.
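A browser-side lookup for this scenario might look like the following minimal sketch. It honors max-age only, and ignores the Age header, validators and stale-while-revalidate. `BrowserCache` and the injectable clock are assumptions made so the example runs deterministically.

```python
import time


class BrowserCache:
    """Private cache honoring max-age only (sketch: ignores Age, validators and SWR)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # url -> (body, stored_at, max_age)

    def store(self, url, body, max_age):
        self.entries[url] = (body, self.clock(), max_age)

    def lookup(self, url):
        """Return the body if a fresh copy exists; None means a network request is needed."""
        entry = self.entries.get(url)
        if entry is None:
            return None
        body, stored_at, max_age = entry
        return body if self.clock() - stored_at < max_age else None


now = [0]                                   # fake clock, in seconds
cache = BrowserCache(clock=lambda: now[0])
url = "/content/site/en.html"               # hypothetical page path
cache.store(url, "<html>cached</html>", max_age=600)

now[0] = 300
print(cache.lookup(url))   # <html>cached</html>: served locally (Scenario 2)
now[0] = 601
print(cache.lookup(url))   # None: the browser must contact the CDN (Scenario 3)
```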

Scenario-3: Cached request has expired in the browser, but valid in the CDN.

This scenario occurs when the browser’s cached copy of the resource has expired (after 600 seconds, or 10 minutes, in this example). Because s-maxage=1200, the CDN copy is still considered fresh and valid.

Browser Cache Check: The browser checks its local cache for the requested resource and finds an expired copy.
The browser sends a request to the CDN.
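CDN Cache Check: The CDN checks its cache for the requested resource and finds a fresh copy. If the browser sent a conditional request (for example, with If-None-Match) and the validator still matches, the CDN can answer with 304 Not Modified instead of resending the full body.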
CDN Response: The CDN serves the cached copy of the resource to the browser.
Browser Cache Update: The browser updates its local cache with the fresh copy of the resource received from the CDN.

In this scenario, the request was fulfilled by the CDN without involving the Dispatcher or AEM Publish instance, as the CDN had a fresh cached copy of the resource.
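The CDN side of this scenario, including conditional requests, can be sketched as below, assuming the response carries an ETag.

One detail is worth checking in your own setup. Under RFC 9111, a browser subtracts the Age header from max-age. If the CDN forwards an Age of roughly 700 seconds, a max-age=600 response is already stale when it reaches the browser.

The dictionary keys and function names are hypothetical, and weak ETags are not handled.

```python
import re


def cdn_respond(cached, request_headers, now):
    """Answer a browser from a fresh CDN copy, honoring If-None-Match (sketch).

    `cached` uses hypothetical keys: body, etag, cache_control, stored_at.
    Weak ETags and lists of ETags are not handled.
    """
    headers = {
        "ETag": cached["etag"],
        "Cache-Control": cached["cache_control"],
        "Age": str(int(now - cached["stored_at"])),  # seconds the copy has spent in the CDN
    }
    if request_headers.get("If-None-Match") == cached["etag"]:
        return 304, headers, b""          # validator matches: no body is re-sent
    return 200, headers, cached["body"]


def browser_remaining_freshness(headers):
    """Remaining browser freshness: max-age minus Age, as in RFC 9111 (simplified)."""
    match = re.search(r"(?:^|,)\s*max-age=(\d+)", headers["Cache-Control"])
    return int(match.group(1)) - int(headers.get("Age", "0"))


cached = {
    "body": b"<html>...</html>",
    "etag": '"v1"',
    "cache_control": "max-age=600, s-maxage=1200",
    "stored_at": 0,
}
status, headers, body = cdn_respond(cached, {"If-None-Match": '"v1"'}, now=700)
print(status, browser_remaining_freshness(headers))  # 304 -100
```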

Scenario-4: Expired request cached in Browser & CDN with “stale-while-revalidate” in effect

This scenario occurs when both the browser and CDN copies of the resource have expired (more than 1200 seconds have passed). The request still falls within the stale-while-revalidate window, which is the 60 seconds after expiry (between 1200 and 1260 seconds in this example).

Browser Cache Check: The browser finds an expired copy of the requested resource in its local cache.
The browser sends a request to the CDN.
CDN Cache Check: The CDN finds an expired (stale) copy of the requested resource in its cache.
CDN Stale Response and Revalidation: Since `stale-while-revalidate` is in effect, the CDN serves the stale cached copy to the browser immediately. Simultaneously, the CDN sends a revalidation request to Dispatcher to fetch the fresh version of the resource.
Browser Renders Stale Content: The browser renders the stale content received from the CDN.
Origin Revalidation and Cache Update: Dispatcher responds to the revalidation request with the fresh version of the resource. The CDN caches the fresh copy.
Subsequent Requests Served Fresh Content: Any subsequent requests for the same resource will be served the fresh cached copy from the CDN.

In this scenario, the `stale-while-revalidate` directive allows serving stale content immediately while revalidating in the background, providing an improved perceived performance for users.
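The serve-stale-then-refresh behavior can be sketched with asyncio. The stale body is returned without awaiting the Dispatcher, while a single background task fetches the new version. `CachedPage`, `serve_within_swr` and `fetch_from_dispatcher` are hypothetical names. Timestamps are passed in explicitly to keep the example deterministic.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedPage:
    body: str
    stored_at: float
    refresh_task: Optional[asyncio.Task] = None


async def serve_within_swr(page, now, s_maxage, swr, fetch):
    """Scenario 4 sketch: return the stale copy at once and refresh it in the background."""
    age = now - page.stored_at
    if not s_maxage <= age < s_maxage + swr:
        raise ValueError("not inside the stale-while-revalidate window")
    if page.refresh_task is None:                 # at most one background refresh
        page.refresh_task = asyncio.create_task(refresh(page, now, fetch))
    return page.body                              # the browser does not wait for Dispatcher


async def refresh(page, now, fetch):
    try:
        page.body = await fetch()
        page.stored_at = now                      # simplification: age restarts at the triggering request
    finally:
        page.refresh_task = None


async def main():
    page = CachedPage(body="v1", stored_at=0)
    fetch_calls = []

    async def fetch_from_dispatcher():            # hypothetical stand-in for the Dispatcher request
        fetch_calls.append(1)
        await asyncio.sleep(0.01)                 # simulated network latency
        return "v2"

    first = await serve_within_swr(page, 1230, 1200, 60, fetch_from_dispatcher)
    second = await serve_within_swr(page, 1231, 1200, 60, fetch_from_dispatcher)
    await page.refresh_task                       # wait only so the demo can inspect the result
    return first, second, page.body, len(fetch_calls)


result = asyncio.run(main())
print(result)  # ('v1', 'v1', 'v2', 1)
```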

Scenario-5: Expired request in Browser & CDN with elapsed “stale-while-revalidate”

This scenario occurs when the stale-while-revalidate window has also elapsed (more than 1260 seconds in this example) and both the browser and CDN have stale (expired) cached copies of the resource.

Browser Cache Check: The browser finds an expired (stale) copy of the requested resource in its local cache.
The browser sends a request to the CDN.
CDN Cache Check: The CDN finds an expired (stale) copy of the requested resource in its cache.
CDN Revalidation Request: Since the `stale-while-revalidate` window has elapsed, the CDN cannot serve the stale copy, unless `stale-if-error` applies because the origin fails. Instead, it sends a revalidation request to the origin server (Dispatcher/AEM Publish) to fetch the fresh version of the resource, and the browser waits for the response.
Origin Response and Cache Update: The origin server (Dispatcher/AEM Publish) responds with the fresh version of the resource, and the CDN caches the fresh copy at that edge location.
CDN Response to Browser: The CDN serves the fresh cached copy of the resource to the browser.
Browser Cache Update: The browser updates its local cache with the fresh copy of the resource received from the CDN.

In this scenario, since the `stale-while-revalidate` period had elapsed, the CDN could not serve the stale content and had to revalidate with the origin server to fetch the fresh version of the resource.
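Here is a matching sketch of this scenario’s blocking path. It assumes the fetch raises a hypothetical `OriginError` on a 5xx response or a timeout. It also shows where the example header’s stale-if-error=3600 would apply if the origin fails during this blocking revalidation.

```python
class OriginError(Exception):
    """Hypothetical error for a 5xx response or timeout from Dispatcher/AEM Publish."""


def serve_after_swr_window(entry, now, s_maxage, swr, stale_if_error, fetch_from_origin):
    """Scenario 5 sketch: the stale-while-revalidate window has passed, so the client waits."""
    age = now - entry["stored_at"]
    if age < s_maxage + swr:
        raise ValueError("still fresh or inside stale-while-revalidate (Scenarios 3 and 4)")
    try:
        body = fetch_from_origin()             # blocking round trip; the browser waits here
    except OriginError:
        if age < s_maxage + stale_if_error:    # fallback allowed by stale-if-error
            return entry["body"], "stale (origin error)"
        raise
    entry["body"], entry["stored_at"] = body, now
    return body, "fresh"


def failing_fetch():
    raise OriginError("503 Service Unavailable")


# Cache-Control: max-age=600, s-maxage=1200, stale-while-revalidate=60, stale-if-error=3600
print(serve_after_swr_window({"body": "v1", "stored_at": 0}, 1300, 1200, 60, 3600, lambda: "v2"))
print(serve_after_swr_window({"body": "v1", "stored_at": 0}, 1300, 1200, 60, 3600, failing_fetch))
```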

Miscellaneous Questions

Lots of Visitors, Outdated Content: How Does the CDN Handle It?

When a CDN receives multiple requests for a stale (expired) resource, it typically employs caching mechanisms and techniques to efficiently handle the requests and minimize the load on the origin server. Here’s how a CDN might deal with 100 requests for a stale resource:

Initial Stale Request:
- The first request for the stale resource is received by the CDN edge server.
- If the stale-while-revalidate directive is set, the CDN will serve the stale content from its cache to the first client immediately.
- Simultaneously, the CDN will initiate a revalidation request to the origin server to fetch the fresh version of the resource.
Subsequent Stale Requests:
- As the remaining 99 requests arrive at the CDN edge server, they will be served the stale content from the cache, until fresh content arrives.
Cache Collapsing/Request Coalescing:
- When multiple requests for the same stale resource arrive within a short timeframe, the CDN recognizes that a revalidation request is already in progress.
- Instead of sending additional revalidation requests to the origin server, the CDN “collapses” or “coalesces” these requests into a single revalidation request.
- This means that only one revalidation request is sent to the origin server, minimizing the load and reducing redundant requests.
Caching and Updating:
- Once the origin server responds with the fresh version of the resource, the CDN caches the updated content.
- Any subsequent requests for the same resource will be served the fresh content from the cache.
- Other edge locations generally keep their own cached copies and refresh them when they receive requests themselves. Some CDNs route these refreshes through a shield or mid-tier cache. Either way, requests from each geographic region are eventually served the fresh content from the nearest edge server.

By employing techniques like stale-while-revalidate, cache collapsing, and efficient cache updates, CDNs can effectively manage bursts of requests for stale content. They can serve stale content immediately to provide a better user experience while minimizing the load on the origin server by sending only a single revalidation request. This approach ensures scalability, reduces bandwidth consumption, and optimizes resource utilization, even in scenarios with high concurrent traffic for stale resources.
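Request collapsing can be sketched with asyncio by keeping one in-flight task per cache key and letting every concurrent request share it. This is a simplified model of the idea, not any CDN vendor’s implementation. `CoalescingEdge` and the simulated `fetch` are hypothetical.

```python
import asyncio


class CoalescingEdge:
    """Edge cache that collapses concurrent refreshes of one key into a single origin request."""

    def __init__(self, fetch):
        self.fetch = fetch      # hypothetical coroutine function: key -> fresh body
        self.bodies = {}        # key -> last known (possibly stale) body
        self.in_flight = {}     # key -> the task currently refreshing that key

    def _refresh(self, key):
        task = self.in_flight.get(key)
        if task is None:                       # first request: start the only origin fetch
            task = asyncio.create_task(self._fetch_and_store(key))
            self.in_flight[key] = task
        return task                            # later requests reuse the same task

    async def _fetch_and_store(self, key):
        try:
            self.bodies[key] = await self.fetch(key)
            return self.bodies[key]
        finally:
            del self.in_flight[key]

    async def get_stale(self, key):
        """stale-while-revalidate path: answer from cache, share one background refresh."""
        self._refresh(key)
        return self.bodies[key]

    async def get_fresh(self, key):
        """Blocking path (cache miss or window elapsed): all callers await one request."""
        return await self._refresh(key)


async def main():
    origin_calls = 0

    async def fetch(key):
        nonlocal origin_calls
        origin_calls += 1
        await asyncio.sleep(0.01)              # simulated Dispatcher/AEM latency
        return f"fresh {key}"

    edge = CoalescingEdge(fetch)
    edge.bodies["/page.html"] = "stale /page.html"
    responses = await asyncio.gather(*(edge.get_stale("/page.html") for _ in range(100)))
    pending = edge.in_flight.get("/page.html")
    if pending is not None:
        await pending                          # only so the demo can inspect the final state
    return responses, edge.bodies["/page.html"], origin_calls


responses, body, origin_calls = asyncio.run(main())
print(len(responses), set(responses), body, origin_calls)
```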

What are the differences between the Expires and Cache-Control headers, and when should each be used in HTTP responses?

Expires Header:

The Expires header specifies an absolute expiration time for cached content.
It takes an HTTP-date value, indicating the date and time at which the content expires and should no longer be considered fresh.
Example: Expires: Wed, 01 Jan 2025 00:00:00 GMT

Cache-Control Header:

The Cache-Control header provides more fine-grained control over caching behavior compared to Expires.
It supports various directives to specify caching policies, such as max-age, no-cache, no-store, public, private, must-revalidate, and more.
Example: Cache-Control: max-age=3600, must-revalidate, public

When to Use Which:

Use the Expires header when you want to specify an absolute expiration time for cached content and are not concerned about more nuanced caching policies.
Use the Cache-Control header for more granular control over caching behavior, including specifying caching duration (max-age), cache validation (must-revalidate), and other caching directives tailored to your specific requirements. It offers greater flexibility and is the recommended approach for modern web applications.
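The precedence and the date handling can be illustrated with Python’s standard library. Generating the Expires value from a datetime, rather than typing it by hand, keeps the weekday consistent with the date. `expires_value` and `lifetime_from_headers` are hypothetical helpers. The lifetime calculation simplifies RFC 9111, which computes it as Expires minus Date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_value(when):
    """Format an aware UTC datetime as an HTTP-date, deriving the weekday automatically."""
    return format_datetime(when, usegmt=True)


def lifetime_from_headers(headers):
    """Freshness lifetime in seconds: max-age wins over Expires (RFC 9111, simplified)."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)                  # relative to when the response was generated
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))  # absolute: Expires minus Date
    return None


print(expires_value(datetime(2025, 1, 1, tzinfo=timezone.utc)))  # Wed, 01 Jan 2025 00:00:00 GMT

legacy = {"Date": "Tue, 31 Dec 2024 23:00:00 GMT", "Expires": "Wed, 01 Jan 2025 00:00:00 GMT"}
print(lifetime_from_headers(legacy))                                              # 3600
print(lifetime_from_headers({**legacy, "Cache-Control": "max-age=600, public"}))  # 600
```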

Why do we need Dispatcher when a Content Delivery Network (CDN) is already in place?

CDNs serve content from locations around the world to minimize latency, but their cache invalidation can be cumbersome, which often leads to outdated content being delivered. To avoid this, short TTLs are used. These in turn cause frequent cache misses and unnecessary re-fetching from the origin.

Dispatcher complements the CDN here. It front-ends the AEM instances and delivers unchanged files quickly from its own cache, which reduces load on AEM. The combination optimizes delivery from the edge and keeps the delay before content updates appear reasonable, without requiring a large number of AEM instances. In short, a properly configured CDN and Dispatcher cache together ensure efficient content delivery and low latency. For more details, please refer to https://cqdump.joerghoh.de/2024/02/20/cdn-and-dispatcher-2-complementary-caching-layers/

Moreover, Dispatcher acts as a protective layer in front of AEM. Its filter rules block requests that are not explicitly allowed, and its cache absorbs repeated requests. This helps reduce exposure to threats such as Distributed Denial of Service (DDoS) attacks, SQL injection, and cross-site scripting (XSS) attacks, and helps protect sensitive data on the underlying AEM infrastructure. It complements, rather than replaces, dedicated protections such as a web application firewall.

3 thoughts on “Deciphering AEM Cache Puzzle: Browser, CDN (Revalidation), and Dispatcher”

Anonymous says:

April 23, 2024 at 8:49 am

Wonderful article, thanks for your contributions to the AEM online community.


1. Aanchal Sikka says:
  
  April 23, 2024 at 11:00 am
  
  Thanks! I’m really glad the content is proving useful.
  
  
Anonymous says:

April 23, 2024 at 8:10 pm

Great explanation with scenarios. Thanks for the article.
