Reducing TTFB with Edge Caching
The single fastest way to move Time to First Byte from a slow-tail number into the Good band is to stop computing HTML on every request and serve it from a cache that lives a few milliseconds from the user. As established in the parent FCP & TTFB Analysis work, TTFB is the first link in the critical-path chain, and because no pixel can paint before the first byte arrives, every millisecond shaved off TTFB is a millisecond returned to Largest Contentful Paint. This guide walks through caching HTML at the edge with Cache-Control s-maxage, stale-while-revalidate, sensible cache keys that strip tracking parameters, edge SSR/ISR, a Cloudflare Worker built on the Cache API, purge-on-deploy, and how to prove the win by measuring TTFB p75 before and after with the Navigation Timing API and your self-hosted beacon collection.
Prerequisites
Before you cache HTML at the edge, confirm these are in place:
- A CDN or edge platform that can cache HTML responses and run code on the request path — Cloudflare Workers, Fastly Compute, or a Vercel/Netlify edge runtime.
- A field-data baseline for TTFB at p75 over a 28-day window, from either your own Real-User Monitoring beacon collection or the CrUX API, so the before/after is measured, not assumed.
- A clear separation of public vs personalized routes. Only responses that are identical for every user in a given segment are safe to cache as shared HTML. Anything keyed to a logged-in session stays out of the shared cache.
- A deterministic deploy hook you can call from CI to purge the cache when content or the build changes.
- Server-Timing headers emitted at the origin and preserved through the edge, so a cached response still tells you cache;desc=HIT and the miss path still reports origin compute.
What caching headers actually do
The shared-cache directives are distinct from the browser-cache max-age and easy to confuse. The table is the contract between your origin and every cache on the path.
| Directive | Applies to | Effect on TTFB | When to use |
|---|---|---|---|
| max-age=N | Browser; shared caches too when s-maxage is absent | Repeat views only | Static assets, never shared HTML |
| s-maxage=N | Shared caches / CDN | First byte from edge for N seconds | Public HTML that changes on a schedule |
| stale-while-revalidate=N | Shared caches | Serves stale instantly, refreshes in background | Smoothing the moment freshness expires |
| stale-if-error=N | Shared caches | Serves stale on origin 5xx | Resilience when the origin is down |
| private | Browser only | No edge caching | Personalized or authenticated HTML |
| no-store | Nothing caches | Always full origin TTFB | Truly per-request responses |
The combination that wins for public HTML is s-maxage for the fresh window plus stale-while-revalidate for the grace window. With s-maxage=60, stale-while-revalidate=600, the edge serves a sub-50 ms cached response for 60 seconds, then for the next 600 seconds keeps serving the stale copy at edge speed while it revalidates against the origin in the background — so a real user almost never waits on origin compute.
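The fresh window and the grace window can be made concrete with a small classifier. This is a minimal sketch of the freshness arithmetic only, not of any CDN's actual implementation; the function names parseCacheControl and sharedCacheState are hypothetical, and the parser handles only simple name=number directives.

```javascript
// Parses simple Cache-Control directives into an object.
// Valueless directives (e.g. "public") become true; numeric ones become numbers.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(',')) {
    const [name, raw] = part.trim().split('=');
    if (!name) continue;
    directives[name.toLowerCase()] = raw === undefined ? true : Number(raw);
  }
  return directives;
}

// Classifies what a shared cache may do with an entry of the given age (seconds).
// s-maxage takes precedence over max-age for shared caches.
function sharedCacheState(cacheControl, ageSeconds) {
  const cc = parseCacheControl(cacheControl);
  const fresh = cc['s-maxage'] ?? cc['max-age'] ?? 0;
  const grace = cc['stale-while-revalidate'] ?? 0;
  if (ageSeconds < fresh) return 'fresh';
  if (ageSeconds < fresh + grace) return 'stale-while-revalidate';
  return 'expired';
}

const policy = 'public, max-age=0, s-maxage=60, stale-while-revalidate=600';
console.log(sharedCacheState(policy, 30));  // fresh
console.log(sharedCacheState(policy, 300)); // stale-while-revalidate
console.log(sharedCacheState(policy, 700)); // expired
```

Only the third case makes a user wait on the origin, which is why the grace window matters more than the fresh window for tail latency.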
How to reduce TTFB with edge caching
Step 1 — Send shared-cache headers for public HTML
Set Cache-Control so shared caches store the document while the browser does not. Keep max-age low (or zero) for HTML so a user’s own back-navigation revalidates, but give shared caches a real s-maxage.
// Origin handler (Express-style). Only public, non-personalized routes.
function setHtmlCacheHeaders(req, res) {
const isPersonalized = Boolean(req.cookies?.session_id);
if (isPersonalized) {
res.setHeader('Cache-Control', 'private, no-store');
return;
}
res.setHeader(
'Cache-Control',
'public, max-age=0, s-maxage=60, stale-while-revalidate=600, stale-if-error=86400'
);
// Let caches vary correctly without fragmenting on noise.
res.setHeader('Vary', 'Accept-Encoding');
}
Why: max-age=0 forces the browser to revalidate, so a user never sees a stale page after a deploy, while s-maxage=60 lets the CDN absorb the traffic. stale-while-revalidate removes the latency cliff at expiry, and stale-if-error keeps the site up at edge speed during an origin outage. A narrow Vary avoids splitting the cache into many barely-used variants.
Step 2 — Normalize the cache key and strip tracking params
A cache key that includes utm_*, fbclid, or gclid fragments your cache into thousands of identical-content variants, collapsing the hit rate and your TTFB win. Normalize the URL before it becomes the key.
const STRIP_PREFIXES = ['utm_', 'mc_'];
const STRIP_EXACT = new Set([
'fbclid', 'gclid', 'gbraid', 'wbraid', 'msclkid', 'igshid', 'ref', 'mkt_tok',
]);
function cacheKeyUrl(rawUrl) {
const url = new URL(rawUrl);
for (const key of [...url.searchParams.keys()]) {
const lower = key.toLowerCase();
if (STRIP_EXACT.has(lower) || STRIP_PREFIXES.some((p) => lower.startsWith(p))) {
url.searchParams.delete(key);
}
}
url.searchParams.sort(); // order-independent key
url.hash = ''; // fragments never reach the server anyway
return url.toString();
}
Why: Marketing links append per-click parameters that do not change the HTML. Stripping them and sorting the remainder means ?a=1&b=2 and ?b=2&a=1&utm_source=x resolve to one cache entry. Hit rate is the lever that controls how often a real user pays origin TTFB versus edge TTFB.
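To see how much the key function alone moves the hit rate, the sketch below replays a short list of URLs against an idealized cache (unbounded, never expiring). The traffic sample is made up for illustration, and stripUtm is a deliberately simplified stand-in for the fuller cacheKeyUrl above; both names are hypothetical.

```javascript
// Replays request URLs against an idealized cache and returns the hit ratio.
// The first request for each key is a miss; every later one is a hit.
function simulateHitRatio(urls, keyFn) {
  const seen = new Set();
  let hits = 0;
  for (const raw of urls) {
    const key = keyFn(raw);
    if (seen.has(key)) hits += 1;
    else seen.add(key);
  }
  return urls.length === 0 ? 0 : hits / urls.length;
}

// Simplified normalizer for the demo: drops utm_* params and sorts the rest.
function stripUtm(rawUrl) {
  const url = new URL(rawUrl);
  for (const key of [...url.searchParams.keys()]) {
    if (key.toLowerCase().startsWith('utm_')) url.searchParams.delete(key);
  }
  url.searchParams.sort();
  return url.toString();
}

// Synthetic traffic: one page reached through four different campaign links.
const sampleTraffic = [
  'https://www.example.com/blog/a?utm_source=x',
  'https://www.example.com/blog/a?utm_source=y',
  'https://www.example.com/blog/a',
  'https://www.example.com/blog/a?utm_campaign=z',
];

const rawRatio = simulateHitRatio(sampleTraffic, (u) => u);        // 0
const normalizedRatio = simulateHitRatio(sampleTraffic, stripUtm); // 0.75
console.log({ rawRatio, normalizedRatio });
```

With the raw URL as the key, every request is a miss; with the normalized key, only the first one is.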
Step 3 — Cache HTML in a Cloudflare Worker with the Cache API
The Worker sits in front of the origin, builds the normalized key, and reads or writes the edge cache directly. This gives you full control over the key and the hit/miss semantics.
export default {
async fetch(request, env, ctx) {
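if (request.method !== 'GET') return fetch(request);
// Requests carrying the Step 1 session cookie bypass the shared cache on read and write.
if (/(?:^|;\s*)session_id=/.test(request.headers.get('Cookie') || '')) return fetch(request);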
const cache = caches.default;
const keyUrl = cacheKeyUrl(request.url);
const cacheKey = new Request(keyUrl, { method: 'GET' });
let response = await cache.match(cacheKey);
if (response) {
response = new Response(response.body, response);
response.headers.set('X-Edge-Cache', 'HIT');
return response;
}
// Miss: fetch origin with the ORIGINAL url (origin still sees real params).
const originResponse = await fetch(request);
response = new Response(originResponse.body, originResponse);
response.headers.set('X-Edge-Cache', 'MISS');
const cc = response.headers.get('Cache-Control') || '';
const cacheable =
response.status === 200 &&
/s-maxage=\d/.test(cc) &&
!/private|no-store/.test(cc);
if (cacheable) {
// Store a clone keyed by the normalized url; do not block the response.
ctx.waitUntil(cache.put(cacheKey, response.clone()));
}
return response;
},
};
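Why: caches.default is the colocated edge cache, so a hit returns without a network hop to the origin; that is the TTFB collapse you are after. Storing under the normalized cacheKey while fetching the origin with the original request preserves any analytics the origin needs, and ctx.waitUntil lets the cache write finish in the background instead of blocking the response, so the write never adds latency. One caveat: Cloudflare documents that cache.put and cache.match do not honor stale-while-revalidate or stale-if-error, so on this code path an entry simply expires after s-maxage. The grace windows from Step 1 apply only to caches on the path that implement them.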
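The Worker marks responses with X-Edge-Cache, but page scripts can only read cache status through Server-Timing, which is what the field measurement later relies on. One way to keep the two in sync is a small helper called on both the hit and the miss path. This is a sketch, assuming a runtime with the standard Response and Headers globals; withCacheStatus is a hypothetical name, and the comma split does not handle quoted descriptions that themselves contain commas.

```javascript
// Returns a copy of `response` whose Server-Timing header carries exactly one
// `cache` entry with the given status, keeping other entries such as origin timings.
function withCacheStatus(response, status) {
  const tagged = new Response(response.body, response); // copy so headers are mutable
  const existing = tagged.headers.get('Server-Timing') || '';
  const kept = existing
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => entry && !/^cache(;|$)/i.test(entry)); // drop any old cache entry
  kept.push(`cache;desc=${status}`);
  tagged.headers.set('Server-Timing', kept.join(', '));
  tagged.headers.set('X-Edge-Cache', status);
  return tagged;
}
```

In the Worker, the hit path would return withCacheStatus(response, 'HIT') and the miss path withCacheStatus(originResponse, 'MISS'), replacing the two manual X-Edge-Cache assignments.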
Step 4 — Move rendering to the edge with ISR
Caching only helps once the first request has paid for the render. With edge SSR plus Incremental Static Regeneration (ISR), the origin renders once, the edge holds the result, and subsequent requests are pure edge reads. In Next.js this is a per-route revalidation window that maps onto the same stale-while-revalidate behavior.
// app/blog/[slug]/page.js — Next.js App Router
export const revalidate = 60; // regenerate at most once per 60s
export const dynamicParams = true;
export default async function Page({ params }) {
const post = await fetch(`https://cms.example.com/posts/${params.slug}`, {
next: { revalidate: 60, tags: [`post:${params.slug}`] },
}).then((r) => r.json());
return <article dangerouslySetInnerHTML={{ __html: post.html }} />;
}
Why: revalidate = 60 means at most one user per minute triggers a background regeneration; everyone else gets the cached render at edge TTFB. The tags let you purge exactly the affected route on a CMS change instead of flushing everything, which keeps the hit rate high right after an edit.
Step 5 — Purge the cache on deploy
A long s-maxage is only safe if you can invalidate instantly when a deploy ships new HTML. Wire a purge into the deploy pipeline so users never see stale content past a release.
#!/usr/bin/env bash
# purge-on-deploy.sh — run as the last CI step after a successful deploy
set -euo pipefail
: "${CF_ZONE_ID:?missing}"
: "${CF_API_TOKEN:?missing}"
curl -fsS -X POST \
"https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/purge_cache" \
-H "Authorization: Bearer ${CF_API_TOKEN}" \
-H "Content-Type: application/json" \
--data '{"purge_everything":true}' \
| grep -q '"success":true' && echo "edge cache purged"
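Why: Purging at the end of the deploy lets you run aggressive s-maxage without the usual fear of serving an old build. For surgical invalidation, swap purge_everything for {"tags":["post:my-slug"]} (or "files":[...]) so only the changed routes drop out of cache and the rest keep serving from the edge. A tag purge only matches responses that were stored with a matching Cache-Tag response header, so the origin has to emit that header; the Next.js fetch tags from Step 4 are a separate mechanism and do not set it for you. Check that purge-by-tag is available on your plan before relying on it.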
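If the deploy pipeline runs Node rather than bash, the same purge call can be assembled in JavaScript. The sketch below only builds the request so it can be inspected or tested without network access; buildPurgeRequest and its option names are hypothetical, while the endpoint and the purge_everything, tags, and files body shapes are the ones used in the script above.

```javascript
// Builds the URL and fetch options for a Cloudflare purge call.
// Pass `tags` or `files` for a targeted purge; pass neither to purge everything.
function buildPurgeRequest({ zoneId, apiToken, tags = [], files = [] }) {
  if (!zoneId || !apiToken) throw new Error('zoneId and apiToken are required');
  let body;
  if (tags.length > 0) body = { tags };
  else if (files.length > 0) body = { files };
  else body = { purge_everything: true };
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage inside an async deploy step (not executed here):
//   const { url, init } = buildPurgeRequest({ zoneId, apiToken, tags: ['post:my-slug'] });
//   const res = await fetch(url, init);
//   if (!res.ok) throw new Error(`purge failed: ${res.status}`);
```

Separating request construction from the network call keeps the CI step easy to dry-run before it is trusted with a production zone.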
Verifying it works
- DevTools: Open the Network panel, select the document request, and read the Timing tab: “Waiting for server response” is TTFB. On a warm edge it should drop into the tens of milliseconds. Confirm the X-Edge-Cache: HIT and cf-cache-status: HIT response headers.
- Shell: Probe the same URL twice and compare. A cold request shows a MISS and higher time_starttransfer; the second shows a HIT and a fraction of the time.
for i in 1 2; do
curl -s -o /dev/null \
-w "try ${i}: ttfb=%{time_starttransfer}s status=%{http_code} cache=%header{x-edge-cache}\n" \
"https://www.example.com/blog/edge-caching"
done
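- Navigation Timing in the field: Re-aggregate TTFB at p75 from your RUM beacons, segmented by cache status, so a hit population and a miss population are never averaged together. Page scripts cannot read the X-Edge-Cache response header of the document, so the beacon takes the status from the Server-Timing cache entry instead.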
const [nav] = performance.getEntriesByType('navigation');
if (nav) {
  // TTFB as web-vitals and CrUX report it: navigation start to first response byte.
  const ttfb = Math.round(nav.responseStart);
  // PerformanceServerTiming exposes `description` (not `desc`) for the desc= value.
  const st = {};
  for (const e of nav.serverTiming) st[e.name] = e.description || e.duration;
  navigator.sendBeacon('/rum-collector', JSON.stringify({
    name: 'TTFB', value: ttfb, cache: st.cache || 'unknown', path: location.pathname,
  }));
}
- RUM dashboard: A real win is a sustained downward step in p75 TTFB over the 28-day window, plus a rising cache-hit ratio. If you wired measurement through the web-vitals API implementation, confirm p75 LCP moves down in step — that is the metric that actually changes the page-experience signal.
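On the collector side, the segmentation described above amounts to grouping beacons by their cache field before taking the percentile. The following is a minimal sketch using a nearest-rank percentile; the beacon values are synthetic and the function names are hypothetical, with the beacon shape following the sendBeacon payload above.

```javascript
// Nearest-rank percentile over a numeric array (p in 0..100).
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Groups TTFB beacons by their `cache` field and returns p75 per group.
function p75ByCacheStatus(beacons) {
  const groups = new Map();
  for (const { cache, value } of beacons) {
    if (!groups.has(cache)) groups.set(cache, []);
    groups.get(cache).push(value);
  }
  const result = {};
  for (const [status, values] of groups) result[status] = percentile(values, 75);
  return result;
}

// Synthetic beacons for illustration only: 6 hits and 4 misses.
const beacons = [
  ...[4, 5, 5, 6, 7, 8].map((value) => ({ name: 'TTFB', cache: 'HIT', value })),
  ...[520, 580, 610, 700].map((value) => ({ name: 'TTFB', cache: 'MISS', value })),
];

const segmented = p75ByCacheStatus(beacons);                    // { HIT: 7, MISS: 610 }
const combined = percentile(beacons.map((b) => b.value), 75);   // 580
console.log(segmented, combined);
```

The combined p75 of 580 ms describes neither population: it lands in the miss group simply because fewer than 75% of these requests were hits.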
Edge cases & gotchas
- Caching personalized HTML leaks data. Any response that varies by login state must be private, no-store. A single mis-cached authenticated page can serve one user’s content to thousands. Gate on the session cookie before you ever write to the shared cache.
- Set-Cookie on a cacheable response is a trap. Most CDNs refuse to cache a response carrying Set-Cookie; if yours does cache it, you serve the same cookie to everyone. Strip Set-Cookie from public HTML at the origin or in the Worker before storing.
- Vary: Cookie or Vary: User-Agent destroys the hit rate by fragmenting the cache per unique value. Keep Vary to Accept-Encoding (and Accept-Language only if you truly serve localized HTML).
- Bot and crawler traffic can warm, or pollute, the cache. A crawler hitting every utm_ variant before you strip params multiplies your entries; Step 2’s normalization is what keeps that bounded.
- Cache TTFB looks bimodal in the field. A 5 ms hit population and a 600 ms miss population blur into a meaningless p75 if combined. Always segment by cache status when reporting, exactly as the diagnosis in TTFB vs FCP: What Really Matters for SEO recommends.
- Background revalidation can serve genuinely stale content past a deploy if you forget the purge. stale-while-revalidate plus a purge-on-deploy step is the safe pairing; the SWR window without the purge is not.
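The first three gotchas can be folded into one guard that runs before any write to the shared cache. This is a sketch under stated assumptions: sharedCacheVerdict is a hypothetical helper, session_id is the cookie name from Step 1, and it takes the conservative route of refusing a response that carries Set-Cookie rather than stripping the header.

```javascript
// Decides whether a response may be written to the shared cache.
// Both arguments are standard Headers objects; returns { cacheable, reason }.
function sharedCacheVerdict(requestHeaders, responseHeaders) {
  const cookie = requestHeaders.get('Cookie') || '';
  if (/(?:^|;\s*)session_id=/.test(cookie)) {
    return { cacheable: false, reason: 'request carries a session cookie' };
  }
  const cacheControl = responseHeaders.get('Cache-Control') || '';
  if (/\b(?:private|no-store)\b/i.test(cacheControl)) {
    return { cacheable: false, reason: 'response is private or no-store' };
  }
  if (responseHeaders.has('Set-Cookie')) {
    return { cacheable: false, reason: 'response sets a cookie' };
  }
  // Vary values that split the cache per user or per browser build.
  const risky = (responseHeaders.get('Vary') || '')
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name === '*' || name === 'cookie' || name === 'user-agent');
  if (risky.length > 0) {
    return { cacheable: false, reason: `Vary includes ${risky.join(', ')}` };
  }
  return { cacheable: true, reason: 'ok' };
}
```

Returning a reason alongside the verdict makes it cheap to log why a route never reaches the cache, which is usually the first question when the hit ratio is lower than expected.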
FAQ
Does edge caching HTML help if my pages are personalized?
Only the public, identical-for-everyone parts. Mark personalized responses private, no-store and cache the rest. A common pattern is an edge-cached shell with personalized fragments hydrated client-side, so the document TTFB still comes from the edge.
What is the difference between max-age and s-maxage for TTFB?
max-age governs the user’s own browser cache and only helps repeat views. s-maxage governs shared caches and the CDN, so it is the directive that lets the edge answer the first byte instead of your origin — which is what moves field TTFB.
Why strip tracking parameters from the cache key?
Per-click parameters like utm_source, fbclid, and gclid do not change the HTML, but if they are in the key each one becomes a separate cache entry. Stripping them collapses thousands of identical variants into one entry and pushes the hit rate — and your TTFB win — far higher.
Related
- FCP & TTFB Analysis — parent reference for measuring and interpreting server and render timing.
- TTFB vs FCP: What Really Matters for SEO — sibling guide for deciding whether TTFB or FCP is the bottleneck to fix first.
- RUM Ingestion Endpoint on Cloudflare Workers — cross-pillar build for collecting the beacons that verify this fix.