Reducing TTFB with Edge Caching

The single fastest way to move Time to First Byte from a slow-tail number into the Good band is to stop computing HTML on every request and serve it from a cache that lives a few milliseconds from the user. As established in the parent FCP & TTFB Analysis work, TTFB is the first link in the critical-path chain, and because no pixel can paint before the first byte arrives, every millisecond shaved off TTFB is a millisecond returned to Largest Contentful Paint. This guide walks through caching HTML at the edge with Cache-Control s-maxage, stale-while-revalidate, sensible cache keys that strip tracking parameters, edge SSR/ISR, a Cloudflare Worker built on the Cache API, purge-on-deploy, and how to prove the win by measuring TTFB p75 before and after with the Navigation Timing API and your self-hosted beacon collection.

A cache hit serves stored HTML from the nearest point of presence; a miss pays full origin compute. See FCP & TTFB Analysis for how this gates the rest of the chain.

Prerequisites

Before you cache HTML at the edge, confirm these are in place:

A CDN or edge platform that can cache HTML responses and run code on the request path — Cloudflare Workers, Fastly Compute, or a Vercel/Netlify edge runtime.
A field-data baseline for TTFB at p75 over a 28-day window, from either your own Real-User Monitoring beacon collection or the CrUX API, so the before/after is measured, not assumed.
A clear separation of public vs personalized routes. Only responses that are identical for every user in a given segment are safe to cache as shared HTML. Anything keyed to a logged-in session stays out of the shared cache.
A deterministic deploy hook you can call from CI to purge the cache when content or the build changes.
Server-Timing headers emitted at the origin and preserved through the edge, so a cached response still tells you cache;desc=HIT and the miss path still reports origin compute.
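One way to satisfy the Server-Timing prerequisite is a small helper that appends the cache status to whatever the origin already sent. The sketch below is illustrative only: `withCacheStatus` and the `origin` metric name are hypothetical, and the naive comma split assumes no metric description contains a comma.

```javascript
// Hypothetical helper: build a Server-Timing value that carries the cache status
// alongside any metrics the origin already emitted.
function withCacheStatus(existingServerTiming, status) {
  // Drop any earlier "cache" metric so a stored response never reports two statuses.
  const kept = (existingServerTiming || '')
    .split(',') // assumption: descriptions contain no commas
    .map((part) => part.trim())
    .filter((part) => part && !/^cache(;|$)/i.test(part));
  kept.push(`cache;desc=${status}`);
  return kept.join(', ');
}

// "origin;dur=182.4" stands in for whatever the origin reports.
console.log(withCacheStatus('origin;dur=182.4', 'HIT'));
// prints "origin;dur=182.4, cache;desc=HIT"
```

The same value can be set at the edge on both the hit and the miss path, so the field beacon always has a cache status to segment on.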

What caching headers actually do

The shared-cache directives are distinct from the browser-cache max-age and easy to confuse. The table is the contract between your origin and every cache on the path.

Directive	Applies to	Effect on TTFB	When to use
`max-age=N`	Browser, and shared caches when `s-maxage` is absent	Repeat views only	Static assets; keep at 0 for shared HTML
`s-maxage=N`	Shared caches / CDN	First byte from edge for N seconds	Public HTML that changes on a schedule
`stale-while-revalidate=N`	Shared caches and browsers that implement it	Serves stale instantly, refreshes in background	Smoothing the moment freshness expires
`stale-if-error=N`	Shared caches	Serves stale on origin 5xx	Resilience when the origin is down
`private`	Browser only	No edge caching	Personalized or authenticated HTML
`no-store`	Nothing caches	Always full origin TTFB	Truly per-request responses

The combination that wins for public HTML is s-maxage for the fresh window plus stale-while-revalidate for the grace window. With s-maxage=60, stale-while-revalidate=600, the edge serves a sub-50 ms cached response for 60 seconds, then for the next 600 seconds keeps serving the stale copy at edge speed while it revalidates against the origin in the background — so a real user almost never waits on origin compute.
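The two windows can be made concrete with a few lines of arithmetic. This is a minimal sketch of how a cache that implements both directives might classify a stored response by its age; `cacheState` is a hypothetical name, and real caches differ in how they treat the exact boundary second.

```javascript
// Classify a stored response by age, given the two windows from Cache-Control.
// Illustrative only: this models the directive semantics, not any specific CDN.
function cacheState(ageSeconds, sMaxAge, staleWhileRevalidate) {
  if (ageSeconds < sMaxAge) return 'fresh'; // served from the edge, no origin contact
  if (ageSeconds < sMaxAge + staleWhileRevalidate) {
    return 'stale-revalidate'; // served from the edge, refreshed in the background
  }
  return 'expired'; // the next request waits on the origin
}

// Using s-maxage=60, stale-while-revalidate=600 from the paragraph above.
for (const age of [10, 61, 659, 660]) {
  console.log(`${age}s ->`, cacheState(age, 60, 600));
}
```

With these values a user only waits on origin compute when a page has gone unrequested for eleven minutes or more.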

How to reduce TTFB with edge caching

Step 1 — Send shared-cache headers for public HTML

Set Cache-Control so shared caches store the document while the browser does not keep a fresh copy of its own. Keep max-age low (or zero) for HTML so the browser revalidates on a repeat visit, but give shared caches a real s-maxage.

// Origin handler (Express-style). Only public, non-personalized routes.
function setHtmlCacheHeaders(req, res) {
  const isPersonalized = Boolean(req.cookies?.session_id);
  if (isPersonalized) {
    res.setHeader('Cache-Control', 'private, no-store');
    return;
  }
  res.setHeader(
    'Cache-Control',
    'public, max-age=0, s-maxage=60, stale-while-revalidate=600, stale-if-error=86400'
  );
  // Let caches vary correctly without fragmenting on noise.
  res.setHeader('Vary', 'Accept-Encoding');
}

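Why: max-age=0 makes the browser's own copy stale immediately, so a repeat visit revalidates instead of silently reusing a page from before a deploy, while s-maxage=60 lets the CDN absorb the traffic. Browsers that implement stale-while-revalidate may still reuse their copy during the grace window while they revalidate in the background. stale-while-revalidate removes the latency cliff at expiry, and stale-if-error keeps the site up at edge speed during an origin outage. A narrow Vary avoids splitting the cache into many barely-used variants.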

Step 2 — Normalize the cache key and strip tracking params

A cache key that includes utm_*, fbclid, or gclid fragments your cache into thousands of identical-content variants, collapsing the hit rate and your TTFB win. Normalize the URL before it becomes the key.

const STRIP_PREFIXES = ['utm_', 'mc_'];
const STRIP_EXACT = new Set([
  'fbclid', 'gclid', 'gbraid', 'wbraid', 'msclkid', 'igshid', 'ref', 'mkt_tok',
]);

function cacheKeyUrl(rawUrl) {
  const url = new URL(rawUrl);
  for (const key of [...url.searchParams.keys()]) {
    const lower = key.toLowerCase();
    if (STRIP_EXACT.has(lower) || STRIP_PREFIXES.some((p) => lower.startsWith(p))) {
      url.searchParams.delete(key);
    }
  }
  url.searchParams.sort();           // order-independent key
  url.hash = '';                     // fragments never reach the server anyway
  return url.toString();
}

Why: Marketing links append per-click parameters that do not change the HTML. Stripping them and sorting the remainder means ?a=1&b=2 and ?b=2&a=1&utm_source=x resolve to one cache entry. Hit rate is the lever that controls how often a real user pays origin TTFB versus edge TTFB.
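The effect on key cardinality is easy to check on a handful of URLs. The sketch below repeats the Step 2 logic under the hypothetical name `normalizeKey` so that it runs on its own; the sample URLs are invented for illustration.

```javascript
const STRIP_PREFIXES = ['utm_', 'mc_'];
const STRIP_EXACT = new Set([
  'fbclid', 'gclid', 'gbraid', 'wbraid', 'msclkid', 'igshid', 'ref', 'mkt_tok',
]);

// Same normalization as Step 2, repeated here so the snippet is self-contained.
function normalizeKey(rawUrl) {
  const url = new URL(rawUrl);
  for (const key of [...url.searchParams.keys()]) {
    const lower = key.toLowerCase();
    if (STRIP_EXACT.has(lower) || STRIP_PREFIXES.some((p) => lower.startsWith(p))) {
      url.searchParams.delete(key);
    }
  }
  url.searchParams.sort();
  url.hash = '';
  return url.toString();
}

// Invented sample: six distinct request URLs for two distinct documents.
const sample = [
  'https://www.example.com/blog/edge-caching?utm_source=news&utm_medium=email',
  'https://www.example.com/blog/edge-caching?fbclid=abc123',
  'https://www.example.com/blog/edge-caching?gclid=xyz&UTM_Campaign=spring',
  'https://www.example.com/blog/edge-caching',
  'https://www.example.com/search?b=2&a=1&utm_source=x',
  'https://www.example.com/search?a=1&b=2',
];

const rawKeys = new Set(sample);
const normalizedKeys = new Set(sample.map(normalizeKey));
console.log(`raw keys: ${rawKeys.size}, normalized keys: ${normalizedKeys.size}`);
```

Six raw URLs collapse to two cache entries, so four of the six requests become hits instead of misses.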

Step 3 — Cache HTML in a Cloudflare Worker with the Cache API

The Worker sits in front of the origin, builds the normalized key, and reads or writes the edge cache directly. This gives you full control over the key and the hit/miss semantics.

export default {
  async fetch(request, env, ctx) {
    if (request.method !== 'GET') return fetch(request);

    // Logged-in traffic bypasses the shared cache for reads as well as writes.
    // `session_id` is the cookie name used in Step 1.
    const cookie = request.headers.get('Cookie') || '';
    if (/(?:^|;\s*)session_id=/.test(cookie)) return fetch(request);

    const cache = caches.default;
    const keyUrl = cacheKeyUrl(request.url);
    const cacheKey = new Request(keyUrl, { method: 'GET' });

    const cached = await cache.match(cacheKey);
    if (cached) {
      // Copy so the headers are mutable, then mark the hit for curl and for RUM.
      const hit = new Response(cached.body, cached);
      hit.headers.set('X-Edge-Cache', 'HIT');
      hit.headers.append('Server-Timing', 'cache;desc=HIT');
      return hit;
    }

    // Miss: fetch origin with the ORIGINAL url (origin still sees real params).
    const originResponse = await fetch(request);

    const cc = originResponse.headers.get('Cache-Control') || '';
    const cacheable =
      originResponse.status === 200 &&
      /s-maxage=\d/.test(cc) &&
      !/private|no-store/.test(cc);

    if (cacheable) {
      // Store an unmarked clone keyed by the normalized url; do not block the response.
      ctx.waitUntil(cache.put(cacheKey, originResponse.clone()));
    }

    const miss = new Response(originResponse.body, originResponse);
    miss.headers.set('X-Edge-Cache', 'MISS');
    miss.headers.append('Server-Timing', 'cache;desc=MISS');
    return miss;
  },
};

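Why: caches.default is the colocated edge cache, so a hit returns without a network hop to the origin, which is the TTFB collapse you are after. Storing under the normalized cacheKey while fetching the origin with the original request preserves any analytics the origin needs, and ctx.waitUntil lets the cache write finish in the background so it never adds latency to the response. One caveat: Cloudflare's Cache API documentation states that cache.put and cache.match do not honor stale-while-revalidate or stale-if-error, so in this Worker an entry simply becomes a miss once s-maxage expires. The grace-window behavior from Step 1 applies to caches that implement those directives, not to entries written through the Cache API.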

Step 4 — Move rendering to the edge with ISR

Caching only helps once the first request has paid for the render. With edge SSR plus Incremental Static Regeneration (ISR), the origin renders once, the edge holds the result, and subsequent requests are pure edge reads. In Next.js this is a per-route revalidation window that maps onto the same stale-while-revalidate behavior.

// app/blog/[slug]/page.js (Next.js App Router)
export const revalidate = 60;              // regenerate at most once per 60s
export const dynamicParams = true;

export default async function Page({ params }) {
  // Newer Next.js releases pass params as a Promise; awaiting a plain object is harmless.
  const { slug } = await params;
  const post = await fetch(`https://cms.example.com/posts/${slug}`, {
    next: { revalidate: 60, tags: [`post:${slug}`] },
  }).then((r) => r.json());
  // post.html is injected as-is, so the CMS must return trusted, sanitized markup.
  return <article dangerouslySetInnerHTML={{ __html: post.html }} />;
}

Why: revalidate = 60 means a route is regenerated in the background at most once per 60 seconds; everyone else gets the cached render at edge TTFB. The tags let you invalidate exactly the affected data on a CMS change, for example by calling revalidateTag('post:my-slug') from a webhook handler, instead of flushing everything, which keeps the hit rate high right after an edit.

Step 5 — Purge the cache on deploy

A long s-maxage is only safe if you can invalidate instantly when a deploy ships new HTML. Wire a purge into the deploy pipeline so users never see stale content past a release.

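#!/usr/bin/env bash
# purge-on-deploy.sh: run as the last CI step after a successful deploy
set -euo pipefail

: "${CF_ZONE_ID:?missing}"
: "${CF_API_TOKEN:?missing}"

# Capture the API response so a failed purge is reported instead of silently ignored.
response="$(curl -fsS -X POST \
  "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/purge_cache" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -H "Content-Type: application/json" \
  --data '{"purge_everything":true}')"

# Tolerate optional whitespace in the JSON; fail the CI step if the purge did not succeed.
if grep -Eq '"success"[[:space:]]*:[[:space:]]*true' <<<"${response}"; then
  echo "edge cache purged"
else
  echo "edge cache purge failed: ${response}" >&2
  exit 1
fi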

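Why: Purging at the end of the deploy lets you run aggressive s-maxage without the usual fear of serving an old build. For surgical invalidation, swap purge_everything for {"tags":["post:my-slug"]} (or "files":[...]) so only the changed routes drop out of cache and the rest keep serving from the edge. Tag purging only matches responses that carry a corresponding Cache-Tag response header, so the origin or the Worker has to emit one; the Next.js fetch tags from Step 4 are a separate mechanism and do not reach the CDN on their own.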
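If the deploy pipeline already runs Node, the choice between a full and a surgical purge can be expressed as a small request builder. This is a sketch under assumptions: `buildPurgeRequest` is a hypothetical helper, it only constructs the request for the endpoint used in the script above, and the tag and file values are placeholders.

```javascript
// Build (but do not send) a purge request for the zone purge endpoint.
// Prefers tags, then files, and falls back to purging everything.
function buildPurgeRequest({ zoneId, apiToken, tags = [], files = [] }) {
  if (!zoneId || !apiToken) throw new Error('zoneId and apiToken are required');

  let body;
  if (tags.length > 0) body = { tags };
  else if (files.length > 0) body = { files };
  else body = { purge_everything: true };

  return {
    url: `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Placeholder credentials; in CI these would come from CF_ZONE_ID and CF_API_TOKEN.
const purge = buildPurgeRequest({
  zoneId: 'ZONE_ID',
  apiToken: 'API_TOKEN',
  tags: ['post:my-slug'],
});
console.log(purge.url, purge.init.body);
// To send it: const res = await fetch(purge.url, purge.init);
```

Keeping the builder free of network calls makes it easy to unit-test which routes a deploy would invalidate before anything is actually purged.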

Verifying it works

DevTools: Open the Network panel, select the document request, and read the Timing tab — “Waiting for server response” is TTFB. On a warm edge it should drop into the tens of milliseconds. Confirm the X-Edge-Cache: HIT and cf-cache-status: HIT response headers.
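Shell: Probe the same URL twice and compare. A cold request shows a MISS and higher time_starttransfer; the second shows a HIT and a fraction of the time. The %header{...} write-out variable is only available in recent curl releases; on an older curl, inspect the headers with -D - instead.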

for i in 1 2; do
  curl -s -o /dev/null \
    -w "try ${i}: ttfb=%{time_starttransfer}s status=%{http_code} cache=%header{x-edge-cache}\n" \
    "https://www.example.com/blog/edge-caching"
done

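Navigation Timing in the field: Re-aggregate TTFB at p75 from your RUM beacons, segmented by cache status, so a hit population and a miss population are never averaged together. Page script cannot read the document's X-Edge-Cache response header, so the status has to arrive through Server-Timing (cache;desc=HIT or cache;desc=MISS), which the Navigation Timing entry exposes.

const [nav] = performance.getEntriesByType('navigation');
if (nav) {
  // responseStart counts from navigation start, which is comparable to the CrUX TTFB baseline.
  const ttfb = Math.round(nav.responseStart);
  // Collapse Server-Timing entries into { name: description || duration }.
  const st = {};
  for (const e of nav.serverTiming || []) st[e.name] = e.description || e.duration;
  navigator.sendBeacon('/rum-collector', JSON.stringify({
    name: 'TTFB', value: ttfb, cache: st.cache || 'unknown', path: location.pathname,
  }));
}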

RUM dashboard: A real win is a sustained downward step in p75 TTFB over the 28-day window, plus a rising cache-hit ratio. If you wired measurement through the web-vitals API implementation, confirm p75 LCP moves down in step — that is the metric that actually changes the page-experience signal.
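On the collector side, segmenting before taking the percentile is a few lines. The sketch below assumes beacons shaped like the payload sent above (`value`, `cache`) and uses the nearest-rank percentile method; `percentile` and `p75ByCacheStatus` are hypothetical names and the sample values are invented.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Group beacons by cache status, then compute p75 per group.
function p75ByCacheStatus(beacons) {
  const groups = new Map();
  for (const { cache, value } of beacons) {
    const key = String(cache || 'unknown').toUpperCase();
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = { count: values.length, p75: percentile(values, 75) };
  }
  return result;
}

// Invented sample: four hits and four misses, TTFB in milliseconds.
const beacons = [
  { cache: 'HIT', value: 12 }, { cache: 'HIT', value: 18 },
  { cache: 'HIT', value: 25 }, { cache: 'HIT', value: 40 },
  { cache: 'MISS', value: 480 }, { cache: 'MISS', value: 520 },
  { cache: 'MISS', value: 610 }, { cache: 'MISS', value: 900 },
];

const segmented = p75ByCacheStatus(beacons);
const blended = percentile(beacons.map((b) => b.value), 75);
console.log(segmented, { blended });
```

In this sample the hit p75 is 25 ms and the miss p75 is 610 ms, while the blended p75 of 520 ms describes neither population.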

Edge cases & gotchas

Caching personalized HTML leaks data. Any response that varies by login state must be private, no-store. A single mis-cached authenticated page can serve one user’s content to thousands. Gate on the session cookie before you ever write to the shared cache.
Set-Cookie on a cacheable response is a trap. Most CDNs refuse to cache a response carrying Set-Cookie; if yours does cache it, you serve the same cookie to everyone. Strip Set-Cookie from public HTML at the origin or in the Worker before storing.
Vary: Cookie or Vary: User-Agent destroys the hit rate by fragmenting the cache per unique value. Keep Vary to Accept-Encoding (and Accept-Language only if you truly serve localized HTML).
Bot and crawler traffic can warm — or pollute — the cache. A crawler hitting every utm_ variant before you strip params multiplies your entries; Step 2’s normalization is what keeps that bounded.
Cache TTFB looks bimodal in the field. A 5 ms hit population and a 600 ms miss population blur into a meaningless p75 if combined. Always segment by cache status when reporting, exactly as the diagnosis in TTFB vs FCP: What Really Matters for SEO recommends.
Background revalidation can serve genuinely stale content past a deploy if you forget the purge. stale-while-revalidate plus a purge-on-deploy step is the safe pairing; the SWR window without the purge is not.
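Several of these gotchas can be caught mechanically before a response is stored. The following is a minimal sketch of such a check, assuming headers arrive as a plain object with lowercase names; `auditSharedCacheHeaders` is a hypothetical helper, not part of any CDN's API.

```javascript
// Return a list of problems for a response that is about to enter the shared cache.
// Assumption: `headers` is a plain object keyed by lowercase header name.
function auditSharedCacheHeaders(headers) {
  const problems = [];
  const cc = (headers['cache-control'] || '').toLowerCase();
  const shared =
    /(^|[\s,])s-maxage=\d+/.test(cc) && !/(^|[\s,])(private|no-store)\b/.test(cc);
  if (!shared) return problems; // not destined for the shared cache, nothing to audit

  if ('set-cookie' in headers) {
    problems.push('Set-Cookie on a shared-cacheable response');
  }
  const vary = (headers['vary'] || '')
    .toLowerCase()
    .split(',')
    .map((v) => v.trim())
    .filter(Boolean);
  for (const v of vary) {
    if (v === 'cookie' || v === 'user-agent' || v === '*') {
      problems.push(`Vary: ${v} fragments the cache`);
    }
  }
  return problems;
}

console.log(auditSharedCacheHeaders({
  'cache-control': 'public, max-age=0, s-maxage=60, stale-while-revalidate=600',
  'vary': 'Accept-Encoding, Cookie',
  'set-cookie': 'ab_bucket=b; Path=/',
}));
```

Run against a sample of origin responses in CI, or logged from the Worker on the miss path, a check like this flags a route before it starts leaking cookies or splitting the cache.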

FAQ

Does edge caching HTML help if my pages are personalized?

Only the public, identical-for-everyone parts. Mark personalized responses private, no-store and cache the rest. A common pattern is an edge-cached shell with personalized fragments hydrated client-side, so the document TTFB still comes from the edge.

What is the difference between `max-age` and `s-maxage` for TTFB?

max-age governs the user’s own browser cache and only helps repeat views. s-maxage governs shared caches and the CDN, so it is the directive that lets the edge answer the first byte instead of your origin — which is what moves field TTFB.

Why strip tracking parameters from the cache key?

Per-click parameters like utm_source, fbclid, and gclid do not change the HTML, but if they are in the key each one becomes a separate cache entry. Stripping them collapses thousands of identical variants into one entry and pushes the hit rate — and your TTFB win — far higher.

FCP & TTFB Analysis — parent reference for measuring and interpreting server and render timing.
TTFB vs FCP: What Really Matters for SEO — sibling guide for deciding whether TTFB or FCP is the bottleneck to fix first.
RUM Ingestion Endpoint on Cloudflare Workers — cross-pillar build for collecting the beacons that verify this fix.