Head-Based vs Tail-Based Sampling for RUM

You cannot afford to store every beacon from a high-traffic site, but the naive fix — drop a fixed fraction of sessions at capture time — quietly discards the exact sessions you most need. This page contrasts the two sampling architectures that decide what to keep in a Real-User Monitoring pipeline: head-based sampling, where the browser flips a coin before it even knows whether the session was slow or errored, and tail-based sampling, where you buffer the full session and decide at the edge once the outcome is known. It builds on the broader RUM data sampling strategies this section documents, and shows the concrete JS for each approach plus how to reweight the survivors so your headline p75 aggregation stays unbiased.

[Figure: Head-based versus tail-based sampling decision points. Head-based (decide in browser): the browser hashes the session id and keeps 10% while the outcome is still unknown, so Poor sessions may be cut and 90% are dropped before sending. Tail-based (decide at edge): the browser sends every session, the edge decides once the outcome is known, and the store holds 100% of Poor and 10% of Good sessions plus a weight column, reweighted with w = 1/rate for an unbiased p75.]
Head-based saves bandwidth but is blind to outcome; tail-based keeps every Poor session at the cost of uplink, then reweights to recover an unbiased p75.

The trade-off is fundamentally about when the keep/drop decision happens relative to when the outcome is known. Head-based sampling decides at capture time, in the browser, before any metric has finalized — it is cheap (you never pay to transmit a dropped beacon) but blind. Tail-based sampling defers the decision to the self-hosted beacon collection endpoint or edge worker, after the session’s INP, LCP, and CLS values are settled — so you can guarantee that every slow or errored session survives.

Prerequisites

Before implementing either strategy, confirm these are in place:

  • A stable per-session identifier (a UUID generated once per page session, persisted in sessionStorage), so a head sampler is deterministic — the same session always lands on the same keep/drop side rather than flipping per beacon.
  • Finalized vitals on the beacon. Whether you collect them with the web-vitals library or a raw PerformanceObserver, each metric must be reported once at its final value (on visibilitychange/pagehide), because tail-based decisions need the real INP and CLS, not interim values.
  • An edge or collector you control — a Cloudflare Worker, an Nginx/OpenResty tier, or your ingestion service — where the tail decision runs. You cannot do tail-based sampling in a third-party SaaS you do not control.
  • A weight column in your event store (a BigQuery/ClickHouse INTEGER or FLOAT) so reweighting survives into aggregation. See designing a BigQuery schema for RUM events for where this column lives.

Head-based vs tail-based at a glance

Property | Head-based | Tail-based
Decision point | Browser, at capture | Edge/collector, after outcome known
Bandwidth cost | Low (dropped beacons never sent) | High (every session transmitted)
Outcome awareness | None (blind to slow/errored sessions) | Full (INP/CLS/errors are known)
Poor-CWV retention | Same rate as Good (some lost) | 100% (all kept)
Error-session retention | Random | 100% (all kept)
Reweighting needed | Uniform 1/rate | Per-bucket 1/rate
Complexity | Trivial | Moderate (stateful decision)

The decisive row is Poor-CWV retention. At a 10% head rate, you keep roughly 10% of your Poor-INP sessions — exactly the tail you are paid to investigate. Tail-based keeps all of them.
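To put numbers on that row, here is a small arithmetic sketch. The traffic figures (1,000,000 sessions, 8% of them Poor) are invented for illustration, and `expectedStored` is a hypothetical helper rather than part of any library.

```javascript
// Expected stored-session counts under each strategy.
// All inputs are illustrative assumptions, not measurements.
function expectedStored({ poorSessions, goodSessions, headRate, goodTailRate }) {
  // Head-based: every session, Poor or Good, survives at the same rate.
  const head = {
    poor: Math.round(poorSessions * headRate),
    good: Math.round(goodSessions * headRate),
  };
  // Tail-based: Poor sessions are always kept, Good ones are downsampled.
  const tail = {
    poor: poorSessions,
    good: Math.round(goodSessions * goodTailRate),
  };
  return {
    head: { ...head, total: head.poor + head.good },
    tail: { ...tail, total: tail.poor + tail.good },
  };
}

const estimate = expectedStored({
  poorSessions: 80000,
  goodSessions: 920000,
  headRate: 0.1,
  goodTailRate: 0.1,
});
console.log(estimate.head); // { poor: 8000, good: 92000, total: 100000 }
console.log(estimate.tail); // { poor: 80000, good: 92000, total: 172000 }
```

Under these assumed numbers the tail-based store holds 1.72 times as many rows but ten times as many Poor sessions. Every one of the 1,000,000 sessions still crosses the network on the tail path.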

How to implement both strategies

Step 1 — Build a deterministic head sampler

Hash the session id to a stable number in [0, 1) and compare against the keep rate. Using a hash rather than Math.random() makes the decision per-session and stable across reloads and across every beacon the session emits, so you never keep an LCP beacon while dropping that same session’s INP beacon.

// Deterministic 32-bit FNV-1a hash -> [0, 1). Same input always maps to the same value.
function hashUnitInterval(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  // >>> 0 forces unsigned; divide by 2^32 to land in [0, 1).
  return (h >>> 0) / 0x100000000;
}

function headKeep(sessionId, rate) {
  return hashUnitInterval(sessionId) < rate;
}

// One session id per page session, persisted so reloads stay on the same side.
function getSessionId() {
  let id = sessionStorage.getItem('rum_sid');
  if (!id) {
    id = crypto.randomUUID();
    sessionStorage.setItem('rum_sid', id);
  }
  return id;
}

const HEAD_RATE = 0.10;
const sessionId = getSessionId();
const sampled = headKeep(sessionId, HEAD_RATE);

Why: determinism is the whole point. If you sampled per beacon with Math.random(), a single session could contribute its LCP but not its CLS, fragmenting the row you later try to join. A hash on the session id keeps a session whole — entirely in or entirely out — and the result is reproducible for debugging.

Step 2 — Send (or skip) the head-sampled beacon

Only transmit when the session is in the kept fraction, and stamp the beacon with the sample rate so the collector can reweight later.

function sendHeadSampled(metric) {
  if (!sampled) return; // dropped before the bytes ever leave the device
  const body = JSON.stringify({
    sid: sessionId,
    name: metric.name,        // 'LCP' | 'INP' | 'CLS' | ...
    value: metric.value,
    rating: metric.rating,    // 'good' | 'needs-improvement' | 'poor'
    sample_rate: HEAD_RATE,   // collector turns this into weight = 1 / sample_rate
  });
  navigator.sendBeacon('/rum', body);
}

Why: carrying sample_rate on the wire means the keep probability is recorded next to the survivor. The reweight step (Step 5) multiplies each row by 1 / sample_rate, so 1 surviving session stands in for 10 at a 0.10 rate. Without this field you cannot reconstruct true volumes.
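On the collector side, the conversion from sample_rate to weight might look like the sketch below. `weightFromSampleRate` and `toRow` are hypothetical names, and the choice to reject malformed rates (rather than default them to 1) is an assumption you may want to change.

```javascript
// Hypothetical collector-side helper: turn a beacon's sample_rate into a row weight.
function weightFromSampleRate(sampleRate) {
  const rate = Number(sampleRate);
  // Reject missing, non-finite, or out-of-range rates instead of guessing.
  if (!Number.isFinite(rate) || rate <= 0 || rate > 1) return null;
  return 1 / rate;
}

// Build the stored row; returns null when the beacon has no usable rate.
function toRow(beacon) {
  const weight = weightFromSampleRate(beacon.sample_rate);
  if (weight === null) return null; // caller can count and discard these
  return { sid: beacon.sid, name: beacon.name, value: beacon.value, weight };
}

console.log(toRow({ sid: 'abc', name: 'INP', value: 180, sample_rate: 0.1 }));
```

Validating the rate here matters because a zero or missing value would otherwise produce an infinite or NaN weight that poisons every weighted aggregate downstream.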

Step 3 — Send everything for the tail path

For tail-based sampling the browser does no dropping. It sends every finalized session to the collection endpoint. The only browser-side concern is delivering reliably on unload via the beacon collection path.

function sendTail(metrics) {
  // metrics is the full finalized set for this session, batched into one beacon.
  const body = JSON.stringify({ sid: sessionId, metrics });
  // sendBeacon survives pagehide; the edge — not the browser — decides keep/drop.
  navigator.sendBeacon('/rum', body);
}

Why: the browser cannot know at capture time whether this session will be the p75-defining slow one. Shipping everything moves the decision to a place that does know. You pay uplink bytes for sessions you will later drop, which is the price of never losing a Poor or errored session.
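sendTail expects the finalized metrics already gathered into one array. One way to gather them is a small buffer that remembers the latest report per metric and flushes exactly once. This is a sketch: `createSessionBuffer` is a hypothetical helper, and the `send` callback is injected so the logic can be exercised outside a browser.

```javascript
// Hypothetical session buffer: keeps the latest value per metric and flushes once.
function createSessionBuffer(sessionId, send) {
  const latest = new Map(); // metric name -> most recent report
  let flushed = false;

  return {
    // Record a metric report; a later report for the same name overwrites it.
    record(metric) {
      if (flushed) return; // ignore reports that arrive after the flush
      latest.set(metric.name, {
        name: metric.name,
        value: metric.value,
        rating: metric.rating,
      });
    },
    // Send one batched payload; returns false if nothing was sent.
    flush() {
      if (flushed || latest.size === 0) return false;
      flushed = true;
      send({ sid: sessionId, metrics: [...latest.values()] });
      return true;
    },
  };
}

// Browser wiring (assumed, not run here):
//   const buffer = createSessionBuffer(sessionId,
//     (payload) => navigator.sendBeacon('/rum', JSON.stringify(payload)));
//   addEventListener('pagehide', () => buffer.flush());
```

Flushing once per session keeps the edge worker stateless, at the cost of ignoring anything reported after the first flush.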

Step 4 — Make the keep/drop decision at the edge

Run the tail decision in an edge worker. The rule: keep unconditionally if the session is interesting (slow INP or any error), otherwise downsample the boring Good sessions to a small fraction. Stamp each kept row with the rate that admitted it.

// Cloudflare Worker style handler. Decides AFTER the outcome is known.
const GOOD_TAIL_RATE = 0.10;          // keep 10% of healthy sessions
const INP_POOR_MS = 500;              // INP > 500 ms is Poor per the CWV spec

export default {
  async fetch(request, env) {
    const event = await request.json();
    const inp = event.metrics?.find((m) => m.name === 'INP')?.value ?? 0;
    const hadError = Boolean(event.had_error);

    const interesting = inp > INP_POOR_MS || hadError;

    let keep, sampleRate;
    if (interesting) {
      keep = true;
      sampleRate = 1;                 // kept with certainty -> weight 1
    } else {
      // Deterministic downsample of Good sessions, reusing the same hash.
      keep = hashUnitInterval(event.sid) < GOOD_TAIL_RATE;
      sampleRate = GOOD_TAIL_RATE;    // weight = 1 / 0.10 = 10
    }

    if (keep) {
      event.weight = 1 / sampleRate;
      await env.RUM_QUEUE.send(event); // forward to ClickHouse/BigQuery writer
    }
    return new Response(null, { status: 204 });
  },
};

// hashUnitInterval is the same FNV-1a function from Step 1, shared edge-side.
function hashUnitInterval(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 0x100000000;
}

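Why: the condition inp > 500 || hadError guarantees that every Poor-INP or errored session is retained at full fidelity, so a p75 or p99 over INP is computed from the complete slow tail rather than a thinned one. Note that this rule keys only on INP and errors: a session with Poor LCP or CLS but healthy INP is downsampled like any other, so extend the condition if you need those tails at full fidelity too. Good sessions are abundant and cheap to estimate from a 10% sample, so downsampling them shrinks storage without touching the part of the distribution that matters.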
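If you want to unit-test the rule without a Worker runtime, one option is to lift it into a pure function, as sketched below. `tailDecision` is a hypothetical name. The thresholds and the hash mirror Step 4, and the `goodRate` parameter is added only to make the function easier to test.

```javascript
const GOOD_TAIL_RATE = 0.10; // keep 10% of healthy sessions
const INP_POOR_MS = 500;     // INP above 500 ms counts as Poor

// Same FNV-1a hash as Step 1, mapped to [0, 1).
function hashUnitInterval(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 0x100000000;
}

// Pure version of the Step 4 rule: returns the decision without doing any I/O.
function tailDecision(event, goodRate = GOOD_TAIL_RATE) {
  const inp = event.metrics?.find((m) => m.name === 'INP')?.value ?? 0;
  const interesting = inp > INP_POOR_MS || Boolean(event.had_error);
  if (interesting) return { keep: true, weight: 1 };
  const keep = hashUnitInterval(String(event.sid)) < goodRate;
  // Dropped sessions carry no weight because they are never stored.
  return { keep, weight: keep ? 1 / goodRate : null };
}

console.log(tailDecision({ sid: 'a', metrics: [{ name: 'INP', value: 800 }] }));
// { keep: true, weight: 1 }
```

The Worker's fetch handler can then call this function and only handle parsing and forwarding itself.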

Step 5 — Reweight to recover an unbiased p75

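Sampling distorts raw counts. A tail-based store over-represents whichever bucket was kept at the higher rate, which skews the distribution; a uniform head-based store preserves the shape of the distribution but undercounts volume by a factor of 1 / rate. Restore the true picture by giving each row a weight of 1 / sample_rate, then compute a weighted p75. Conceptually, a weighted percentile expands each row into weight virtual copies and reads off the 75th-percentile boundary.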

-- Weighted p75 of INP. Each row counts `weight` times (10 for a 10%-sampled Good
-- session, 1 for a fully-kept Poor/errored one), reconstructing the true distribution.
SELECT
  quantileTDigestWeighted(0.75)(value, toUInt64(weight)) AS inp_p75
FROM rum_events
WHERE name = 'INP'
  AND event_date = today();

Why: if you skipped the weight, a tail-based store would look catastrophic. Poor sessions are kept at 100% while Good ones are thinned to 10%, so in the unweighted data the Poor sessions are over-represented 10×, dragging the apparent p75 far above reality. Multiplying Good sessions back up by 10 (and Poor by 1) cancels the sampling in expectation, so the weighted p75 is an unbiased estimate of what you would have measured with zero sampling, up to sampling noise. In a head-based store every row shares the same weight, so the percentile is already unbiased; there the 1 / sample_rate weight is what restores true volumes.
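To see the effect on a tiny dataset, here is a minimal exact weighted quantile in JavaScript. It is a sketch of the "virtual copies" idea, not the t-digest approximation that quantileTDigestWeighted uses, and the nine Good and ten Poor INP values are invented for illustration.

```javascript
// Weighted quantile: sort by value, accumulate weight, and return the first
// value at which the running weight reaches q * totalWeight.
function weightedQuantile(rows, q) {
  if (rows.length === 0) return NaN;
  const sorted = [...rows].sort((a, b) => a.value - b.value);
  const total = sorted.reduce((sum, r) => sum + r.weight, 0);
  const target = q * total;
  let running = 0;
  for (const row of sorted) {
    running += row.weight;
    if (running >= target) return row.value;
  }
  return sorted[sorted.length - 1].value;
}

// Assumed tail-sampled store: 9 Good survivors (each standing in for 10 sessions)
// and 10 Poor sessions kept at 100%.
const good = [90, 110, 130, 150, 170, 190, 210, 230, 250]
  .map((value) => ({ value, weight: 10 }));
const poor = [520, 560, 600, 640, 680, 720, 760, 800, 840, 880]
  .map((value) => ({ value, weight: 1 }));
const stored = [...good, ...poor];

// Ignoring weights treats every stored row as one session.
const unweighted = weightedQuantile(stored.map((r) => ({ ...r, weight: 1 })), 0.75);
const weighted = weightedQuantile(stored, 0.75);
console.log(unweighted); // 720
console.log(weighted);   // 230
```

Without weights, the 19 stored rows are more than half Poor, so the 75th percentile lands deep in the slow tail. With weights, the rows represent 100 sessions of which 90 are Good, and the p75 falls back inside the Good range.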

Verifying it works

Confirm each layer before trusting the dashboard:

  1. Determinism of the head sampler. In DevTools console, call headKeep('fixed-uuid', 0.1) repeatedly — it must return the same boolean every time, and roughly 10% of distinct UUIDs should return true. Generate 100k UUIDs in a loop and assert the keep fraction is within a couple of points of 0.10.
  2. Tail keeps every Poor session. Replay a synthetic session with INP = 800 and had_error = false through the worker; it must forward with weight === 1. Then replay sessions with INP = 120 under many distinct session ids: roughly 10% should forward, always with weight === 10. Because the downsample hashes sid, replaying the same id gives the same result every time.
  3. Reweighting closes the gap. Run the weighted query alongside a brief unsampled control window. The weighted p75 from the sampled store should sit within ~2–3% of the unsampled p75. A large divergence means a sample_rate/weight mismatch.
  4. No fragmented sessions. Query for session ids that appear with some metrics but not others under head sampling — there should be none, because the per-session hash keeps each session whole.
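Check 1 can be scripted roughly as follows. The hash and `headKeep` are copied from Step 1. `randomId` and `checkHeadSampler` are hypothetical helpers, and the fallback id generator exists only so the script also runs where crypto.randomUUID() is unavailable.

```javascript
function hashUnitInterval(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 0x100000000;
}

function headKeep(sessionId, rate) {
  return hashUnitInterval(sessionId) < rate;
}

// Uses crypto.randomUUID() where available, otherwise a random hex string
// (an assumption that is good enough for this check only).
function randomId() {
  if (globalThis.crypto && typeof globalThis.crypto.randomUUID === 'function') {
    return globalThis.crypto.randomUUID();
  }
  let id = '';
  for (let i = 0; i < 32; i++) id += Math.floor(Math.random() * 16).toString(16);
  return id;
}

function checkHeadSampler(rate, trials) {
  // Determinism: the same id must give the same answer on every call.
  const first = headKeep('fixed-uuid', rate);
  let deterministic = true;
  for (let i = 0; i < 1000; i++) {
    if (headKeep('fixed-uuid', rate) !== first) deterministic = false;
  }
  // Keep fraction across distinct ids.
  let kept = 0;
  for (let i = 0; i < trials; i++) {
    if (headKeep(randomId(), rate)) kept++;
  }
  return { deterministic, keepFraction: kept / trials };
}

const result = checkHeadSampler(0.10, 100000);
console.log(result); // keepFraction should land close to 0.10
```

The exact keep fraction varies from run to run because the ids are random; only a persistent gap from the configured rate points to a problem.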

Edge cases & gotchas

  • Math.random() head sampling fragments sessions. Sampling per beacon instead of per session means a session can land in the kept set for LCP and the dropped set for INP. Always hash a stable session id; never re-roll per metric.
  • Errors that prevent the beacon. Tail-based retention of errored sessions only works if the beacon still fires after the error. Capture errors via window.onerror/unhandledrejection, set had_error, and flush on pagehide so a crashing session still reports.
  • Weight skew destabilizes percentiles. With very aggressive Good downsampling (say 1%), each Good survivor carries weight 100, so a handful of them can jitter the weighted p75 between refreshes. Keep the Good rate no lower than ~5–10% unless your traffic is enormous.
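  • Counting sessions after sampling. A COUNT(*) over a sampled table undercounts; use SUM(weight) for session totals, computed over one row per session (for example, filtered to a single metric name when the table stores one row per metric), and reserve raw counts for storage accounting only.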
  • Edge state for true multi-beacon tail decisions. The Step 4 worker assumes one batched beacon per session. If you emit metrics in several beacons, you need a short-lived per-session buffer (a Durable Object or KV keyed by sid) so the edge sees the whole session before deciding — otherwise the first beacon is judged without INP.
  • Bot and prerender traffic. Headless/prerender hits inflate the Good bucket. Filter them before sampling, or their weights distort the reconstructed p75.
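For the multi-beacon case, the per-session buffer could be sketched as below. This is an in-memory illustration only: `createEdgeBuffer` is a hypothetical helper, a Durable Object or KV entry keyed by sid would replace the Map in a real deployment, and deciding when a session counts as complete (final beacon, timeout) is left to the caller.

```javascript
// Hypothetical in-memory session buffer for beacons that arrive in pieces.
// `decide` is whatever keep/drop rule you use, such as the Step 4 logic.
function createEdgeBuffer(decide) {
  const sessions = new Map(); // sid -> { metrics: Map, had_error: boolean }

  return {
    // Merge a partial beacon into the session's buffered state.
    add(beacon) {
      let s = sessions.get(beacon.sid);
      if (!s) {
        s = { metrics: new Map(), had_error: false };
        sessions.set(beacon.sid, s);
      }
      for (const m of beacon.metrics ?? []) s.metrics.set(m.name, m);
      if (beacon.had_error) s.had_error = true;
    },
    // Call once the session is considered complete; runs the decision on
    // the merged event and forgets the buffered state.
    finalize(sid) {
      const s = sessions.get(sid);
      if (!s) return null;
      sessions.delete(sid);
      return decide({ sid, metrics: [...s.metrics.values()], had_error: s.had_error });
    },
  };
}

const buffer = createEdgeBuffer((event) => event);
buffer.add({ sid: 'a', metrics: [{ name: 'LCP', value: 1800 }] });
buffer.add({ sid: 'a', metrics: [{ name: 'INP', value: 640 }] });
console.log(buffer.finalize('a').metrics.length); // 2
```

The point of the merge is that the decision sees INP even though it arrived in a later beacon than LCP.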

FAQ

When should I prefer tail-based over head-based sampling?

Choose tail-based when the questions you ask are about the slow end of the distribution — debugging Poor INP, auditing error sessions, or trusting p75/p99 at low traffic. Choose head-based when uplink bandwidth or device cost dominates and you only need coarse central tendencies. Many teams run both: a head pre-filter to cap volume, then a tail decision at the edge on what survives.
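When both stages run, a surviving row's weight has to account for both. Assuming the two decisions are independent, the inclusion probability is the product of the stage rates, as in the sketch below. `combinedWeight` is a hypothetical helper and the 50% head rate is an invented example.

```javascript
// Weight for a row that survived a chain of independent sampling stages.
function combinedWeight(rates) {
  let probability = 1;
  for (const rate of rates) {
    if (!(rate > 0 && rate <= 1)) throw new RangeError(`invalid sample rate: ${rate}`);
    probability *= rate; // independent stages multiply
  }
  return 1 / probability;
}

console.log(combinedWeight([0.5, 1]));   // 2  (Poor session: head 50%, tail 100%)
console.log(combinedWeight([0.5, 0.1])); // 20 (Good session: head 50%, tail 10%)
```

The product only holds if the stages really are independent. If the head and tail samplers both hash the same session id with the same function, every session that passes the stricter threshold also passes the looser one, so the true inclusion probability is the smaller rate rather than the product. Salting the id differently per stage is one way to avoid that.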

Does tail-based sampling bias my p75 if I keep all Poor sessions?

Only if you forget to reweight. Keeping Poor at 100% and Good at 10% over-represents Poor 10× in raw counts. Assigning weight = 1 / sample_rate and computing a weighted percentile cancels the sampling in expectation, so the weighted p75 matches the unsampled value within noise.

Why hash the session id instead of using Math.random()?

A hash is deterministic: the same session id always maps to the same number, so the keep/drop decision is identical across reloads and across every beacon the session sends. Math.random() re-rolls each call, which can keep one metric and drop another from the same session, fragmenting the row and corrupting per-session joins.