Head-Based vs Tail-Based Sampling for RUM
You cannot afford to store every beacon from a high-traffic site, but the naive fix — drop a fixed fraction of sessions at capture time — quietly discards the exact sessions you most need. This page contrasts the two sampling architectures that decide what to keep in a Real-User Monitoring pipeline: head-based sampling, where the browser flips a coin before it even knows whether the session was slow or errored, and tail-based sampling, where you buffer the full session and decide at the edge once the outcome is known. It builds on the broader RUM data sampling strategies this section documents, and shows the concrete JS for each approach plus how to reweight the survivors so your headline p75 aggregation stays unbiased.
The trade-off is fundamentally about when the keep/drop decision happens relative to when the outcome is known. Head-based sampling decides at capture time, in the browser, before any metric has finalized — it is cheap (you never pay to transmit a dropped beacon) but blind. Tail-based sampling defers the decision to the self-hosted beacon collection endpoint or edge worker, after the session’s INP, LCP, and CLS values are settled — so you can guarantee that every slow or errored session survives.
Prerequisites
Before implementing either strategy, confirm these are in place:
- A stable per-session identifier (a UUID generated once per page session, persisted in sessionStorage), so a head sampler is deterministic: the same session always lands on the same keep/drop side rather than flipping per beacon.
- Finalized vitals on the beacon. Whether you collect them with the web-vitals library or directly with PerformanceObserver, each metric must be reported once at its final value (on visibilitychange/pagehide), because tail-based decisions need the real INP and CLS, not interim values.
- An edge or collector you control (a Cloudflare Worker, an Nginx/OpenResty tier, or your ingestion service) where the tail decision runs. You cannot do tail-based sampling in a third-party SaaS you do not control.
- A weight column in your event store (a BigQuery/ClickHouse INTEGER or FLOAT) so reweighting survives into aggregation. See designing a BigQuery schema for RUM events for where this column lives.
Head-based vs tail-based at a glance
| Property | Head-based | Tail-based |
|---|---|---|
| Decision point | Browser, at capture | Edge/collector, after outcome known |
| Bandwidth cost | Low — dropped beacons never sent | High — every session transmitted |
| Outcome awareness | None — blind to slow/errored sessions | Full — INP/CLS/errors are known |
| Poor-CWV retention | Same rate as Good (some lost) | 100% (all kept) |
| Error-session retention | Random | 100% (all kept) |
| Reweighting needed | Uniform 1/rate | Per-bucket 1/rate |
| Complexity | Trivial | Moderate (stateful decision) |
The decisive row is Poor-CWV retention. At a 10% head rate, you keep roughly 10% of your Poor-INP sessions — exactly the tail you are paid to investigate. Tail-based keeps all of them.
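To put rough numbers on that row, the arithmetic can be sketched as below. The function name `expectedPoorKept` and the figure of 50,000 Poor sessions are hypothetical, chosen only to illustrate the gap.

```javascript
// Expected Poor-session survivors under each strategy.
// The traffic figure used below is an illustrative assumption, not a measurement.
function expectedPoorKept(poorSessions, headRate) {
  return {
    // Head sampling thins Poor sessions at the same rate as Good ones.
    headKept: Math.round(poorSessions * headRate),
    // Tail sampling keeps every Poor session unconditionally.
    tailKept: poorSessions,
  };
}

const estimate = expectedPoorKept(50_000, 0.10);
console.log(`head keeps ${estimate.headKept}, tail keeps ${estimate.tailKept}`);
```

Under these assumptions the head sampler leaves about one Poor session in ten available for debugging, while the tail sampler leaves all of them.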
How to implement both strategies
Step 1 — Build a deterministic head sampler
Hash the session id to a stable number in [0, 1) and compare against the keep rate. Using a hash rather than Math.random() makes the decision per-session and stable across reloads and across every beacon the session emits, so you never keep an LCP beacon while dropping that same session’s INP beacon.
// Deterministic 32-bit FNV-1a hash -> [0, 1). Same input always maps to the same value.
function hashUnitInterval(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  // >>> 0 forces unsigned; divide by 2^32 to land in [0, 1).
  return (h >>> 0) / 0x100000000;
}

function headKeep(sessionId, rate) {
  return hashUnitInterval(sessionId) < rate;
}

// One session id per page session, persisted so reloads stay on the same side.
function getSessionId() {
  let id = sessionStorage.getItem('rum_sid');
  if (!id) {
    id = crypto.randomUUID();
    sessionStorage.setItem('rum_sid', id);
  }
  return id;
}

const HEAD_RATE = 0.10;
const sessionId = getSessionId();
const sampled = headKeep(sessionId, HEAD_RATE);
Why: determinism is the whole point. If you sampled per beacon with Math.random(), a single session could contribute its LCP but not its CLS, fragmenting the row you later try to join. A hash on the session id keeps a session whole — entirely in or entirely out — and the result is reproducible for debugging.
Step 2 — Send (or skip) the head-sampled beacon
Only transmit when the session is in the kept fraction, and stamp the beacon with the sample rate so the collector can reweight later.
function sendHeadSampled(metric) {
  if (!sampled) return; // dropped before the bytes ever leave the device
  const body = JSON.stringify({
    sid: sessionId,
    name: metric.name, // 'LCP' | 'INP' | 'CLS' | ...
    value: metric.value,
    rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
    sample_rate: HEAD_RATE, // collector turns this into weight = 1 / sample_rate
  });
  navigator.sendBeacon('/rum', body);
}
Why: carrying sample_rate on the wire means the keep probability is recorded next to the survivor. The reweight step (Step 5) multiplies each row by 1 / sample_rate, so 1 surviving session stands in for 10 at a 0.10 rate. Without this field you cannot reconstruct true volumes.
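On the collector side, turning the wire field into a weight could look like the following minimal sketch. The helper name `weightFromSampleRate` and the choice to reject out-of-range rates by returning `null` are assumptions for illustration, not part of any library.

```javascript
// Convert a beacon's sample_rate into a row weight.
// Returns null for rates that cannot be a keep probability, so the caller
// can drop or quarantine the row instead of storing a nonsense weight.
function weightFromSampleRate(sampleRate) {
  const rate = Number(sampleRate);
  if (!Number.isFinite(rate) || rate <= 0 || rate > 1) return null;
  return 1 / rate;
}

console.log(weightFromSampleRate(0.10)); // weight for a 10% head sample
console.log(weightFromSampleRate(0));    // invalid rate, rejected
```

Validating here matters because the field comes from the client: a missing or zero `sample_rate` would otherwise become an infinite or undefined weight in the event store.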
Step 3 — Send everything for the tail path
For tail-based sampling the browser does no dropping. It sends every finalized session to the collection endpoint. The only browser-side concern is delivering reliably on unload via the beacon collection path.
function sendTail(metrics, hadError = false) {
  // metrics is the full finalized set for this session, batched into one beacon.
  // had_error is what the edge worker reads; set it from your error handlers.
  const body = JSON.stringify({ sid: sessionId, metrics, had_error: hadError });
  // sendBeacon survives pagehide; the edge, not the browser, decides keep/drop.
  navigator.sendBeacon('/rum', body);
}
Why: the browser cannot know at capture time whether this session will be the p75-defining slow one. Shipping everything moves the decision to a place that does know. You pay uplink bytes for sessions you will later drop, which is the price of never losing a Poor or errored session.
Step 4 — Make the keep/drop decision at the edge
Run the tail decision in an edge worker. The rule: keep unconditionally if the session is interesting (slow INP or any error), otherwise downsample the boring Good sessions to a small fraction. Stamp each kept row with the rate that admitted it.
// Cloudflare Worker style handler. Decides AFTER the outcome is known.
const GOOD_TAIL_RATE = 0.10; // keep 10% of healthy sessions
const INP_POOR_MS = 500; // INP > 500 ms is rated Poor by the Core Web Vitals thresholds

export default {
  async fetch(request, env) {
    const event = await request.json();
    const inp = event.metrics?.find((m) => m.name === 'INP')?.value ?? 0;
    const hadError = Boolean(event.had_error);
    const interesting = inp > INP_POOR_MS || hadError;

    let keep, sampleRate;
    if (interesting) {
      keep = true;
      sampleRate = 1; // kept with certainty -> weight 1
    } else {
      // Deterministic downsample of Good sessions, reusing the same hash.
      keep = hashUnitInterval(event.sid) < GOOD_TAIL_RATE;
      sampleRate = GOOD_TAIL_RATE; // weight = 1 / 0.10 = 10
    }

    if (keep) {
      event.weight = 1 / sampleRate;
      await env.RUM_QUEUE.send(event); // forward to ClickHouse/BigQuery writer
    }
    return new Response(null, { status: 204 });
  },
};

// hashUnitInterval is the same FNV-1a function from Step 1, shared edge-side.
function hashUnitInterval(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 0x100000000;
}
Why: the condition inp > 500 || hadError guarantees the slow tail is retained at full fidelity, which is the only way a p75 or p99 over INP, or a CLS breakdown, stays trustworthy. Good sessions are abundant and cheap to estimate from a 10% sample, so downsampling them shrinks storage without touching the part of the distribution that matters.
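The worker above only treats Poor INP and errors as interesting. If you also want Poor LCP and Poor CLS sessions retained unconditionally, one option is to pull the decision into a pure function that can be unit-tested outside the worker. This is a sketch under assumptions: `decideTail` and `POOR_THRESHOLDS` are hypothetical names, and the LCP (4000 ms) and CLS (0.25) cut-offs are the published Core Web Vitals Poor boundaries rather than values taken from this guide's worker.

```javascript
const GOOD_TAIL_RATE = 0.10;

// Poor boundaries per metric (INP and LCP in ms, CLS unitless).
const POOR_THRESHOLDS = new Map([
  ['INP', 500],
  ['LCP', 4000],
  ['CLS', 0.25],
]);

// Same FNV-1a hash as Step 1.
function hashUnitInterval(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 0x100000000;
}

// Pure keep/drop decision: no I/O, so it can be replayed against fixtures.
function decideTail(event, goodRate = GOOD_TAIL_RATE) {
  const metrics = Array.isArray(event.metrics) ? event.metrics : [];
  const anyPoor = metrics.some((m) => {
    const limit = POOR_THRESHOLDS.get(m.name);
    return limit !== undefined && m.value > limit;
  });
  if (anyPoor || Boolean(event.had_error)) {
    return { keep: true, weight: 1 }; // kept with certainty
  }
  const keep = hashUnitInterval(String(event.sid)) < goodRate;
  return { keep, weight: keep ? 1 / goodRate : 0 };
}

console.log(decideTail({ sid: 'demo', metrics: [{ name: 'INP', value: 800 }] }));
```

The worker's `fetch` handler would then call `decideTail(event)` and forward the event only when `keep` is true, stamping `event.weight` from the returned value.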
Step 5 — Reweight to recover an unbiased p75
Both strategies bias raw counts: survivors over-represent whatever bucket they came from. Restore the true distribution by giving each row a weight of 1 / sample_rate, then compute a weighted p75. A weighted percentile expands each row into weight virtual copies and reads off the 75th-percentile boundary.
-- Weighted p75 of INP. Each row counts `weight` times (10 for a 10%-sampled Good
-- session, 1 for a fully-kept Poor/errored one), reconstructing the true distribution.
SELECT
  quantileTDigestWeighted(0.75)(value, toUInt64(weight)) AS inp_p75
FROM rum_events
WHERE name = 'INP'
  AND event_date = today();
Why: if you skipped the weight, a tail-based store would look catastrophic. Poor sessions are kept at 100% while Good ones are thinned to 10%, so in the raw rows the Poor sessions are over-represented 10×, dragging the apparent p75 far above reality. Multiplying Good sessions back up by 10 (and Poor by 1) cancels the sampling in expectation, so the weighted p75 matches what you would have measured with zero sampling, up to sampling noise. In a head-based store every row carries the same weight, so weighting leaves the percentile itself unchanged; there the 1 / sample_rate factor matters for volumes, where SUM(weight) recovers the true session count.
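The same effect can be reproduced outside the database with a small weighted-quantile function. This is a minimal sketch, not how ClickHouse computes its t-digest: `weightedQuantile` is a hypothetical helper that uses an exact cumulative-weight scan, and the 19-row store below is synthetic data standing in for a population of 90 Good and 10 Poor sessions.

```javascript
// Exact weighted quantile: sort by value, then walk the cumulative weight
// until it reaches q * totalWeight. Accepts fractional weights.
function weightedQuantile(rows, q) {
  const sorted = rows.filter((r) => r.weight > 0).sort((a, b) => a.value - b.value);
  const total = sorted.reduce((sum, r) => sum + r.weight, 0);
  if (total === 0) return NaN;
  const target = q * total;
  let cumulative = 0;
  for (const row of sorted) {
    cumulative += row.weight;
    if (cumulative >= target) return row.value;
  }
  return sorted[sorted.length - 1].value;
}

// Synthetic tail-sampled store: 9 Good survivors (each standing in for 10
// sessions) and all 10 Poor sessions (each standing in for itself).
const goodRows = [100, 110, 120, 130, 140, 150, 160, 170, 180]
  .map((value) => ({ value, weight: 10 }));
const poorRows = Array.from({ length: 10 }, () => ({ value: 800, weight: 1 }));
const store = [...goodRows, ...poorRows];

const weightedP75 = weightedQuantile(store, 0.75);
const unweightedP75 = weightedQuantile(store.map((r) => ({ ...r, weight: 1 })), 0.75);
console.log({ weightedP75, unweightedP75 });
```

On this data the unweighted p75 lands on a Poor row (800 ms) while the weighted p75 stays inside the Good range (170 ms), which is the distortion the weight column exists to remove.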
Verifying it works
Confirm each layer before trusting the dashboard:
- Determinism of the head sampler. In the DevTools console, call headKeep('fixed-uuid', 0.1) repeatedly; it must return the same boolean every time, and roughly 10% of distinct UUIDs should return true. Generate 100k UUIDs in a loop and assert the keep fraction is within a couple of points of 0.10.
- Tail keeps every Poor session. Replay a synthetic session with INP = 800 and had_error = false through the worker; it must forward with weight === 1. Replay sessions with INP = 120 under many distinct sid values; about 10% should be forwarded, always with weight === 10, and any single sid must get the same decision on every replay.
- Reweighting closes the gap. Run the weighted query alongside a brief unsampled control window. The weighted p75 from the sampled store should sit within ~2–3% of the unsampled p75. A large divergence means a sample_rate/weight mismatch.
- No fragmented sessions. Query for session ids that appear with some metrics but not others under head sampling. Apart from sessions that never produced a metric in the first place (no interaction means no INP), there should be none, because the per-session hash keeps each session whole.
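The first check can be scripted rather than run by hand. The sketch below makes two assumptions for reproducibility: ids come from a small seeded generator (`mulberry32`, a hypothetical stand-in for crypto.randomUUID()) so the run gives the same result every time, and the acceptance band of 0.08 to 0.12 is an arbitrary reading of "a couple of points".

```javascript
// Same FNV-1a hash and head sampler as Step 1.
function hashUnitInterval(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 0x100000000;
}

function headKeep(sessionId, rate) {
  return hashUnitInterval(sessionId) < rate;
}

// Seeded PRNG so the check is repeatable; production ids would be real UUIDs.
function mulberry32(seed) {
  let a = seed | 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// 32 hex characters of pseudo-random content, similar in length to a UUID.
function syntheticId(rand) {
  let id = '';
  for (let i = 0; i < 4; i++) {
    id += Math.floor(rand() * 0x100000000).toString(16).padStart(8, '0');
  }
  return id;
}

function keepFraction(count, rate, seed = 1) {
  const rand = mulberry32(seed);
  let kept = 0;
  for (let i = 0; i < count; i++) {
    if (headKeep(syntheticId(rand), rate)) kept++;
  }
  return kept / count;
}

const fraction = keepFraction(100_000, 0.10);
console.log(`keep fraction: ${fraction.toFixed(4)}`);
```

A fraction far outside the band would suggest the hash is not spreading your id format evenly, which is worth knowing before trusting any weight derived from the rate.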
Edge cases & gotchas
- Math.random() head sampling fragments sessions. Sampling per beacon instead of per session means a session can land in the kept set for LCP and the dropped set for INP. Always hash a stable session id; never re-roll per metric.
- Errors that prevent the beacon. Tail-based retention of errored sessions only works if the beacon still fires after the error. Capture errors via window.onerror/unhandledrejection, set had_error, and flush on pagehide so a crashing session still reports.
- Weight skew destabilizes percentiles. With very aggressive Good downsampling (say 1%), each Good survivor carries weight 100, so a handful of them can jitter the weighted p75 between refreshes. Keep the Good rate no lower than ~5–10% unless your traffic is enormous.
- Counting distinct sessions after sampling. A COUNT(*) over a sampled table undercounts; use SUM(weight) for session totals and reserve raw counts for storage accounting only.
- Edge state for true multi-beacon tail decisions. The Step 4 worker assumes one batched beacon per session. If you emit metrics in several beacons, you need a short-lived per-session buffer (a Durable Object or KV keyed by sid) so the edge sees the whole session before deciding; otherwise the first beacon is judged without INP.
- Bot and prerender traffic. Headless/prerender hits inflate the Good bucket. Filter them before sampling, or their weights distort the reconstructed p75.
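The multi-beacon case in the list above can be sketched with an in-memory buffer. Everything here is an assumption for illustration: `SessionBuffer` is a hypothetical class, the `final` flag is a made-up field the browser would set on its last (pagehide) beacon, and a plain Map stands in for the Durable Object or KV store a real edge deployment would need, along with a timeout to evict sessions whose final beacon never arrives.

```javascript
// Accumulates beacons per session and releases the merged session only
// once a beacon marked final arrives.
class SessionBuffer {
  constructor() {
    this.sessions = new Map(); // sid -> merged session so far
  }

  // Returns the complete session when it is ready for the tail decision,
  // or null while the session is still open.
  add(beacon) {
    const entry = this.sessions.get(beacon.sid)
      ?? { sid: beacon.sid, metrics: [], had_error: false };
    entry.metrics.push(...(beacon.metrics ?? []));
    entry.had_error = entry.had_error || Boolean(beacon.had_error);
    this.sessions.set(beacon.sid, entry);

    if (!beacon.final) return null;
    this.sessions.delete(beacon.sid);
    return entry;
  }
}

const buffer = new SessionBuffer();
const first = buffer.add({ sid: 's1', metrics: [{ name: 'LCP', value: 1800 }] });
const complete = buffer.add({
  sid: 's1',
  metrics: [{ name: 'INP', value: 640 }],
  final: true,
});
console.log(first, complete);
```

Only the object returned on the final beacon should be passed to the keep/drop rule, so the decision sees INP alongside the earlier LCP.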
FAQ
When should I prefer tail-based over head-based sampling?
Choose tail-based when the questions you ask are about the slow end of the distribution — debugging Poor INP, auditing error sessions, or trusting p75/p99 at low traffic. Choose head-based when uplink bandwidth or device cost dominates and you only need coarse central tendencies. Many teams run both: a head pre-filter to cap volume, then a tail decision at the edge on what survives.
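When both stages run, a surviving row's weight has to reflect both keep probabilities. The sketch below assumes the two decisions are independent, which they are not if both stages compare the same hash of the same session id: every session that passes a 10% head cut already has a hash below 0.10 and would pass a 10% tail cut automatically. Salting the hash input per stage, as `stageKeep` does here, is one way to avoid that; both helper names are hypothetical.

```javascript
// Same FNV-1a hash as Step 1.
function hashUnitInterval(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 0x100000000;
}

// Stage-salted sampler: 'head' and 'tail' hash different strings, so passing
// one stage says little about passing the other.
function stageKeep(stage, sid, rate) {
  return hashUnitInterval(`${stage}:${sid}`) < rate;
}

// Weight for a row that survived both stages, assuming independent decisions.
function combinedWeight(headRate, tailRate) {
  for (const rate of [headRate, tailRate]) {
    if (!(rate > 0 && rate <= 1)) {
      throw new RangeError('sample rates must be in (0, 1]');
    }
  }
  return 1 / (headRate * tailRate);
}

console.log(combinedWeight(0.10, 1));    // Poor session: head-sampled, then always kept
console.log(combinedWeight(0.10, 0.10)); // Good session: thinned at both stages
```

Note that with a head pre-filter in front, Poor sessions are no longer retained at 100%; the tail stage can only guarantee the ones the head stage let through.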
Does tail-based sampling bias my p75 if I keep all Poor sessions?
Only if you forget to reweight. Keeping Poor at 100% and Good at 10% over-represents Poor 10× in raw counts. Assigning weight = 1 / sample_rate and computing a weighted percentile cancels the sampling exactly, so the weighted p75 matches the unsampled value within noise.
Why hash the session id instead of using Math.random()?
A hash is deterministic: the same session id always maps to the same number, so the keep/drop decision is identical across reloads and across every beacon the session sends. Math.random() re-rolls each call, which can keep one metric and drop another from the same session, fragmenting the row and corrupting per-session joins.
Related
- RUM Data Sampling Strategies: the parent guide to sampling rates, p75 aggregation, and storage trade-offs.
- Self-Hosted Beacon Collection: the ingestion endpoint where the tail-based decision and reweighting run.
- Designing a BigQuery Schema for RUM Events: where the weight and sample_rate columns live in the event table.