Self-Hosted Beacon Collection
A self-hosted beacon path is the part of a Real-User Monitoring stack that decides whether a measured metric ever becomes a stored row. Everything upstream — the observers that capture Largest Contentful Paint, Interaction to Next Paint, and Cumulative Layout Shift — is wasted if the beacon is dropped on unload, rejected at the edge, or silently deduplicated away. As established in RUM Architecture, Tooling & Self-Hosting, owning this layer buys you schema control, data retention you set, and cost that scales with rows rather than seats; the price is that delivery reliability, validation, back-pressure, and idempotency are now your problem. This page covers the full round trip: how the browser batches and sends, what the ingestion endpoint must validate and answer, how events land in columnar storage, and how to debug the gaps.
Transport: sendBeacon vs fetch keepalive
There are exactly two transports that survive a page being torn down, and choosing between them is the first design decision. navigator.sendBeacon() queues a POST in the browser’s network stack and returns synchronously; the request completes even after the document is gone. It is the correct default because it is the one transport the browser actively keeps alive through unload without you holding the event loop. Its limits are real: it sends only POST, you cannot set custom request headers (the Content-Type is inferred from the Blob type), you cannot read the response, and the queued body is capped — Chromium enforces roughly a 64 KB ceiling and returns false if the buffer is full or the payload is too large.
fetch() with keepalive: true is the escape hatch when you need a custom header (an auth token, a traceparent) or want to attempt delivery from a non-unload context. It shares the same keepalive byte budget as sendBeacon: the Fetch spec caps total in-flight keepalive bodies at 64 KiB, so a request that would exceed the budget fails as a network error and the promise rejects with a TypeError. The robust pattern is sendBeacon first, falling back to keepalive fetch only when sendBeacon returns false or you genuinely need headers. These transports are the delivery half of the web-vitals API implementation story, where PerformanceObserver and the web-vitals library produce the values you are shipping.
| Capability | navigator.sendBeacon() | fetch(..., {keepalive:true}) |
|---|---|---|
| Survives pagehide/unload | Yes (primary use) | Yes, within keepalive budget |
| Custom request headers | No | Yes |
| Response readable | No (fire-and-forget) | Yes |
| Methods | POST only | Any |
| Size ceiling | ~64 KB body | Shared keepalive quota |
| Return on overflow | false synchronously | Promise rejects (TypeError) |
| Recommended role | Default transport | Header/auth fallback |
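The selection rule in this section can be written as a small pure function. This is a minimal sketch rather than a definitive implementation: chooseTransport, byteLength, and the 'drop-or-split' label are hypothetical names, and the 64 KiB constant simply mirrors the approximate ceiling quoted above.

```javascript
// Assumed budget: mirrors the ~64 KB keepalive ceiling discussed above.
const KEEPALIVE_BUDGET = 64 * 1024;

// The budget is counted in bytes, so measure the UTF-8 encoding,
// not the JavaScript string length.
function byteLength(body) {
  return new TextEncoder().encode(body).length;
}

// Hypothetical helper: decide which transport a serialized envelope should use.
function chooseTransport({ body, needsHeaders = false, beaconAvailable = true }) {
  if (byteLength(body) > KEEPALIVE_BUDGET) return 'drop-or-split'; // neither transport will accept it
  if (needsHeaders || !beaconAvailable) return 'fetch-keepalive';
  return 'sendBeacon';
}
```

Measuring bytes instead of characters matters once URLs or attribution strings contain non-ASCII text, where one character can occupy several bytes of the budget.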
Payload schema and batching
Beacons are emitted at unpredictable moments — INP and CLS are only final at page lifecycle end, while LCP can settle earlier. Rather than firing one request per metric, accumulate metrics in an in-memory queue keyed by page view and flush once. A flat, short-keyed JSON envelope keeps you well under the 64 KB ceiling and maps cleanly onto a columnar row. Each beacon carries a client-generated id (a v4 UUID) so the server can deduplicate retries; the same page view, if it manages to flush twice, produces the same id and collapses to one row.
| Field | Type | Meaning |
|---|---|---|
| id | UUID v4 | Idempotency key for dedupe |
| sid | string | Session id (rotating, non-PII) |
| vid | string | Page-view id |
| u | string | URL path (no query/fragment) |
| ct | enum | effectiveType network class |
| dt | enum | Device tier bucket |
| cc | string | Country (resolved at edge) |
| ts | int | Client epoch ms at flush |
| m | object | Metric map: lcp, inp, cls |
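To make the envelope's shape concrete, the sketch below assembles one from its parts. makeEnvelope and its parameter names are assumptions for this example only; cc is deliberately absent because the table has the edge resolve it, and ts stays 0 until flush.

```javascript
// Hypothetical builder for the flat, short-keyed envelope described above.
function makeEnvelope({ id, sid, vid, href, ct = 'unknown', dt = 'mid' }) {
  return {
    id,                           // idempotency key, generated once per page view
    sid,                          // rotating session id
    vid,                          // page-view id
    u: new URL(href).pathname,    // path only: query string and fragment are dropped
    ct,
    dt,
    ts: 0,                        // filled in at flush time
    m: {}                         // metric map: lcp, inp, cls
  };
}

const envelope = makeEnvelope({
  id: '00000000-0000-4000-8000-000000000001',
  sid: 's1',
  vid: 'v1',
  href: 'https://www.example.com/checkout/cart?coupon=SECRET#step2'
});
```

Stripping the query string on the client keeps tokens and search terms out of the pipeline and keeps the url_path dimension from exploding in cardinality.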
Client batcher implementation
The batcher below subscribes to metric callbacks, queues them, and flushes exactly once on the first pagehide (or visibilitychange to hidden, which Safari fires more reliably). It guards against double-send with a sent flag while still keeping the id stable so a server sees an identical envelope if a retry slips through.
```js
// rum-batcher.js: accumulate vitals, flush once on page lifecycle end
import { onLCP, onINP, onCLS } from 'web-vitals';

const ENDPOINT = 'https://rum.example.com/v1/beacon';

// Placeholder session id (an assumption, swap in your own scheme):
// a random per-tab value kept in sessionStorage, carrying no PII.
function getSessionId() {
  try {
    let sid = sessionStorage.getItem('rum_sid');
    if (!sid) {
      sid = crypto.randomUUID();
      sessionStorage.setItem('rum_sid', sid);
    }
    return sid;
  } catch {
    return crypto.randomUUID(); // storage blocked: fall back to a per-view id
  }
}

// Coarse device bucket from memory and core count (both default to 4 if absent)
function deviceTier() {
  const mem = navigator.deviceMemory ?? 4;
  const cores = navigator.hardwareConcurrency ?? 4;
  if (mem <= 2 || cores <= 2) return 'low';
  if (mem <= 4 || cores <= 4) return 'mid';
  return 'high';
}

function makeBatcher() {
  const payload = {
    id: crypto.randomUUID(),
    sid: getSessionId(), // rotating, hashed, no PII
    vid: crypto.randomUUID(),
    u: location.pathname,
    ct: navigator.connection?.effectiveType ?? 'unknown',
    dt: deviceTier(),
    ts: 0,
    m: {}
  };
  let sent = false;

  function record(metric) {
    // metric === { name: 'LCP'|'INP'|'CLS', value: number }
    payload.m[metric.name.toLowerCase()] = Math.round(
      metric.name === 'CLS' ? metric.value * 1000 : metric.value
    );
  }

  function flush() {
    if (sent) return;
    if (Object.keys(payload.m).length === 0) return;
    sent = true;
    payload.ts = Date.now();
    const body = JSON.stringify(payload);
    // application/json is not a CORS-safelisted type, so a cross-origin
    // beacon is preflighted; the endpoint must answer OPTIONS.
    const blob = new Blob([body], { type: 'application/json' });
    const ok = navigator.sendBeacon && navigator.sendBeacon(ENDPOINT, blob);
    if (!ok) {
      // sendBeacon refused (too large / buffer full): keepalive fallback
      fetch(ENDPOINT, {
        method: 'POST',
        body,
        keepalive: true,
        headers: { 'Content-Type': 'application/json' }
      }).catch(() => { /* lossy by design */ });
    }
  }

  // pagehide is the reliable lifecycle hook; visibilitychange covers Safari
  addEventListener('pagehide', flush, { capture: true });
  addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'hidden') flush();
  }, { capture: true });

  return { record };
}

// Wire it to the web-vitals library
const batcher = makeBatcher();
onLCP(batcher.record);
onINP(batcher.record);
onCLS(batcher.record);
```
Two details matter. CLS is multiplied by 1000 and rounded so it travels as a small integer rather than a float, which keeps the payload tight and the column an integer. The pagehide and visibilitychange listeners are both registered because no single event fires on every browser-and-platform combination — bfcache restores, tab discards, and iOS background transitions each miss one or the other.
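The integer encoding can be checked in isolation. The sketch below mirrors the rounding in record() and adds the inverse used at read time; encodeMetric and decodeCls are hypothetical names introduced for this example.

```javascript
// Same rule as record(): CLS travels as value * 1000, everything else as whole ms.
function encodeMetric(name, value) {
  return Math.round(name === 'CLS' ? value * 1000 : value);
}

// Inverse for CLS only: back to the native 0-1 scale.
function decodeCls(stored) {
  return stored / 1000;
}

const wireCls = encodeMetric('CLS', 0.1234);  // 123
const wireLcp = encodeMetric('LCP', 2543.7);  // 2544
const readCls = decodeCls(wireCls);           // 0.123
```

The round trip keeps three decimal places of CLS, which is finer than the 0.1 and 0.25 threshold boundaries the metric is judged against.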
Ingestion endpoint design
The endpoint has one hard rule: it answers fast and never lets storage latency leak back to the browser. It validates the envelope, authenticates and rate-limits the caller, applies CORS, enqueues the event for asynchronous writing, and returns 204 No Content with an empty body. 204 is correct precisely because sendBeacon discards the response — there is nothing to send back, and a 200 with a JSON body just wastes bytes. The handler below is a self-contained Cloudflare Workers fetch handler; the same logic ports to any runtime. For the production hardening of this endpoint — KV-backed rate limits, secret rotation, and Durable Object dedupe — see building a RUM ingestion endpoint on Cloudflare Workers.
```js
// worker.js: minimal validated beacon ingestion endpoint
const ALLOWED_ORIGINS = new Set(['https://www.example.com', 'https://app.example.com']);
const MAX_BODY = 64 * 1024; // match sendBeacon ceiling
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function corsHeaders(origin) {
  const base = { Vary: 'Origin' };
  // Unknown origins get no CORS grant at all (never echo 'null')
  if (!ALLOWED_ORIGINS.has(origin)) return base;
  return {
    ...base,
    'Access-Control-Allow-Origin': origin,
    // sendBeacon sends with credentials mode "include", which requires this header
    'Access-Control-Allow-Credentials': 'true',
    'Access-Control-Allow-Methods': 'POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Max-Age': '86400'
  };
}

function validate(p) {
  if (typeof p !== 'object' || p === null) return 'not an object';
  if (typeof p.id !== 'string' || !UUID_RE.test(p.id)) return 'bad id';
  if (typeof p.u !== 'string' || p.u.length > 512) return 'bad url';
  if (typeof p.m !== 'object' || p.m === null) return 'bad metrics';
  for (const [k, v] of Object.entries(p.m)) {
    if (!['lcp', 'inp', 'cls'].includes(k)) return `unknown metric ${k}`;
    if (!Number.isFinite(v) || v < 0 || v > 600000) return `bad ${k}`;
  }
  return null;
}

export default {
  async fetch(request, env, ctx) {
    const origin = request.headers.get('Origin') ?? '';
    const cors = corsHeaders(origin);

    if (request.method === 'OPTIONS') {
      return new Response(null, { status: 204, headers: cors }); // CORS preflight
    }
    if (request.method !== 'POST') {
      return new Response(null, { status: 405, headers: cors });
    }

    // Rate-limit per client IP via KV counter (best-effort, fail-open)
    const ip = request.headers.get('CF-Connecting-IP') ?? '0.0.0.0';
    const rlKey = `rl:${ip}:${Math.floor(Date.now() / 1000 / 10)}`; // 10s bucket
    const count = parseInt((await env.RL.get(rlKey)) ?? '0', 10);
    if (count > 100) return new Response(null, { status: 429, headers: cors });
    // Workers KV rejects expirationTtl values below 60 seconds
    ctx.waitUntil(env.RL.put(rlKey, String(count + 1), { expirationTtl: 60 }));

    const len = parseInt(request.headers.get('Content-Length') ?? '0', 10);
    if (len > MAX_BODY) return new Response(null, { status: 413, headers: cors });

    let payload;
    try { payload = await request.json(); }
    catch { return new Response(null, { status: 400, headers: cors }); }

    const err = validate(payload);
    if (err) return new Response(null, { status: 422, headers: cors });

    // Dedupe by id, then enqueue async write; never block the response
    payload.cc = request.cf?.country ?? 'XX';
    ctx.waitUntil((async () => {
      const seen = await env.DEDUPE.get(payload.id);
      if (seen) return; // idempotent drop
      await env.DEDUPE.put(payload.id, '1', { expirationTtl: 3600 });
      await env.QUEUE.send(payload); // back-pressure buffer
    })());

    return new Response(null, { status: 204, headers: cors }); // empty, fast
  }
};
```
The response returns before the queue write completes — ctx.waitUntil() keeps the async work alive without holding the 204. That ordering is the back-pressure boundary: a slow or saturated downstream queue can never stall the browser, and if the queue itself sheds load, the only casualty is a dropped beacon, which the design already tolerates.
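The idempotent drop can be modeled without any Workers bindings. The sketch below swaps the DEDUPE namespace for an in-memory Map with a manual expiry, so it illustrates the logic rather than KV itself; makeDeduper and firstSeen are hypothetical names, and the injected clock exists only to make expiry testable.

```javascript
// In-memory stand-in for a short-TTL key store (an assumption for illustration).
function makeDeduper(store = new Map(), ttlMs = 3600 * 1000, now = () => Date.now()) {
  // Returns true the first time an id is seen inside the TTL window, false after.
  return function firstSeen(id) {
    const t = now();
    const expiry = store.get(id);
    if (expiry !== undefined && expiry > t) return false; // duplicate: drop
    store.set(id, t + ttlMs);
    return true;                                           // new: enqueue the write
  };
}

// Usage with a fake clock
let clock = 0;
const firstSeen = makeDeduper(new Map(), 1000, () => clock);
const results = [firstSeen('a'), firstSeen('a'), firstSeen('b')];
clock = 1500; // past the TTL, so 'a' is accepted again
results.push(firstSeen('a'));
```

A get-then-put check like this is not atomic: two copies of one beacon arriving at nearly the same moment can both pass, which is one reason production dedupe is usually moved to a strongly consistent store.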
Writing to columnar storage
The queue consumer flushes batches into a columnar store where high-cardinality dimensions (URL, country, connection type) and percentile aggregation are cheap. ClickHouse is the common self-hosted choice; BigQuery is the managed equivalent. The schema below keeps id as the dedupe key and stores CLS as the integer it traveled as. A correct p75 needs an aggregating engine — a plain SummingMergeTree cannot sum quantiles — so percentile state is computed with quantileState and merged at read time. The full pipeline build is covered in setting up a self-hosted RUM pipeline with ClickHouse, and the managed-warehouse layout in designing a BigQuery schema for RUM events.
```sql
-- ClickHouse raw beacon table; id carries through for downstream dedupe checks
CREATE TABLE rum_events_raw
(
    `id` UUID,
    `ts` DateTime64(3),
    `session_id` String,
    `view_id` String,
    `url_path` String,
    `country_code` LowCardinality(String),
    `connection_type` LowCardinality(String),
    `device_tier` LowCardinality(String),
    -- Nullable: a beacon that lacks a metric must not count as 0 in the p75
    `lcp_ms` Nullable(UInt32),
    `inp_ms` Nullable(UInt32),
    `cls_x1000` Nullable(UInt32)
)
ENGINE = MergeTree()
ORDER BY (ts, url_path, country_code)
TTL toDateTime(ts) + INTERVAL 90 DAY;

-- AggregatingMergeTree MV: aggregate states are mergeable across parts
CREATE MATERIALIZED VIEW rum_cwv_p75_mv
ENGINE = AggregatingMergeTree()
ORDER BY (date, url_path, connection_type)
AS SELECT
    toDate(ts) AS date,
    url_path,
    connection_type,
    quantileState(0.75)(lcp_ms) AS p75_lcp_state,
    quantileState(0.75)(inp_ms) AS p75_inp_state,
    quantileState(0.75)(cls_x1000) AS p75_cls_state,
    -- a plain count() is not combined on merge; keep it as an aggregate state
    countState() AS event_count_state
FROM rum_events_raw
GROUP BY date, url_path, connection_type;

-- Read p75 (CLS divided back to its 0–1 scale)
SELECT
    date,
    url_path,
    quantileMerge(0.75)(p75_lcp_state) AS p75_lcp_ms,
    quantileMerge(0.75)(p75_inp_state) AS p75_inp_ms,
    quantileMerge(0.75)(p75_cls_state) / 1000 AS p75_cls,
    countMerge(event_count_state) AS samples
FROM rum_cwv_p75_mv
WHERE date >= today() - 7
GROUP BY date, url_path
ORDER BY date DESC;
```
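Between the queue and the table, the consumer has to translate the short wire keys into column names. The sketch below shows one way to do that mapping; toRow is a hypothetical helper, and it assumes rows are inserted in a format such as JSONEachRow, where an omitted field falls back to the column's default instead of being written as an explicit value.

```javascript
// Hypothetical consumer-side mapping: short-keyed envelope -> rum_events_raw row.
function toRow(p) {
  const row = {
    id: p.id,
    // DateTime64(3) text form in UTC: 'YYYY-MM-DD HH:MM:SS.mmm'
    ts: new Date(p.ts).toISOString().replace('T', ' ').replace('Z', ''),
    session_id: p.sid ?? '',
    view_id: p.vid ?? '',
    url_path: p.u,
    country_code: p.cc ?? 'XX',
    connection_type: p.ct ?? 'unknown',
    device_tier: p.dt ?? 'unknown'
  };
  // Only emit metrics that were actually measured, so a missing INP
  // is never written as a real value.
  if (p.m.lcp !== undefined) row.lcp_ms = p.m.lcp;
  if (p.m.inp !== undefined) row.inp_ms = p.m.inp;
  if (p.m.cls !== undefined) row.cls_x1000 = p.m.cls;
  return row;
}

const row = toRow({
  id: '00000000-0000-4000-8000-000000000001',
  sid: 's1', vid: 'v1', u: '/test', ct: '4g', dt: 'mid', cc: 'DE',
  ts: 0, m: { lcp: 1200, cls: 87 }
});
```

Keeping the mapping in one function means a wire-key rename is a one-line change in the consumer rather than a table migration.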
Threshold configuration
The endpoint and the dashboard built on top of it both reference the same Good / Needs Improvement / Poor bands. Bake these thresholds into the consumer so a regression query and an alert agree on what “Poor” means. CLS is shown on its native 0–1 scale; remember the column stores it multiplied by 1000.
| Metric | Good | Needs Improvement | Poor | Engineering action at p75 |
|---|---|---|---|---|
| LCP | ≤ 2.5 s | ≤ 4.0 s | > 4.0 s | Alert if p75 crosses 2.5 s for a route |
| INP | ≤ 200 ms | ≤ 500 ms | > 500 ms | Capture interaction attribution on Poor |
| CLS | ≤ 0.1 | ≤ 0.25 | > 0.25 | Segment by viewport before paging |
| FCP | ≤ 1.8 s | ≤ 3.0 s | > 3.0 s | Diagnostic for LCP regressions |
| TTFB | ≤ 800 ms | ≤ 1.8 s | > 1.8 s | Split server vs network at the edge |
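One way to keep the regression query and the alert in agreement is to route both through a single lookup. The sketch below encodes the table in stored units (milliseconds, and CLS multiplied by 1000); THRESHOLDS and rate are hypothetical names for this example.

```javascript
// [good upper bound, needs-improvement upper bound], in stored units.
const THRESHOLDS = {
  lcp: [2500, 4000],
  inp: [200, 500],
  cls: [100, 250],   // 0.1 and 0.25 on the native scale, stored x1000
  fcp: [1800, 3000],
  ttfb: [800, 1800]
};

// Classify a p75 value into the band the table assigns it.
function rate(metric, value) {
  const bounds = THRESHOLDS[metric];
  if (!bounds) throw new RangeError(`unknown metric: ${metric}`);
  if (value <= bounds[0]) return 'good';
  if (value <= bounds[1]) return 'needs-improvement';
  return 'poor';
}
```

For example, rate('cls', 101) lands in 'needs-improvement' because the stored 101 corresponds to a CLS of 0.101, just past the 0.1 boundary.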
Debugging workflow
When metrics go missing or look wrong, the failure is almost always in transport or validation, not in measurement. Work the path in order:
- Identify the gap. Compare beacon volume against page-view volume from an independent counter (server logs, CDN analytics). A delivery hole shows up as a ratio below ~0.95 that correlates with a browser, route, or region.
- Trace the request. Open DevTools Network, filter to the beacon endpoint, and confirm the request appears under the “Other”/beacon initiator on pagehide. No request means the lifecycle listener never fired or sendBeacon returned false.
- Correlate the response. Check the status: a 422 is a schema mismatch, 413 is an oversized payload, 429 is rate-limit shedding, and a CORS error in the console with no status is a failed preflight.
- Validate in lab. Reproduce with a scripted unload (page.close() in Playwright) and assert the collector received a row with the expected id and metric fields.
- Deploy the fix. Ship the transport or validation change behind the same flag as the batcher so client and server move together.
- Monitor the delta. Watch the beacon-to-pageview ratio and the per-status histogram for 24–48 hours; a fix should lift the ratio without inflating 422/429.
Field-data segmentation
Aggregate delivery rates and metric percentiles are deceptive because beacon loss is not uniform — it concentrates exactly where measurement matters most. Segment every health and metric query along the dimensions the schema already carries:
- Device tier. Low-tier devices both perform worse and drop more beacons (background suspension, memory pressure killing the page before flush). A dt = 'low' slice with a low delivery ratio is hiding your worst INP.
- Network class. ct of slow-2g/2g correlates with timed-out keepalive fetches; loss here biases your distribution optimistic.
- Geography. A regional delivery drop usually means a CORS or edge-routing problem at one PoP, not a real performance change; verify the ratio before believing the metric.
- Browser/platform. A Safari-only gap points at a pagehide versus visibilitychange issue; an Android-WebView gap often points at sendBeacon being unavailable in an embedded context.
The right sampling rate to read these segments confidently — and how to keep p75 stable as you down-sample — is the subject of RUM data sampling strategies.
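The per-segment delivery check can be sketched as a pure function over two count maps. deliveryGaps and its input shape are assumptions for illustration, and the 0.95 floor is the approximate ratio used in the debugging workflow, not a universal constant.

```javascript
// beacons and pageviews map a segment label (e.g. a device tier) to a count.
// Returns the segments whose beacon-to-pageview ratio is below the floor,
// worst first.
function deliveryGaps(beacons, pageviews, floor = 0.95) {
  const gaps = [];
  for (const [segment, views] of Object.entries(pageviews)) {
    if (views === 0) continue; // no traffic, no ratio
    const ratio = (beacons[segment] ?? 0) / views;
    if (ratio < floor) gaps.push({ segment, ratio });
  }
  return gaps.sort((a, b) => a.ratio - b.ratio);
}

const gaps = deliveryGaps(
  { low: 700, mid: 930, high: 990 },
  { low: 1000, mid: 1000, high: 1000 }
);
// gaps lists 'low' first, then 'mid'; 'high' passes the floor
```

In this example the aggregate ratio is about 0.87, which understates how badly the low tier (0.70) is under-reporting.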
Failure modes and gotchas
- Beacon dropped on unload. The classic loss: a metric finalized late never ships because the page is already gone. Mitigation is flushing on pagehide/visibilitychange (both, as shown) rather than the deprecated unload, which blocks bfcache and fires unreliably on mobile.
- CORS preflight failures. A sendBeacon Blob typed application/json is not a CORS “simple” request: only text/plain, application/x-www-form-urlencoded, and multipart/form-data are safelisted content types, so a cross-origin JSON beacon triggers an OPTIONS preflight, as does a keepalive fetch that adds a custom header (traceparent, auth). If the endpoint does not answer OPTIONS with the matching Access-Control-Allow-Headers, the beacon silently fails. Because sendBeacon always sends with credentials included, the endpoint must also return Access-Control-Allow-Credentials: true alongside a specific (non-wildcard) origin. The handler above returns 204 to preflight explicitly. To skip the preflight entirely, send the JSON string as text/plain and parse it server-side.
- sendBeacon size limit. Chromium caps the queued body around 64 KB and returns false on overflow. Keep envelopes flat and short-keyed; when payloads still grow (extra attribution data), apply the techniques in reducing beacon payload size for mobile networks before they hit the ceiling.
- Safari quirks. Safari historically under-fires pagehide and has had intermittent PerformanceObserver gaps for some entry types; the visibilitychange → hidden listener is the reliable flush trigger there. Test bfcache restore explicitly, because a restored page must arm a fresh batcher.
- Background-tab suspension. A discarded or frozen tab may never run your flush. Treat lifecycle freeze as a flush trigger too if you instrument the Page Lifecycle API, and accept that some loss is irreducible.
- Double counting on retry. A keepalive fetch that the browser retries, or a user who navigates back into bfcache and re-flushes, can deliver the same envelope twice. The stable client id plus server-side dedupe (the DEDUPE lookup above) makes the write idempotent.
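The bfcache point in the Safari item can be sketched as follows. armOnRestore is a hypothetical helper, and the event target and batcher factory are passed in so the logic runs outside a browser; in a page, target would be window and the factory would be the makeBatcher shown earlier.

```javascript
// Re-arm a batcher whenever the page is restored from the back/forward cache.
function armOnRestore(target, makeBatcher) {
  let batcher = makeBatcher();
  target.addEventListener('pageshow', (event) => {
    // persisted === true marks a bfcache restore; an ordinary load keeps
    // the batcher created above.
    if (event.persisted) batcher = makeBatcher();
  });
  // Metric callbacks should look the batcher up on every call,
  // so they always reach the current one.
  return { current: () => batcher };
}

// Usage with a plain EventTarget standing in for window
let created = 0;
const target = new EventTarget();
const armed = armOnRestore(target, () => ({ serial: ++created }));

const restore = new Event('pageshow');
restore.persisted = true; // simulate a bfcache restore
target.dispatchEvent(restore);
```

In a page, the wiring would route through the accessor, for example onLCP(m => armed.current().record(m)), so a restored page view gets its own id and its own single flush.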
CI/CD gating
Beacon collection regresses silently, so gate it in the pipeline rather than discovering loss in production dashboards. Add a Playwright job that loads a test page, drives an interaction, closes the page, and asserts the collector received exactly one row with a valid id and the expected metric keys — this catches a broken transport, a schema-breaking field rename, or a CORS misconfiguration before merge. Run a contract test that POSTs known-bad payloads (oversized, missing id, out-of-range metric) and asserts the endpoint answers 413/422 rather than 204, so validation cannot silently weaken. Finally, fail the build if the synthetic beacon-to-pageview ratio in a smoke environment drops below threshold.
```shell
#!/usr/bin/env bash
# ci-beacon-smoke.sh: fail the build if the endpoint accepts a bad payload
set -euo pipefail

ENDPOINT="https://staging-rum.example.com/v1/beacon"

# Valid beacon must return 204
code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$ENDPOINT" \
  -H 'Content-Type: application/json' -H 'Origin: https://www.example.com' \
  --data '{"id":"00000000-0000-4000-8000-000000000001","u":"/test","m":{"lcp":1200}}')
[ "$code" = "204" ] || { echo "valid beacon rejected: $code"; exit 1; }

# Out-of-range metric must be rejected (422), not accepted
code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$ENDPOINT" \
  -H 'Content-Type: application/json' -H 'Origin: https://www.example.com' \
  --data '{"id":"00000000-0000-4000-8000-000000000002","u":"/test","m":{"lcp":999999}}')
[ "$code" = "422" ] || { echo "validation gate weakened: got $code"; exit 1; }

echo "beacon contract gate passed"
```
FAQ
Should I use sendBeacon or fetch keepalive for RUM?
Use navigator.sendBeacon() as the default: it is the transport browsers actively keep alive through page unload. Note that a Blob typed application/json triggers a CORS preflight on a cross-origin endpoint; only a safelisted type such as text/plain stays preflight-free. Fall back to fetch() with keepalive: true only when sendBeacon returns false (overflow) or you need a custom header like an auth token or traceparent.
Why return 204 No Content from the ingestion endpoint?
Because sendBeacon discards the response entirely, there is nothing useful to send back. A 204 with an empty body is the smallest valid acknowledgement, signals success without wasting bytes, and keeps the contract honest: the browser is not waiting on a payload it cannot read.
How large can a beacon payload be?
Chromium enforces roughly a 64 KB ceiling on the sendBeacon body and returns false on overflow; keepalive fetch shares the same in-flight keepalive byte quota and rejects with a TypeError when it is exceeded. Keep envelopes flat and short-keyed, and prune optional attribution fields before they push you near the limit.
How do I prevent duplicate beacons from inflating my data?
Generate a v4 UUID id per page view on the client and reuse it across any retry, then deduplicate by that id at the endpoint with a short-TTL key store before enqueuing the write. The same envelope arriving twice collapses to one stored row, making the write idempotent.
Where do dropped beacons usually come from?
Most loss is on page unload — a metric finalized too late to ship, a Safari tab that under-fires pagehide, or a backgrounded tab that never runs the flush. Flushing on both pagehide and visibilitychange → hidden, and tracking the beacon-to-pageview ratio per browser and device tier, surfaces the gap.
Related
- RUM Ingestion Endpoint on Cloudflare Workers — production-hardened edge endpoint with KV rate limits and dedupe.
- Designing a BigQuery Schema for RUM Events — managed-warehouse table layout for beacon events.
- Reducing Beacon Payload Size for Mobile — field pruning and encoding to stay under the size ceiling.
- RUM Data Sampling Strategies — keep p75 stable while sampling beacon volume down.
- Web Vitals API Implementation — the PerformanceObserver and web-vitals layer that produces the values you ship.