Self-Hosted Beacon Collection

A self-hosted beacon path is the part of a Real-User Monitoring stack that decides whether a measured metric ever becomes a stored row. Everything upstream — the observers that capture Largest Contentful Paint, Interaction to Next Paint, and Cumulative Layout Shift — is wasted if the beacon is dropped on unload, rejected at the edge, or silently deduplicated away. As established in RUM Architecture, Tooling & Self-Hosting, owning this layer buys you schema control, data retention you set, and cost that scales with rows rather than seats; the price is that delivery reliability, validation, back-pressure, and idempotency are now your problem. This page covers the full round trip: how the browser batches and sends, what the ingestion endpoint must validate and answer, how events land in columnar storage, and how to debug the gaps.

[Figure: Self-hosted beacon ingestion flow. Browser batcher (web-vitals + PerformanceObserver; queue events, flush on pagehide) → sendBeacon → edge endpoint (validate + auth, rate-limit + CORS, dedupe by id, 204 No Content) → columnar store (ClickHouse / BigQuery, p75 aggregation). Back-pressure: queue and shed load, never block the 204. Lossy by design: a dropped beacon is acceptable, a blocked unload is not.]
Beacons are batched in the browser, validated and rate-limited at the edge, deduplicated by id, then written to columnar storage and aggregated at p75.

Transport: sendBeacon vs fetch keepalive

Two transports survive a page being torn down, and choosing between them is the first design decision. navigator.sendBeacon() queues a POST in the browser’s network stack and returns synchronously; the request completes even after the document is gone. It is the correct default because it was designed for exactly this case: the browser keeps the request alive through unload without you holding the event loop. Its limits are real: it sends only POST, you cannot set custom request headers (the Content-Type is inferred from the Blob type), you cannot read the response, and the queued body is capped. Chromium enforces roughly a 64 KB ceiling and returns false if the buffer is full or the payload is too large.

fetch() with keepalive: true is the escape hatch when you need a custom header (an auth token, a traceparent) or want to attempt delivery from a non-unload context. It shares the same keepalive byte budget as sendBeacon: the Fetch standard caps total in-flight keepalive bodies per page at 64 KiB, so an over-budget request fails as a network error and the fetch() promise rejects with a TypeError. The robust pattern is sendBeacon first, falling back to keepalive fetch only when sendBeacon returns false or you genuinely need headers. These transports are the delivery half of the web-vitals API implementation story, where PerformanceObserver and the web-vitals library produce the values you are shipping.

| Capability | navigator.sendBeacon() | fetch(..., {keepalive:true}) |
| --- | --- | --- |
| Survives pagehide/unload | Yes (primary use) | Yes, within keepalive budget |
| Custom request headers | No | Yes |
| Response readable | No (fire-and-forget) | Yes |
| Methods | POST only | Any |
| Size ceiling | ~64 KB body | Shared keepalive quota |
| Return on overflow | Returns false synchronously | Promise rejects (TypeError) |
| Recommended role | Default transport | Header/auth fallback |
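
The decision rule in the table can be sketched as a small pure function. This is an illustrative sketch, not a library API: `pickTransport`, `KEEPALIVE_BUDGET`, and the option names are hypothetical, and the 64 KiB figure is the approximate ceiling quoted above rather than a guaranteed limit in every browser.

```javascript
// Hypothetical helper: decide which transport to try for a serialized beacon.
// KEEPALIVE_BUDGET mirrors the ~64 KB ceiling discussed above (an assumption;
// other in-flight keepalive requests reduce the real headroom).
const KEEPALIVE_BUDGET = 64 * 1024;

function pickTransport(bodyBytes, { needsHeaders = false, hasSendBeacon = true } = {}) {
  // Over budget: sendBeacon would return false and keepalive fetch would reject.
  if (bodyBytes > KEEPALIVE_BUDGET) return 'drop';
  // Custom headers are only possible with fetch.
  if (needsHeaders || !hasSendBeacon) return 'fetch-keepalive';
  // Default transport.
  return 'sendBeacon';
}
```

A caller would pass the UTF-8 byte length of the serialized envelope, not the string length, because the budget is counted in bytes.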

Payload schema and batching

Beacons are emitted at unpredictable moments — INP and CLS are only final at page lifecycle end, while LCP can settle earlier. Rather than firing one request per metric, accumulate metrics in an in-memory queue keyed by page view and flush once. A flat, short-keyed JSON envelope keeps you well under the 64 KB ceiling and maps cleanly onto a columnar row. Each beacon carries a client-generated id (a v4 UUID) so the server can deduplicate retries; the same page view, if it manages to flush twice, produces the same id and collapses to one row.

| Field | Type | Meaning |
| --- | --- | --- |
| id | UUID v4 | Idempotency key for dedupe |
| sid | string | Session id (rotating, non-PII) |
| vid | string | Page-view id |
| u | string | URL path (no query/fragment) |
| ct | enum | effectiveType network class |
| dt | enum | Device tier bucket |
| cc | string | Country (resolved at edge) |
| ts | int | Client epoch ms at flush |
| m | object | Metric map: lcp, inp, cls |
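
For illustration, an envelope matching the table might look like the object below. Every value is invented for the example, `envelopeBytes` is a hypothetical helper, and `cc` is absent because it is resolved at the edge rather than sent by the client.

```javascript
// Illustrative envelope; all values are made up for this example.
const sample = {
  id: '3f2b8c1e-5d47-4a9b-8e21-7c6d5f4a3b2c', // idempotency key (UUID v4 shape)
  sid: 's_9f3a',                               // hypothetical rotating session id
  vid: 'v_01',                                 // hypothetical page-view id
  u: '/checkout',
  ct: '4g',
  dt: 'mid',
  ts: 1700000000000,
  m: { lcp: 1840, inp: 96, cls: 42 }           // cls is CLS x 1000, so 42 means 0.042
};

// Size of the serialized envelope in UTF-8 bytes, the unit the ceiling is counted in.
function envelopeBytes(envelope) {
  return new TextEncoder().encode(JSON.stringify(envelope)).length;
}

const bytes = envelopeBytes(sample);
console.log(`${bytes} bytes of a ${64 * 1024} byte budget`);
```

An envelope of this shape is a few hundred bytes, so the ceiling only becomes a concern once optional attribution data is attached.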

Client batcher implementation

The batcher below subscribes to metric callbacks, queues them, and flushes exactly once on the first pagehide (or visibilitychange to hidden, which Safari fires more reliably). It guards against double-send with a sent flag while still keeping the id stable so a server sees an identical envelope if a retry slips through.

// rum-batcher.js — accumulate vitals, flush once on page lifecycle end
const ENDPOINT = 'https://rum.example.com/v1/beacon';

function makeBatcher() {
  const payload = {
    id: crypto.randomUUID(),
    sid: getSessionId(),         // app-provided helper, not shown: rotating, hashed, no PII
    vid: crypto.randomUUID(),
    u: location.pathname,
    ct: navigator.connection?.effectiveType ?? 'unknown',
    dt: deviceTier(),
    ts: 0,
    m: {}
  };
  let sent = false;

  function record(metric) {
    // metric === { name: 'LCP'|'INP'|'CLS', value: number }
    payload.m[metric.name.toLowerCase()] = Math.round(
      metric.name === 'CLS' ? metric.value * 1000 : metric.value
    );
  }

  function flush() {
    if (sent) return;
    if (Object.keys(payload.m).length === 0) return;
    sent = true;
    payload.ts = Date.now();
    const body = JSON.stringify(payload);
    // text/plain is CORS-safelisted, so the beacon goes out without a preflight;
    // application/json would force an OPTIONS round trip during unload.
    // The endpoint parses the body as JSON regardless of this label.
    const blob = new Blob([body], { type: 'text/plain' });

    const ok = navigator.sendBeacon && navigator.sendBeacon(ENDPOINT, blob);
    if (!ok) {
      // sendBeacon refused (too large / buffer full): keepalive fallback
      fetch(ENDPOINT, {
        method: 'POST',
        body,
        keepalive: true,
        headers: { 'Content-Type': 'text/plain' }
      }).catch(() => { /* lossy by design */ });
    }
  }

  // pagehide is the reliable lifecycle hook; visibilitychange covers Safari
  addEventListener('pagehide', flush, { capture: true });
  addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'hidden') flush();
  }, { capture: true });

  return { record };
}

function deviceTier() {
  const mem = navigator.deviceMemory ?? 4;
  const cores = navigator.hardwareConcurrency ?? 4;
  if (mem <= 2 || cores <= 2) return 'low';
  if (mem <= 4 || cores <= 4) return 'mid';
  return 'high';
}

// Wire it to the web-vitals library
import { onLCP, onINP, onCLS } from 'web-vitals';
const batcher = makeBatcher();
onLCP(batcher.record);
onINP(batcher.record);
onCLS(batcher.record);

Two details matter. CLS is multiplied by 1000 and rounded so it travels as a small integer rather than a float, which keeps the payload tight and the column an integer. The pagehide and visibilitychange listeners are both registered because no single event fires on every browser-and-platform combination — bfcache restores, tab discards, and iOS background transitions each miss one or the other.

Ingestion endpoint design

The endpoint has one hard rule: it answers fast and never lets storage latency leak back to the browser. It validates the envelope, authenticates and rate-limits the caller, applies CORS, enqueues the event for asynchronous writing, and returns 204 No Content with an empty body. 204 is correct precisely because sendBeacon discards the response — there is nothing to send back, and a 200 with a JSON body just wastes bytes. The handler below is a self-contained Cloudflare Workers fetch handler; the same logic ports to any runtime. For the production hardening of this endpoint — KV-backed rate limits, secret rotation, and Durable Object dedupe — see building a RUM ingestion endpoint on Cloudflare Workers.

// worker.js — minimal validated beacon ingestion endpoint
const ALLOWED_ORIGINS = new Set(['https://www.example.com', 'https://app.example.com']);
const MAX_BODY = 64 * 1024; // match sendBeacon ceiling

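function corsHeaders(origin) {
  const headers = {
    'Access-Control-Allow-Methods': 'POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Max-Age': '86400',
    'Vary': 'Origin'
  };
  // Only echo allow-listed origins. Never answer 'null': sandboxed iframes
  // and data: URLs send Origin: null and would match it.
  if (ALLOWED_ORIGINS.has(origin)) {
    headers['Access-Control-Allow-Origin'] = origin;
    // sendBeacon sends credentials, so a preflighted beacon needs this.
    headers['Access-Control-Allow-Credentials'] = 'true';
  }
  return headers;
}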

function validate(p) {
  if (typeof p !== 'object' || p === null) return 'not an object';
  if (typeof p.id !== 'string' || p.id.length !== 36) return 'bad id';
  if (typeof p.u !== 'string' || p.u.length > 512) return 'bad url';
  if (typeof p.m !== 'object' || p.m === null) return 'bad metrics';
  for (const [k, v] of Object.entries(p.m)) {
    if (!['lcp', 'inp', 'cls'].includes(k)) return `unknown metric ${k}`;
    if (!Number.isFinite(v) || v < 0 || v > 600000) return `bad ${k}`;
  }
  return null;
}

export default {
  async fetch(request, env, ctx) {
    const origin = request.headers.get('Origin') ?? '';
    const cors = corsHeaders(origin);

    if (request.method === 'OPTIONS') {
      return new Response(null, { status: 204, headers: cors }); // CORS preflight
    }
    if (request.method !== 'POST') {
      return new Response(null, { status: 405, headers: cors });
    }

    // Rate-limit per client IP via KV counter (best-effort, fail-open)
    const ip = request.headers.get('CF-Connecting-IP') ?? '0.0.0.0';
    const rlKey = `rl:${ip}:${Math.floor(Date.now() / 1000 / 10)}`; // 10s bucket
    const count = parseInt((await env.RL.get(rlKey)) ?? '0', 10);
    if (count > 100) return new Response(null, { status: 429, headers: cors });
    ctx.waitUntil(env.RL.put(rlKey, String(count + 1), { expirationTtl: 30 }));

    const len = parseInt(request.headers.get('Content-Length') ?? '0', 10);
    if (len > MAX_BODY) return new Response(null, { status: 413, headers: cors });

    let payload;
    try { payload = await request.json(); }
    catch { return new Response(null, { status: 400, headers: cors }); }

    const err = validate(payload);
    if (err) return new Response(null, { status: 422, headers: cors });

    // Dedupe by id, then enqueue async write — never block the response
    payload.cc = request.cf?.country ?? 'XX';
    ctx.waitUntil((async () => {
      const seen = await env.DEDUPE.get(payload.id);
      if (seen) return;                                  // idempotent drop
      await env.DEDUPE.put(payload.id, '1', { expirationTtl: 3600 });
      await env.QUEUE.send(payload);                     // back-pressure buffer
    })());

    return new Response(null, { status: 204, headers: cors }); // empty, fast
  }
};

The response returns before the queue write completes — ctx.waitUntil() keeps the async work alive without holding the 204. That ordering is the back-pressure boundary: a slow or saturated downstream queue can never stall the browser, and if the queue itself sheds load, the only casualty is a dropped beacon, which the design already tolerates.
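
The idempotent-drop step can be illustrated outside the Workers runtime with an in-memory stand-in for the DEDUPE store. This is a sketch under stated assumptions: `makeDeduper` and `ingest` are hypothetical names, and a real edge deployment needs a shared store because each isolate has its own memory.

```javascript
// In-memory stand-in for the DEDUPE key store: remembers ids for ttlMs.
// `now` is injectable so the expiry logic can be exercised deterministically.
// Expired entries are overwritten on reuse but never evicted, which is fine
// for a sketch and not for a long-lived process.
function makeDeduper(ttlMs, now = () => Date.now()) {
  const seen = new Map(); // id -> expiry timestamp in ms
  return function firstSeen(id) {
    const t = now();
    const expiry = seen.get(id);
    if (expiry !== undefined && expiry > t) return false; // duplicate inside the window
    seen.set(id, t + ttlMs);
    return true;                                          // first sighting, safe to enqueue
  };
}

// Keep only the first copy of each id, preserving arrival order.
function ingest(payloads, firstSeen) {
  return payloads.filter((p) => firstSeen(p.id));
}
```

The TTL only needs to outlast the window in which a retry can plausibly arrive; the handler above uses one hour.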

Writing to columnar storage

The queue consumer flushes batches into a columnar store where high-cardinality dimensions (URL, country, connection type) and percentile aggregation are cheap. ClickHouse is the common self-hosted choice; BigQuery is the managed equivalent. The schema below keeps id as the dedupe key and stores CLS as the integer it traveled as. A correct p75 needs an aggregating engine — a plain SummingMergeTree cannot sum quantiles — so percentile state is computed with quantileState and merged at read time. The full pipeline build is covered in setting up a self-hosted RUM pipeline with ClickHouse, and the managed-warehouse layout in designing a BigQuery schema for RUM events.

-- ClickHouse raw beacon table; id carries through for downstream dedupe checks
CREATE TABLE rum_events_raw
(
  `id`              UUID,
  `ts`              DateTime64(3),
  `session_id`      String,
  `view_id`         String,
  `url_path`        String,
  `country_code`    LowCardinality(String),
  `connection_type` LowCardinality(String),
  `device_tier`     LowCardinality(String),
  `lcp_ms`          Nullable(UInt32),  -- NULL when the beacon carried no LCP
  `inp_ms`          Nullable(UInt32),  -- NULL rather than 0, so missing values stay out of p75
  `cls_x1000`       Nullable(UInt32)
)
ENGINE = MergeTree()
ORDER BY (ts, url_path, country_code)
TTL toDateTime(ts) + INTERVAL 90 DAY;

-- AggregatingMergeTree MV: quantileState and countState are mergeable states.
-- A plain count() column would not be summed when parts merge, so the sample
-- count is stored as a state too.
CREATE MATERIALIZED VIEW rum_cwv_p75_mv
ENGINE = AggregatingMergeTree()
ORDER BY (date, url_path, connection_type)
AS SELECT
  toDate(ts)                       AS date,
  url_path,
  connection_type,
  quantileState(0.75)(lcp_ms)      AS p75_lcp_state,
  quantileState(0.75)(inp_ms)      AS p75_inp_state,
  quantileState(0.75)(cls_x1000)   AS p75_cls_state,
  countState()                     AS event_count_state
FROM rum_events_raw
GROUP BY date, url_path, connection_type;

-- Read p75 (CLS divided back to its native unitless scale)
SELECT
  date,
  url_path,
  quantileMerge(0.75)(p75_lcp_state)            AS p75_lcp_ms,
  quantileMerge(0.75)(p75_inp_state)            AS p75_inp_ms,
  quantileMerge(0.75)(p75_cls_state) / 1000     AS p75_cls,
  countMerge(event_count_state)                 AS samples
FROM rum_cwv_p75_mv
WHERE date >= today() - 7
GROUP BY date, url_path
ORDER BY date DESC;
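
On the consumer side, the short-keyed envelope has to be renamed into the column names of rum_events_raw before insertion. The function below is a minimal sketch of that mapping: `toRow` is a hypothetical name, the fallback values are assumptions, and the timestamp string assumes the column is interpreted as UTC.

```javascript
// Map a beacon envelope to a row shaped like rum_events_raw.
// Missing metrics become null; how null is stored depends on the column
// definition (Nullable, or a default applied by the input format).
function toRow(p, serverNow = Date.now()) {
  // The client clock is untrusted: fall back to server time if ts is unusable.
  const ms = Number.isFinite(p.ts) && p.ts > 0 ? p.ts : serverNow;
  return {
    id: p.id,
    // 'YYYY-MM-DD hh:mm:ss.mmm' in UTC, a form DateTime64(3) can parse
    ts: new Date(ms).toISOString().replace('T', ' ').replace('Z', ''),
    session_id: p.sid ?? '',
    view_id: p.vid ?? '',
    url_path: p.u,
    country_code: p.cc ?? 'XX',
    connection_type: p.ct ?? 'unknown',
    device_tier: p.dt ?? 'unknown',
    lcp_ms: p.m.lcp ?? null,
    inp_ms: p.m.inp ?? null,
    cls_x1000: p.m.cls ?? null
  };
}
```

Keeping the rename in one function means a field added to the envelope shows up as a single, reviewable change next to the schema.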

Threshold configuration

The endpoint and the dashboard built on top of it both reference the same Good / Needs Improvement / Poor bands. Bake these thresholds into the consumer so a regression query and an alert agree on what “Poor” means. CLS is shown on its native unitless scale; remember the column stores it multiplied by 1000.

| Metric | Good | Needs Improvement | Poor | Engineering action at p75 |
| --- | --- | --- | --- | --- |
| LCP | ≤ 2.5 s | ≤ 4.0 s | > 4.0 s | Alert if p75 crosses 2.5 s for a route |
| INP | ≤ 200 ms | ≤ 500 ms | > 500 ms | Capture interaction attribution on Poor |
| CLS | ≤ 0.1 | ≤ 0.25 | > 0.25 | Segment by viewport before paging |
| FCP | ≤ 1.8 s | ≤ 3.0 s | > 3.0 s | Diagnostic for LCP regressions |
| TTFB | ≤ 800 ms | ≤ 1.8 s | > 1.8 s | Split server vs network at the edge |
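
One way to keep queries and alerts in agreement is a single threshold map in stored units. The sketch below assumes milliseconds for timings and CLS x 1000 for layout shift, matching the columns; `THRESHOLDS` and `rate` are hypothetical names, and the fcp and ttfb keys are included only to mirror the table, since the batcher above does not collect them.

```javascript
// Thresholds from the table, expressed in stored units:
// milliseconds for timings, CLS x 1000 for layout shift.
const THRESHOLDS = {
  lcp: { good: 2500, poor: 4000 },
  inp: { good: 200, poor: 500 },
  cls: { good: 100, poor: 250 },
  fcp: { good: 1800, poor: 3000 },
  ttfb: { good: 800, poor: 1800 }
};

// Classify a stored value. Both boundaries are inclusive on the better band,
// matching the "≤" columns in the table.
function rate(metric, value) {
  const t = THRESHOLDS[metric];
  if (!t) throw new Error(`unknown metric: ${metric}`);
  if (value <= t.good) return 'good';
  if (value <= t.poor) return 'needs-improvement';
  return 'poor';
}
```

Because the map uses stored units, a p75 read straight from the cls_x1000 column can be rated without first dividing by 1000.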

Debugging workflow

When metrics go missing or look wrong, the failure is almost always in transport or validation, not in measurement. Work the path in order:

  1. Identify the gap. Compare beacon volume against page-view volume from an independent counter (server logs, CDN analytics). A delivery hole shows up as a ratio below ~0.95 that correlates with a browser, route, or region.
  2. Trace the request. Open DevTools Network, enable Preserve log so the entry survives the navigation, filter to the beacon endpoint, and confirm a request fires on pagehide (Chrome lists sendBeacon requests with type ping, under the “Other” filter). No request means the lifecycle listener never fired or sendBeacon returned false.
  3. Correlate the response. Check the status: a 422 is a schema mismatch, 413 is an oversized payload, 429 is rate-limit shedding, and a CORS error in the console with no status is a failed preflight.
  4. Validate in lab. Reproduce with a scripted unload (page.close() in Playwright) and assert the collector received a row with the expected id and metric fields.
  5. Deploy the fix. Ship the transport or validation change behind the same flag as the batcher so client and server move together.
  6. Monitor the delta. Watch the beacon-to-pageview ratio and the per-status histogram for 24–48 hours; a fix should lift the ratio without inflating 422/429.
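
Steps 1 and 6 both rely on the beacon-to-pageview ratio. A minimal sketch of that check follows; `deliveryGaps` and the `segment`, `beacons`, and `pageviews` field names are hypothetical, and the page-view counts are assumed to come from an independent source such as server logs.

```javascript
// counts: [{ segment, beacons, pageviews }]
// Returns the segments whose delivery ratio falls below minRatio, worst first.
function deliveryGaps(counts, minRatio = 0.95) {
  return counts
    .filter((c) => c.pageviews > 0) // no page views means no ratio to judge
    .map((c) => ({ segment: c.segment, ratio: c.beacons / c.pageviews }))
    .filter((c) => c.ratio < minRatio)
    .sort((a, b) => a.ratio - b.ratio);
}
```

Running it per browser, route, and region is what turns a vague "volume looks low" into a specific segment to trace.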

Field-data segmentation

Aggregate delivery rates and metric percentiles are deceptive because beacon loss is not uniform — it concentrates exactly where measurement matters most. Segment every health and metric query along the dimensions the schema already carries:

  • Device tier. Low-tier devices both perform worse and drop more beacons (background suspension, memory pressure killing the page before flush). A dt = 'low' slice with a low delivery ratio is hiding your worst INP.
  • Network class. A ct of slow-2g or 2g correlates with timed-out keepalive fetches; loss here biases your distribution toward optimistic values.
  • Geography. A regional delivery drop usually means a CORS or edge-routing problem at one PoP, not a real performance change — verify the ratio before believing the metric.
  • Browser/platform. A Safari-only gap points at a pagehide versus visibilitychange issue; an Android-WebView gap often points at sendBeacon being unavailable in an embedded context.
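
The segmentation above can be prototyped on a handful of exported rows before writing the warehouse query. This sketch uses the exact nearest-rank method, so its numbers may differ slightly from an approximate quantile function in the store; `p75` and `p75BySegment` are hypothetical names.

```javascript
// Nearest-rank 75th percentile of a list of numbers.
function p75(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Group rows by a dimension (for example 'dt' or 'ct') and compute p75 of one
// metric per group, skipping rows where the metric is missing.
function p75BySegment(rows, key, metric) {
  const groups = new Map();
  for (const row of rows) {
    const v = row[metric];
    if (v == null) continue;
    if (!groups.has(row[key])) groups.set(row[key], []);
    groups.get(row[key]).push(v);
  }
  const out = {};
  for (const [k, vals] of groups) out[k] = { p75: p75(vals), samples: vals.length };
  return out;
}
```

Reporting the sample count next to each p75 makes thin segments visible, which matters because the segments that lose the most beacons are also the ones with the fewest samples.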

The right sampling rate to read these segments confidently — and how to keep p75 stable as you down-sample — is the subject of RUM data sampling strategies.

Failure modes and gotchas

  • Beacon dropped on unload. The classic loss: a metric finalized late never ships because the page is already gone. Mitigation is flushing on pagehide/visibilitychange (both, as shown) rather than the deprecated unload, which blocks bfcache and fires unreliably on mobile.
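  • CORS preflight failures. A cross-origin sendBeacon stays “simple” and preflight-free only when the Blob carries a CORS-safelisted type such as text/plain; an application/json Blob triggers an OPTIONS preflight, as does a keepalive fetch that adds a custom header (traceparent, auth). sendBeacon also sends credentials, so its preflight response must name the exact origin and include Access-Control-Allow-Credentials: true. If the endpoint does not answer OPTIONS with the matching Access-Control-Allow-* headers, the beacon silently fails. The handler above returns 204 to preflight explicitly.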
  • sendBeacon size limit. Chromium caps the queued body around 64 KB and returns false on overflow. Keep envelopes flat and short-keyed; when payloads still grow (extra attribution data), apply the techniques in reducing beacon payload size for mobile networks before they hit the ceiling.
  • Safari quirks. Safari historically under-fires pagehide and has had intermittent PerformanceObserver gaps for some entry types; the visibilitychange → hidden listener is the reliable flush trigger there. Test bfcache restore explicitly, because a restored page must arm a fresh batcher.
  • Background-tab suspension. A discarded or frozen tab may never run your flush. Treat lifecycle freeze as a flush trigger too if you instrument the Page Lifecycle API, and accept that some loss is irreducible.
  • Double counting on retry. A keepalive fetch that the browser retries, or a user who navigates back into bfcache and re-flushes, can deliver the same envelope twice. The stable client id plus server-side dedupe (the DEDUPE lookup above) makes the write idempotent.
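
The bfcache point in the Safari bullet, that a restored page must arm a fresh batcher, can be sketched as a thin wrapper. This is an assumption-laden illustration: `armBatcher` is a hypothetical name, `target` would be `window` in a browser, and the batcher factory is passed in so the logic does not depend on any particular batcher implementation.

```javascript
// Swap in a fresh batcher whenever the page is restored from bfcache.
// target: an EventTarget (window in a browser).
// makeBatcher: factory returning an object with a record(metric) method.
function armBatcher(target, makeBatcher) {
  let current = makeBatcher();
  target.addEventListener('pageshow', (event) => {
    // persisted === true means the page came back from bfcache, so the
    // previous batcher has already flushed and must be replaced.
    if (event.persisted) current = makeBatcher();
  });
  // Always record into whichever batcher is current.
  return { record: (metric) => current.record(metric) };
}
```

In a page this would be wired as `const batcher = armBatcher(window, makeBatcher)`, with the metric callbacks pointed at `batcher.record` so values reported after a restore land in the new envelope with a new id.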

CI/CD gating

Beacon collection regresses silently, so gate it in the pipeline rather than discovering loss in production dashboards. Add a Playwright job that loads a test page, drives an interaction, closes the page, and asserts the collector received exactly one row with a valid id and the expected metric keys — this catches a broken transport, a schema-breaking field rename, or a CORS misconfiguration before merge. Run a contract test that POSTs known-bad payloads (oversized, missing id, out-of-range metric) and asserts the endpoint answers 413/422 rather than 204, so validation cannot silently weaken. Finally, fail the build if the synthetic beacon-to-pageview ratio in a smoke environment drops below threshold.

#!/usr/bin/env bash
# ci-beacon-smoke.sh: fail the build if the endpoint accepts a bad payload
set -euo pipefail
ENDPOINT="https://staging-rum.example.com/v1/beacon"

# Valid beacon must return 204
code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$ENDPOINT" \
  -H 'Content-Type: application/json' -H 'Origin: https://www.example.com' \
  --data '{"id":"00000000-0000-4000-8000-000000000001","u":"/test","m":{"lcp":1200}}')
[ "$code" = "204" ] || { echo "valid beacon rejected: $code"; exit 1; }

# Out-of-range metric must be rejected (422), not accepted
code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$ENDPOINT" \
  -H 'Content-Type: application/json' -H 'Origin: https://www.example.com' \
  --data '{"id":"00000000-0000-4000-8000-000000000002","u":"/test","m":{"lcp":999999}}')
[ "$code" = "422" ] || { echo "validation gate weakened: got $code"; exit 1; }

echo "beacon contract gate passed"

FAQ

Should I use sendBeacon or fetch keepalive for RUM?

Use navigator.sendBeacon() as the default: it is the transport browsers actively keep alive through page unload, and a body typed text/plain stays preflight-free (an application/json Blob triggers a CORS preflight). Fall back to fetch() with keepalive: true only when sendBeacon returns false (overflow) or you need a custom header like an auth token or traceparent.

Why return 204 No Content from the ingestion endpoint?

Because sendBeacon discards the response entirely, there is nothing useful to send back. A 204 with an empty body is the smallest valid acknowledgement, signals success without wasting bytes, and keeps the contract honest: the browser is not waiting on a payload it cannot read.

How large can a beacon payload be?

Chromium enforces roughly a 64 KB ceiling on the sendBeacon body and returns false on overflow; keepalive fetch shares a per-page keepalive byte quota and rejects with a TypeError when the quota is exceeded. Keep envelopes flat and short-keyed, and prune optional attribution fields before they push you near the limit.

How do I prevent duplicate beacons from inflating my data?

Generate a v4 UUID id per page view on the client and reuse it across any retry, then deduplicate by that id at the endpoint with a short-TTL key store before enqueuing the write. The same envelope arriving twice collapses to one stored row, making the write idempotent.

Where do dropped beacons usually come from?

Most loss is on page unload — a metric finalized too late to ship, a Safari tab that under-fires pagehide, or a backgrounded tab that never runs the flush. Flushing on both pagehide and visibilitychange → hidden, and tracking the beacon-to-pageview ratio per browser and device tier, surfaces the gap.