RUM Architecture, Tooling & Self-Hosting
Real-User Monitoring turns the performance of every real session into queryable data, and the architecture that carries it — from a browser beacon to a p75 number on a dashboard — is what decides whether that data is trustworthy, affordable, and yours. This reference describes the end-to-end self-hosted pipeline a senior team builds when it wants full control over capture, retention, and cost, instead of paying a vendor per event for a schema it cannot query.
The promise of RUM is simple: measure what real browsers experience, not what a clean lab profile predicts. The engineering reality is that doing this at scale means moving high-cardinality telemetry off the main thread, validating and stripping it at the edge before it touches storage, sampling it so query cost stays bounded, and aggregating it at p75 so a few catastrophic sessions do not drown the signal. Everything below treats field data as the source of truth and synthetic testing as a regression gate, the inverse of how most teams start. The metrics themselves — and how they are captured in the browser — are established in Core Web Vitals & Performance Metrics Fundamentals; this page is about the plumbing that carries them.
Field data vs synthetic: why the architecture exists
Synthetic testing (Lighthouse, WebPageTest, a CI runner) loads a page in a controlled, throttled environment and returns one number per run from a single, repeatable profile. It is excellent for catching render-blocking resources, regressions in a build artifact, and unoptimized assets before they ship. What it cannot do is model the distribution of real experience: cold caches, contended main threads, throttled radios, ad and consent scripts injected after the fact, and the slowest cohort of cheap Android devices that no synthetic profile reproduces faithfully. A page that scores 95 in the lab can still fail the field assessment Google uses for ranking, because that assessment is made on the 75th percentile of real Chrome users, not on a clean run.
That p75 assessment is the reason RUM architecture is worth building. Google’s spec classifies a page on the p75 of each metric: the largest contentful paint timing covered in LCP measurement and optimization, the responsiveness covered in INP tracking and debugging, and the stability covered in CLS reduction strategies. Load-phase signals — FCP and TTFB analysis — round out the diagnostic picture. The Chrome UX Report (CrUX) gives you these at p75 already, but only as a 28-day, origin-or-URL-grouped roll-up with no segmentation by your own dimensions. A self-hosted pipeline exists to recover everything CrUX hides: per-route, per-release, per-cohort p75 the day a regression ships, not 28 days later.
Core metrics reference
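Every panel, alert, and SLO in the pipeline is anchored to the current Google thresholds. The p75 of each of the three Core Web Vitals (LCP, INP, CLS) across your real traffic must fall in the “Good” band for the page to pass; FCP and TTFB are diagnostic metrics with their own recommended bands and do not count toward the assessment. The engineering action column is what the dashboard signal should trigger.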
| Metric | Good (p75) | Needs improvement | Poor | Engineering action when p75 degrades |
|---|---|---|---|---|
| LCP | ≤ 2.5 s | ≤ 4.0 s | > 4.0 s | Audit largest element delivery: fetchpriority, preload, server response time |
| INP | ≤ 200 ms | ≤ 500 ms | > 500 ms | Break up long tasks, yield to the scheduler, defer non-critical handlers |
| CLS | ≤ 0.1 | ≤ 0.25 | > 0.25 | Reserve space with aspect-ratio; stabilize fonts, ads, and late-injected DOM |
| FCP | ≤ 1.8 s | ≤ 3.0 s | > 3.0 s | Reduce render-blocking CSS/JS; improve first-byte latency upstream |
| TTFB | ≤ 800 ms | ≤ 1.8 s | > 1.8 s | Add edge caching, tune origin, cut redirect chains |
These bands are the contract between the pipeline and the people reading it. Storing raw per-event values (never pre-bucketed) is what lets you recompute the p75 for any slice — a route, a country, a device tier — without re-instrumenting anything.
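Because raw values are stored, the band for any slice can be derived at read time. The following is a minimal sketch of that classification using the thresholds from the table above; the names `THRESHOLDS` and `rateMetric` are illustrative and not part of any library.

```javascript
// Threshold bands from the reference table: [good, needs-improvement] upper limits.
// Timings are in milliseconds; CLS is a unitless score.
const THRESHOLDS = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
  FCP: [1800, 3000],
  TTFB: [800, 1800],
};

// Classify a single value (for example a p75) into its band.
function rateMetric(name, value) {
  const bounds = THRESHOLDS[name];
  if (!bounds) throw new Error(`unknown metric: ${name}`);
  const [good, needsImprovement] = bounds;
  if (value <= good) return 'good';
  if (value <= needsImprovement) return 'needs-improvement';
  return 'poor';
}
```

The returned strings match the `rating` values the web-vitals library reports, so a recomputed band can be compared directly with what the browser sent.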
Self-Hosted Beacon Collection
The first architectural decision is how telemetry leaves the browser without harming the very metrics you are measuring. A naive collector that fires a synchronous request on unload, or serializes a large payload on the main thread, inflates INP and risks losing the beacon entirely when the tab is backgrounded. The production pattern is to capture each metric as the web-vitals library reports it, buffer the values, and flush a single compact payload with navigator.sendBeacon() on visibilitychange/pagehide. sendBeacon hands the request to the browser’s background queue so it survives navigation and tab closure, and it never blocks the unload sequence. A fetch with keepalive: true is the fallback for browsers where sendBeacon is missing or refuses to queue the request. It is not a way around the size limit: keepalive requests share the same budget of roughly 64 KB, so an oversized batch has to be trimmed or split.
What gets captured is exactly the set of entries described in the web-vitals API implementation reference: the metric value, its id, its rating, and — when you ship the attribution build — the element or event that caused it. The receiving side, the self-hosted beacon collection endpoint, owns validation, compression, and retention. Keeping both ends in-house is what removes third-party SDK weight from the critical path and gives you a schema you control. For payload shape and the trade-offs of compact encodings on slow networks, the beacon collection material goes deeper than this overview.
import { onLCP, onINP, onCLS, onFCP, onTTFB } from 'web-vitals';

const ENDPOINT = '/rum';
const queue = [];

// Buffer each metric as web-vitals reports it; nothing is sent until the page hides.
function record(metric) {
  queue.push({
    name: metric.name,
    // CLS is a unitless score usually well below 1, so rounding it to an integer
    // would zero it out. Keep four decimals for CLS, whole milliseconds otherwise.
    value: metric.name === 'CLS'
      ? Math.round(metric.value * 1e4) / 1e4
      : Math.round(metric.value),
    rating: metric.rating,
    id: metric.id,
    navType: metric.navigationType,
  });
}

onLCP(record);
onINP(record);
onCLS(record);
onFCP(record);
onTTFB(record);

function flush() {
  if (queue.length === 0) return;
  const body = JSON.stringify({
    href: location.pathname,
    conn: navigator.connection?.effectiveType ?? 'unknown',
    dpr: window.devicePixelRatio,
    ts: Date.now(),
    metrics: queue.splice(0, queue.length),
  });
  const blob = new Blob([body], { type: 'application/json' });
  // sendBeacon returns false when the browser refuses to queue the request.
  if (!(navigator.sendBeacon && navigator.sendBeacon(ENDPOINT, blob))) {
    fetch(ENDPOINT, { method: 'POST', body, keepalive: true }).catch(() => {});
  }
}

// Flush whenever the page is hidden: the last reliable moment in the page lifecycle.
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') flush();
});
// pagehide covers browsers that do not fire visibilitychange reliably on unload.
addEventListener('pagehide', flush, { once: true });
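Note the flush-on-hide pattern: metrics like CLS and INP are only final once the page is hidden, so flushing earlier reports stale values. Because a page can be hidden and shown again, the same metric id may arrive more than once with an updated value, and the collector should keep the latest value per id. The splice drains the queue in a single synchronous step, so a visibilitychange/pagehide double-fire cannot send duplicates.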
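The web-vitals library can invoke a callback more than once for the same metric instance, reusing the same `id` with an updated value. One way to handle that downstream is to keep only the most recent report per id. The sketch below assumes events are processed in arrival order; `latestPerId` is a hypothetical helper name.

```javascript
// Keep only the most recent report for each metric id.
// Assumes `events` is ordered oldest to newest.
function latestPerId(events) {
  const byId = new Map();
  for (const e of events) {
    byId.set(e.id, e); // a later report overwrites the earlier one for the same id
  }
  return [...byId.values()];
}

// Example: CLS was reported twice for the same page load.
const deduped = latestPerId([
  { id: 'v4-cls-1', name: 'CLS', value: 0.02 },
  { id: 'v4-lcp-1', name: 'LCP', value: 1200 },
  { id: 'v4-cls-1', name: 'CLS', value: 0.08 },
]);
```

The same rule can be applied in the warehouse instead, for example by selecting the row with the greatest timestamp per id, if the edge is kept stateless.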
Edge ingestion: validation, rate limiting, and PII stripping
The endpoint that receives beacons is the most security- and cost-sensitive component in the system, because it is public, unauthenticated, and writes directly toward storage. Three jobs happen here before any event is allowed to persist. First, validation: reject payloads that are not well-formed JSON, that carry metric values outside sane bounds (a negative LCP or a 600-second INP is a bug or an attack, not data), or that name metrics you do not collect. Second, rate limiting: a public POST endpoint will attract abuse, so cap per-IP request volume and per-payload size before parsing. Third, PII stripping: the edge is the only correct place to drop or truncate anything identifying — full IPs, query strings, referrers with tokens — so that personal data never reaches durable storage at all.
Running this at the edge (a Cloudflare Worker, Lambda@Edge, or an Nginx/Envoy tier in front of the writer) keeps latency low and absorbs bursts close to the user. The unified handler below is the canonical shape: it validates, clamps, strips, samples, and only then forwards a clean, flat row to the columnar writer. This single function is the boundary between “what the browser claimed” and “what the warehouse trusts.”
const ALLOWED = new Set(['LCP', 'INP', 'CLS', 'FCP', 'TTFB']);
// Upper sanity bounds per metric: milliseconds for timings, unitless for CLS.
const BOUNDS = { LCP: 60000, INP: 60000, CLS: 10, FCP: 60000, TTFB: 60000 };
const MAX_BODY = 16 * 1024; // 16 KB is generous for a vitals batch

// Best-effort limiter: this map lives in one worker instance's memory, so it is
// not shared across instances and is never pruned here.
const hits = new Map(); // ip -> { count, windowStart }

function rateLimited(ip, limit = 120, windowMs = 60000) {
  const now = Date.now();
  const e = hits.get(ip);
  if (!e || now - e.windowStart > windowMs) {
    hits.set(ip, { count: 1, windowStart: now });
    return false;
  }
  e.count += 1;
  return e.count > limit;
}

function sampled(id, rate) {
  // Deterministic FNV-1a-style hash: the same id is always kept or always dropped.
  let h = 2166136261;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return (h >>> 0) % 100 < rate;
}

export async function ingest(request, write) {
  if (request.method !== 'POST') return new Response('method', { status: 405 });
  const ip = request.headers.get('cf-connecting-ip') ?? '0.0.0.0';
  if (rateLimited(ip)) return new Response('slow down', { status: 429 });

  // Reject on the declared length before reading the body, then re-check the
  // actual size, since the header can be absent or wrong.
  if (Number(request.headers.get('content-length') ?? 0) > MAX_BODY) {
    return new Response('too large', { status: 413 });
  }
  const raw = await request.text();
  if (raw.length > MAX_BODY) return new Response('too large', { status: 413 });

  let body;
  try { body = JSON.parse(raw); } catch { return new Response('bad json', { status: 400 }); }
  // JSON.parse accepts null and primitives, so guard before touching properties.
  if (!body || !Array.isArray(body.metrics)) return new Response('schema', { status: 422 });

  const rows = [];
  for (const m of body.metrics) {
    if (!m || !ALLOWED.has(m.name)) continue;
    const value = Number(m.value);
    if (!Number.isFinite(value) || value < 0 || value > BOUNDS[m.name]) continue;
    // Keeps ~25% of metric ids. A web-vitals id is unique per metric instance, so
    // hash a session id here instead if whole sessions must be kept together.
    if (typeof m.id === 'string' && !sampled(m.id, 25)) continue;
    rows.push({
      // PII stripping: no IP, no query string, route only.
      route: String(body.href ?? '/').split('?')[0].slice(0, 256),
      metric: m.name,
      value,
      // Client-supplied strings are coerced and length-capped before storage.
      rating: String(m.rating ?? 'unknown').slice(0, 32),
      conn: String(body.conn ?? 'unknown').slice(0, 16),
      ts: Number.isFinite(body.ts) ? body.ts : Date.now(),
    });
  }
  if (rows.length) await write(rows);
  return new Response(null, { status: 204 });
}
The implementations behind each concern — endpoint code on a worker, schema design, and payload reduction — live under self-hosted beacon collection; treat the handler above as the contract every backend must satisfy.
RUM Data Sampling Strategies
Raw RUM is a firehose: one busy property emits millions of events an hour, and storing every one means a warehouse bill that scales linearly with traffic while adding almost nothing to the precision of a p75. Sampling is how you decouple cost from traffic without distorting the metric. The non-negotiable rule is that sampling must be deterministic on a stable identifier: hash a session or page-load identifier and keep events where the hash falls under the rate, using the same hash as the sampled() helper above. Note that a web-vitals metric id is unique to one metric instance on one page load, so hashing it alone, as the handler does, gives a stable per-metric decision; to keep every event of a kept session, the beacon has to carry a session identifier and the edge has to hash that instead. Random per-event sampling shreds session continuity and makes funnel analysis impossible; deterministic session sampling keeps every event of a kept session, so journeys stay intact.
The choice of where and how to sample is the substance of RUM data sampling strategies. Head-based sampling decides at capture time and is cheap but blind to which sessions turned out interesting. Tail-based sampling buffers and decides after the fact — keeping every session that breached a threshold while down-sampling the healthy majority — which preserves the rare slow experiences that actually move the p75. A common production blend keeps 100% of error and threshold-breaching sessions and a deterministic 10–25% of the rest, then records the rate per row so aggregation can re-weight. Whatever rate you choose, store it: a p75 computed over a known sample is honest; a p75 over an unknown one is a guess.
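The blend described above can be sketched in two small functions: one that decides whether to keep a finished session and records the rate it was kept at, and one that uses the stored rate to re-weight a percentile. This is an illustration under stated assumptions, not a reference implementation: `keepSession`, `weightedQuantile`, the `breached` flag, and the row shape `{ value, rate }` are all hypothetical, and the hash is the same FNV-1a-style function the edge handler uses.

```javascript
// 32-bit FNV-1a-style hash over a string's char codes.
function fnv1a(str) {
  let h = 2166136261;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return h >>> 0;
}

// Tail-style decision: always keep sessions that breached a threshold or errored,
// and keep a deterministic share of the healthy rest.
function keepSession(sessionId, breached, baseRatePct = 25) {
  if (breached) return { keep: true, rate: 1 };
  return { keep: fnv1a(sessionId) % 100 < baseRatePct, rate: baseRatePct / 100 };
}

// Weighted quantile: each stored row stands for 1 / rate original rows.
function weightedQuantile(rows, q) {
  const sorted = [...rows].sort((a, b) => a.value - b.value);
  const total = sorted.reduce((sum, r) => sum + 1 / r.rate, 0);
  let acc = 0;
  for (const r of sorted) {
    acc += 1 / r.rate;
    if (acc >= q * total) return r.value;
  }
  return NaN; // no rows
}

// Two healthy rows kept at 25% (weight 4 each) and one breaching row kept at 100%.
const p75 = weightedQuantile(
  [{ value: 1000, rate: 0.25 }, { value: 2000, rate: 0.25 }, { value: 5000, rate: 1 }],
  0.75,
); // 2000
```

Without the weights, the single slow row would count as one third of the sample and pull the p75 up to 5000, which is why the rate has to travel with every row.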
p75 aggregation and columnar storage
Percentiles are not averageable. You cannot store an hourly p75 per shard and average the shards to get a daily p75, because the result depends on how values are distributed within each shard, which the stored p75 has already discarded. That is why the architecture keeps raw values and computes the quantile at query time over the slice you actually want. This is also why the storage layer is columnar. A columnar engine reads only the columns a query touches and can compute quantiles over hundreds of millions of rows at interactive speed, typically far faster than a row-store that must read every full row.
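A small numeric example makes the first point concrete. The sketch below uses a nearest-rank percentile and two made-up shards; the values are illustrative only.

```javascript
// Nearest-rank percentile over raw values.
function percentile(values, q) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil(q * sorted.length));
  return sorted[rank - 1];
}

const shardA = [1000, 1100, 1200, 1300]; // a fast shard (LCP in ms)
const shardB = [1500, 4000, 6000, 9000]; // a slow shard

// Averaging per-shard p75s: (1200 + 6000) / 2
const averagedP75 = (percentile(shardA, 0.75) + percentile(shardB, 0.75)) / 2; // 3600

// The p75 of the combined raw values
const trueP75 = percentile([...shardA, ...shardB], 0.75); // 4000
```

The two answers differ by 400 ms on eight samples, and the gap grows when shards differ in size, since a plain average also ignores how many sessions each shard represents.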
ClickHouse and BigQuery are the two dominant choices, and they represent the same build-vs-buy tension as the whole pillar. ClickHouse is self-hosted, gives near-zero marginal cost per query, and exposes exact and approximate quantile functions; BigQuery is serverless, removes the operations burden, and prices per byte scanned, which rewards aggressive partitioning. Both want a wide, denormalized table partitioned by date. Repeated string dimensions with a modest number of distinct values (route, geo, device) should be dictionary-encoded with LowCardinality in ClickHouse and used as clustering columns in BigQuery, so that scans stay cheap. Schema design for each engine is covered under the beacon collection material (ClickHouse pipeline and BigQuery schema).
The canonical p75 query is the one every dashboard panel runs:
SELECT
    toStartOfHour(ts) AS bucket,
    route,
    quantile(0.75)(value) AS p75_lcp,
    count() AS samples
FROM rum_events
WHERE metric = 'LCP'
  AND ts >= now() - INTERVAL 24 HOUR
GROUP BY bucket, route
HAVING samples >= 100 -- suppress noisy low-traffic routes
ORDER BY bucket, route;
The HAVING samples >= 100 guard matters as much as the quantile: a p75 over a handful of sessions is statistical noise, and surfacing it as a red panel trains engineers to ignore the dashboard. Enforce a minimum sample floor everywhere a percentile is shown.
OpenTelemetry for Web RUM
Hand-rolling beacon capture is correct for Core Web Vitals, but the moment you want to correlate a slow frontend interaction with the backend span that served it, you need a propagation standard rather than a bespoke payload. OpenTelemetry for web RUM provides exactly that: a vendor-neutral SDK that captures spans, and optionally metrics and logs, in the browser and exports them over OTLP to any compatible backend, with W3C Trace Context (traceparent, tracestate) carrying the trace ID across the network boundary so a frontend span and the backend span that served it share one trace.
The web SDK’s auto-instrumentation packages capture document load, user interaction, and fetch/XHR timing without manual wiring, and you can emit your web-vitals values as spans alongside them so a single backend holds both. The trade-off is weight and complexity: the OTel browser bundle is heavier than a bare sendBeacon collector, so many teams run both, with a lean vitals beacon for the p75 SLO numbers and OTel spans for the trace-level debugging of specific slow sessions. Mapping browser entries to OTel semantic-convention attributes (for example browser.brands, url.full, and network.connection.type; older convention releases used http.url) is what lets downstream tools parse the data without custom transforms. Attribute names have changed between convention versions, so pin to the version your SDK ships; the configuration details are in the OpenTelemetry material.
import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { DocumentLoadInstrumentation } from '@opentelemetry/instrumentation-document-load';
import { registerInstrumentations } from '@opentelemetry/instrumentation';

// Batch spans in memory and export them over OTLP/HTTP to a same-origin path.
const provider = new WebTracerProvider({
  spanProcessors: [
    new BatchSpanProcessor(new OTLPTraceExporter({ url: '/v1/traces' })),
  ],
});
provider.register();

// Only document-load spans are enabled here; add further instrumentations as needed.
registerInstrumentations({
  instrumentations: [new DocumentLoadInstrumentation()],
});
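To make the propagation mechanism concrete, here is a minimal sketch of the `traceparent` header that W3C Trace Context defines: four lowercase-hex fields joined by hyphens (version, 16-byte trace ID, 8-byte parent span ID, flags). The function names are illustrative; in practice the OTel SDK generates and parses this header for you, and the sketch only handles the version `00` layout.

```javascript
// traceparent, version 00: version-traceid-parentid-flags, all lowercase hex.
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const m = TRACEPARENT.exec(header);
  if (!m) return null;
  const [, version, traceId, spanId, flags] = m;
  // Version ff is reserved as invalid, and all-zero ids are invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  // The lowest flag bit marks the trace as sampled.
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

function formatTraceparent(traceId, spanId, sampled) {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

const parsed = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
```

A backend that receives this header continues the trace under the same `traceId`, which is what lets a slow browser span be joined to the server span that handled it.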
Grafana Dashboards for Web Performance
Storage and aggregation are worthless until someone can see a regression the hour it ships. Grafana dashboards for web performance are where the p75 query becomes an operational instrument: a time-series panel per metric, thresholded at the Good/NI/Poor bands from the reference table, broken out by route and device tier, with deployment markers overlaid so a step-change in p75 LCP lines up with the release that caused it.
The dashboards that earn their keep do three things beyond plotting a line. They segment — a single global p75 hides that your mobile p75 INP is failing while desktop is fine, so every panel carries a device/connection breakdown. They annotate — release and config-change markers turn “INP got worse” into “INP got worse at 14:02, the moment release 4.7 deployed.” And they alert on the percentile that matters, firing when p75 crosses into the NI band for a sustained window rather than on a single noisy spike. The step-by-step panel build, alert wiring, and threshold colouring are in the Grafana material; the data contract is simply the p75 query above, parameterized by the dashboard’s route and time-range variables.
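The "sustained window" rule is simple enough to state in code. The sketch below shows only the logic, assuming a series of evenly spaced p75 points; `sustainedBreach` is a hypothetical name, and in a real deployment the alerting tool's own pending-period setting would normally do this job.

```javascript
// True when the most recent `windowSize` points are all above `threshold`.
function sustainedBreach(series, threshold, windowSize) {
  if (windowSize < 1 || series.length < windowSize) return false;
  return series.slice(-windowSize).every((v) => v > threshold);
}

// Hourly p75 LCP in ms against the 2500 ms "Good" limit, three-point window.
const spike = sustainedBreach([2100, 2700, 2200, 2600], 2500, 3);     // false
const sustained = sustainedBreach([2400, 2600, 2700, 2650], 2500, 3); // true
```

A single spike does not fire, while three consecutive points over the limit do, which keeps the alert tied to a real shift in the percentile rather than to noise.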
Privacy-Compliant Tracking
RUM captures navigation paths, interaction timing, and — unless you are deliberate — identifiers, which puts it squarely under GDPR, CCPA/CPRA, and ePrivacy. The architecture has to satisfy the law by construction, not by policy, and the edge endpoint is where that happens. Privacy-compliant tracking sets out the controls: anonymize IPs at ingress before anything is stored (truncate IPv4 to /24, IPv6 to /48), hash any session identifier with a rotating HMAC salt, minimize the payload to performance metrics only — never form values, never full query strings — and enforce consent-aware routing so a beacon only carries personal context when the user has opted in.
The most defensible RUM is cookieless and consent-respecting by default. Performance timings with no identifier attached are generally not personal data on their own, so a pipeline that strips identifiers at the edge can often keep measuring before consent resolves and enrich only consented sessions; whether that holds under ePrivacy-style device-access rules depends on jurisdiction and should be confirmed rather than assumed. Pair that with TTL-based partition drops in the warehouse so raw events expire automatically, and you have a system that is compliant because it cannot retain what it should not. The cookieless implementation and consent-mode integration are detailed in the privacy-compliant tracking material.
| Control | Where it runs | What it prevents |
|---|---|---|
| IP truncation (/24, /48) | Edge ingress | Precise geolocation of a user |
| HMAC-hashed session id, rotating salt | Edge | Re-identification across sessions |
| Payload minimization (metrics only) | Browser + edge validation | Accidental capture of form/query data |
| Consent-aware enrichment | Browser | Personal context without opt-in |
| TTL partition drops | Storage | Indefinite retention of raw events |
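The first control in the table, IP truncation, can be sketched as two pure functions. This is an illustrative sketch, not a hardened parser: the function names are hypothetical, and IPv6 addresses with an embedded IPv4 suffix (such as `::ffff:1.2.3.4`) are rejected rather than handled.

```javascript
// Keep the /24 network of an IPv4 address by zeroing the last octet.
function truncateIPv4(ip) {
  const parts = ip.split('.');
  const valid = parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  return valid ? `${parts[0]}.${parts[1]}.${parts[2]}.0` : null;
}

// Keep the first 48 bits (three 16-bit groups) of an IPv6 address.
function truncateIPv6(ip) {
  const halves = ip.split('::');
  if (halves.length > 2) return null; // at most one "::" is allowed
  const head = halves[0] ? halves[0].split(':') : [];
  const tail = halves[1] ? halves[1].split(':') : [];
  let groups = head;
  if (halves.length === 2) {
    // "::" stands for one or more all-zero groups; expand it to reach 8 groups.
    const missing = 8 - head.length - tail.length;
    if (missing < 1) return null;
    groups = [...head, ...Array(missing).fill('0'), ...tail];
  }
  if (groups.length !== 8 || groups.some((g) => !/^[0-9a-fA-F]{1,4}$/.test(g))) {
    return null;
  }
  const prefix = groups.slice(0, 3).map((g) => parseInt(g, 16).toString(16));
  return `${prefix.join(':')}::`;
}
```

Returning `null` for anything unparseable lets the edge drop the field entirely, which is safer than storing an address the truncation did not understand.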
SpeedCurve vs Custom RUM and vendor comparison
Build-vs-buy is the decision the whole pillar leads to, and it is genuinely a trade-off, not a foregone conclusion. A managed product gives you instrumentation, storage, dashboards, and alerting on day one with no operations burden; a self-hosted stack gives you data sovereignty, unrestricted query flexibility, and near-zero marginal cost, at the price of the engineering you have just read about. The crossover is volume-driven: below a few million events a day, a vendor is almost always cheaper than the engineer-months a custom stack costs to build and run; above roughly 10–15 million daily events, self-hosting typically reaches cost parity within 12–18 months and saves money after.
The SpeedCurve vs custom RUM decision matrix frames this for the synthetic-plus-RUM case specifically, where SpeedCurve’s lab integration is hard to reproduce cheaply. For the broader landscape — Datadog RUM, New Relic Browser, and the operational reality of running your own — the RUM vendor comparison lays out ingestion throughput, query latency, retention, and compliance side by side.
| Dimension | Managed vendor | Self-hosted custom |
|---|---|---|
| Time to first dashboard | Hours | Engineer-weeks |
| Marginal cost per event | Per-event pricing | Near zero |
| Query flexibility | Proprietary UI | Full SQL over raw events |
| Data sovereignty | Vendor’s cloud | Yours |
| Operations burden | None | On-call for the pipeline |
| Crossover point | Usually cheaper below a few million events/day | Usually cheaper above ~10–15M events/day |
Business impact: turning p75 into money
The pipeline justifies itself only if its numbers map to outcomes the business already tracks. The mechanism is correlation: join the p75 (or per-session rating) of each metric against conversion, bounce, or revenue from the same sessions, segment by metric band, and read off the gap. The pattern, and the SQL to overlay vitals on a funnel, is the substance of mapping Core Web Vitals to conversion rates.
A worked example makes the case concrete. Suppose a checkout route gets 2,000,000 sessions a month at a conversion rate of roughly 3%, with an average order value of $80. Your RUM shows that sessions in the “Good” LCP band convert at 3.3% while “Needs improvement” sessions convert at 2.6%, and that 30% of traffic currently sits in the NI band. Moving that NI cohort into the Good band would lift those sessions from 2.6% to 3.3%:
NI sessions/month = 2,000,000 × 0.30 = 600,000
Conversion uplift = 3.3% − 2.6% = 0.7 pp
Extra conversions/month = 600,000 × 0.007 = 4,200
Extra revenue/month = 4,200 × $80 = $336,000
Annualized = $336,000 × 12 ≈ $4.03M
The number is only as honest as the correlation behind it, so always control for confounders — fast sessions skew toward fast networks and engaged users — by comparing within device and network segments rather than globally. But even a discounted version of this calculation is what converts a p75 panel from an engineering vanity metric into a funded roadmap line, and it is the reason the entire self-hosted pipeline is worth the operations cost.
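The worked example can be turned into a small function so the same arithmetic is re-run per segment. This is a sketch under the example's assumptions; `revenueUplift` and its `discount` parameter (a hypothetical factor between 0 and 1 for how much of the correlation you are willing to treat as causal) are illustrative names.

```javascript
// Estimate the revenue effect of moving the NI cohort into the Good band.
// Rates are fractions (0.033 means 3.3%); results are rounded to whole units.
function revenueUplift({ sessions, niShare, goodRate, niRate, orderValue, discount = 1 }) {
  const niSessions = sessions * niShare;
  const extraConversions = niSessions * (goodRate - niRate) * discount;
  const monthly = extraConversions * orderValue;
  return {
    niSessions: Math.round(niSessions),
    extraConversions: Math.round(extraConversions),
    monthly: Math.round(monthly),
    annual: Math.round(monthly * 12),
  };
}

const estimate = revenueUplift({
  sessions: 2_000_000,
  niShare: 0.30,
  goodRate: 0.033,
  niRate: 0.026,
  orderValue: 80,
});
// estimate.monthly is 336000 and estimate.annual is 4032000, matching the figures above
```

Passing `discount: 0.5` halves the estimate, which is one simple way to present a conservative case alongside the headline number.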
FAQ
Why aggregate at p75 instead of the average or median?
Because Google assesses Core Web Vitals on the 75th percentile of real users, and because averages are wrecked by the slow tail — a handful of catastrophic sessions drag the mean far from typical experience. The median (p50) hides the slow quarter of traffic that p75 is specifically designed to surface. Storing raw values and computing the quantile at query time keeps every slice’s p75 honest; pre-bucketed or averaged percentiles are mathematically invalid because percentiles are not averageable across shards.
Should I use sendBeacon or fetch with keepalive?
Prefer sendBeacon as the primary path: it queues the request in the browser’s background and is the most reliable way to deliver telemetry during visibilitychange/pagehide without blocking unload. Fall back to fetch with keepalive: true when sendBeacon is unavailable or returns false. Both paths share a body budget of roughly 64 KB, so an oversized batch must be trimmed or split rather than rerouted. Flush at page hide, since metrics like INP and CLS are only final then.
When does self-hosting actually beat a vendor on cost?
Below a few million events a day, a managed vendor is almost always cheaper once you price in the engineer-months a custom stack costs to build and operate. Above roughly 10–15 million daily events, a self-hosted columnar pipeline typically reaches cost parity within 12–18 months and saves money thereafter, while also giving you full SQL access and data sovereignty. The decision is volume- and capability-driven, not ideological.
Where in the pipeline should PII be removed?
At the edge ingestion endpoint, before anything is written to durable storage. Truncate IPs, drop query strings and token-bearing referrers, and hash session identifiers there, so personal data never reaches the warehouse at all. Removing it later — after it has been stored — is both harder to prove and a compliance liability; the only defensible place is the public boundary the beacon hits first.
Do I need OpenTelemetry if I already have a vitals beacon?
Not for the p75 SLO numbers — a lean sendBeacon collector handles those with far less weight. Add OpenTelemetry when you need to correlate a specific slow frontend interaction with the backend span that served it, using W3C Trace Context to stitch one trace across the network boundary. Many teams run both: the beacon for aggregate field metrics and OTel spans for trace-level debugging of individual sessions.
Related
- Self-Hosted Beacon Collection — sendBeacon capture, the ingestion endpoint, payload shape, and warehouse schema.
- OpenTelemetry for Web RUM — vendor-neutral spans and traces with W3C Trace Context propagation.
- RUM Data Sampling Strategies — deterministic, head- vs tail-based sampling that bounds cost without distorting p75.
- Grafana Dashboards for Web Performance — thresholded, segmented p75 panels with deployment annotations and alerts.
- Privacy-Compliant Tracking — cookieless, consent-aware capture with edge PII stripping and TTL retention.
- RUM Vendor Comparison — Datadog, New Relic, and self-hosted side by side on cost, query, and compliance.