Self-Hosted RUM Pipeline with ClickHouse
You have field metrics arriving from navigator.sendBeacon() and you need somewhere to put them that answers “what is our p75 LCP for mobile users in Germany over the last 7 days?” in milliseconds, not seconds — without paying per-event vendor pricing. This page walks the exact, end-to-end build of that pipeline on ClickHouse: a tiny ingestion endpoint, a MergeTree table sorted for time-series reads, async inserts to tame part fragmentation, and a materialized view that pre-aggregates p75 rollups so dashboards never scan raw rows. It is the storage-and-query layer that sits behind self-hosted beacon collection; if you have not yet shipped a collector, read that first and return here once beacons are landing.
The hard part is not the SQL — it is choosing an ORDER BY that matches your dashboard predicates, controlling write amplification so background merges keep up, and computing percentiles cheaply enough to serve interactive dashboards. We cover all three with real, runnable code.
MergeTree, and a materialized view continuously computes per-day p75 rollups. See Grafana dashboards for web performance for the read side.Prerequisites
Before you run any SQL below, confirm the following are in place:
- A ClickHouse server (24.x or newer) reachable over HTTP on port 8123, with a database to own these tables. A single node is fine for under ~50k beacons/second; shard only when one node’s merges fall behind.
- A working collector emitting one JSON object per page view, captured with the web-vitals library and PerformanceObserver and flushed on
visibilitychange. The fields this schema expects are listed in the table below. - A decision on sampling. If you are keeping 100% of traffic you can skip it, but for high-volume sites apply deterministic session sampling that preserves the p75 aggregation at the collector so the percentile math stays unbiased.
- Node 18+ (for the
fetch-based inserter) or a Cloudflare Workers runtime if your endpoint lives at the edge.
The beacon contract this pipeline assumes:
| Field | Type | Source | Notes |
|---|---|---|---|
timestamp |
DateTime64(3) |
server clock | set on the endpoint, never trust the client clock |
project_id |
LowCardinality(String) |
collector config | a handful of distinct values |
session_id |
String |
first-party token | used for sampling and stitching, not stored long-term |
page_path |
LowCardinality(String) |
location.pathname |
path only — strip query strings to bound cardinality |
device_tier |
LowCardinality(String) |
client hints | e.g. low / mid / high |
country |
LowCardinality(String) |
edge geo header | derived server-side, never from the payload |
lcp_ms |
UInt32 |
web-vitals | 0 when not captured |
inp_ms |
UInt32 |
web-vitals | 0 when not captured |
cls |
Float32 |
web-vitals | unitless |
How to build the pipeline
1. Create the raw events table
Start with the MergeTree table that receives every beacon. The ORDER BY is the single most important decision: ClickHouse stores rows physically in this order and uses it as a sparse primary index, so it must lead with the columns your dashboards filter on and end with timestamp for time-range scans.
CREATE TABLE rum_events
(
timestamp DateTime64(3),
project_id LowCardinality(String),
session_id String,
page_path LowCardinality(String),
device_tier LowCardinality(String),
country LowCardinality(String),
lcp_ms UInt32 CODEC(T64, ZSTD(1)),
inp_ms UInt32 CODEC(T64, ZSTD(1)),
cls Float32 CODEC(ZSTD(1))
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (project_id, device_tier, country, timestamp)
TTL toDateTime(timestamp) + INTERVAL 90 DAY DELETE
SETTINGS index_granularity = 8192;
Why each choice matters:
ORDER BY (project_id, device_tier, country, timestamp)— dashboards almost always filter by project, often by device class and geography, then scan a time window. Leading with the low-cardinality dimensions lets the sparse index skip whole granules before it ever touches a time range. Puttingtimestamplast keeps each(project, tier, country)slice contiguous and time-ordered.PARTITION BY toYYYYMM(timestamp)— monthly partitions keep the partition count low (avoid daily partitions, which produce too many parts) while still letting TTL and old-data drops operate at partition granularity.LowCardinality(String)— for categorical dimensions this dictionary-encodes values, cutting disk footprint 60–80% and making dimensional filtering far faster than plainString.CODEC(T64, ZSTD(1))—T64transposes integer bits so similar magnitudes compress together beforeZSTD; ideal for millisecond timings clustered in a narrow range.TTL ... DELETE— raw rows self-expire after 90 days. The rollup view (step 4) holds the long-term history, so raw retention only needs to cover ad-hoc drill-downs.
2. Write the ingestion endpoint
The endpoint’s job is small and strict: parse one JSON beacon, stamp the server timestamp and geo, reject anything malformed, and hand the row to ClickHouse. Critically, it inserts with async inserts enabled so ClickHouse — not your endpoint — does the batching. This is what lets you accept one HTTP request per beacon without melting the merge scheduler.
// ingest.js — minimal Node 18+ RUM ingestion endpoint
import { createServer } from "node:http";
const CH_URL = process.env.CLICKHOUSE_URL ?? "http://localhost:8123";
const CH_AUTH = "Basic " + Buffer.from(
`${process.env.CH_USER ?? "default"}:${process.env.CH_PASSWORD ?? ""}`
).toString("base64");
// async_insert lets ClickHouse buffer many single-row INSERTs into one part.
const INSERT_SQL =
"INSERT INTO rum_events FORMAT JSONEachRow " +
"SETTINGS async_insert=1, wait_for_async_insert=0";
function clamp(n, max) {
const v = Number(n);
return Number.isFinite(v) && v >= 0 && v <= max ? Math.round(v) : 0;
}
createServer((req, res) => {
if (req.method !== "POST" || !req.url.startsWith("/rum/collect")) {
res.writeHead(404).end();
return;
}
let body = "";
let size = 0;
req.on("data", (chunk) => {
size += chunk.length;
if (size > 64 * 1024) req.destroy(); // bound payload to 64KB
body += chunk;
});
req.on("end", async () => {
let b;
try { b = JSON.parse(body); } catch { res.writeHead(400).end(); return; }
const row = {
timestamp: new Date().toISOString().replace("T", " ").replace("Z", ""),
project_id: String(b.project_id ?? "unknown").slice(0, 64),
session_id: String(b.session_id ?? "").slice(0, 64),
page_path: String(b.page_path ?? "/").split("?")[0].slice(0, 256),
device_tier: String(b.device_tier ?? "unknown").slice(0, 16),
country: req.headers["cf-ipcountry"] ?? "ZZ", // geo from edge header
lcp_ms: clamp(b.lcp_ms, 120000),
inp_ms: clamp(b.inp_ms, 60000),
cls: Math.max(0, Math.min(Number(b.cls) || 0, 10)),
};
const r = await fetch(`${CH_URL}/?query=${encodeURIComponent(INSERT_SQL)}`, {
method: "POST",
headers: { Authorization: CH_AUTH, "Content-Type": "application/json" },
body: JSON.stringify(row) + "\n",
});
res.writeHead(r.ok ? 204 : 502).end();
});
}).listen(8787, () => console.log("RUM ingest on :8787"));
Why this shape:
async_insert=1, wait_for_async_insert=0— every beacon arrives as its own HTTP request, but ClickHouse coalesces them into one part flushed on a size/time threshold. Without this, one-row-per-INSERTtraffic creates thousands of tiny parts per minute and background merges never catch up (the “too many parts” failure mode).- Server-stamped
timestampandcountry— the client clock is untrustworthy and the payload must never carry geo; both are set at the trusted edge. - 64KB cap and field clamping — bound the payload to the
sendBeaconqueue limit and coerce every numeric field into a sane range so a malformed beacon can never poison an aggregate.
Deploying at the edge instead? The same logic ports directly to a Worker — see building a RUM ingestion endpoint on Cloudflare Workers for the runtime-specific batching with ctx.waitUntil.
3. Tune async-insert behaviour
Async inserts are governed by two thresholds; the first to trip flushes the buffer. Set them as profile settings so every inserting connection inherits them.
-- Flush a buffered batch at 1 MB of data OR after 1 second, whichever first.
ALTER USER default SETTINGS
async_insert_max_data_size = 1048576,
async_insert_busy_timeout_ms = 1000;
Why: a 1 second / 1 MB window keeps parts large enough that merges stay ahead, while bounding visibility lag to about a second — fast enough for an operational RUM dashboard. Raise the timeout if your traffic is bursty and you want fewer, larger parts; lower it if you need near-real-time rows and can tolerate more parts.
4. Add the p75 rollup materialized view
Scanning raw rows for every dashboard refresh wastes CPU. A materialized view computes per-day, per-dimension percentile states as rows insert, so dashboards read a table that is orders of magnitude smaller. Percentile state is stored with quantileTDigestState, which is mergeable across granules and partitions — exactly what an aggregating rollup needs.
CREATE TABLE rum_rollup_daily
(
day Date,
project_id LowCardinality(String),
device_tier LowCardinality(String),
country LowCardinality(String),
sessions AggregateFunction(count),
lcp_p75 AggregateFunction(quantileTDigest(0.75), UInt32),
inp_p75 AggregateFunction(quantileTDigest(0.75), UInt32),
cls_p75 AggregateFunction(quantileTDigest(0.75), Float32)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (project_id, device_tier, country, day);
CREATE MATERIALIZED VIEW rum_rollup_daily_mv TO rum_rollup_daily AS
SELECT
toDate(timestamp) AS day,
project_id,
device_tier,
country,
countState() AS sessions,
quantileTDigestState(0.75)(lcp_ms) AS lcp_p75,
quantileTDigestState(0.75)(inp_ms) AS inp_p75,
quantileTDigestState(0.75)(cls) AS cls_p75
FROM rum_events
WHERE lcp_ms > 0
GROUP BY day, project_id, device_tier, country;
Why this works:
quantileTDigestStatestores a compact t-digest sketch, not the raw values.AggregatingMergeTreemerges these sketches as parts compact, so the rollup stays a few thousand rows per day regardless of beacon volume. UsequantileTDigest(t-digest) overquantileExacthere because it is mergeable and memory-bounded; reservequantileExactfor one-off audits where you need the true percentile.WHERE lcp_ms > 0drops rows where LCP was never captured (bfcache restores, instant aborts) so they do not deflate the percentile.- The view only transforms rows inserted after it is created. Backfill history once with
INSERT INTO rum_rollup_daily SELECT ... FROM rum_eventsusing the same aggregation.
5. Query the rollup from a dashboard
Dashboards read the rollup and call the matching Merge-combinator function to finalize the sketch into a number. This scans the small aggregate table, not raw events.
SELECT
day,
countMerge(sessions) AS sessions,
round(quantileTDigestMerge(0.75)(lcp_p75)) AS p75_lcp_ms,
round(quantileTDigestMerge(0.75)(inp_p75)) AS p75_inp_ms,
round(quantileTDigestMerge(0.75)(cls_p75), 3) AS p75_cls
FROM rum_rollup_daily
WHERE project_id = 'prod-web'
AND device_tier = 'mid'
AND day >= today() - 28
GROUP BY day
ORDER BY day DESC;
This is the query you point a Grafana panel at. Because (project_id, device_tier, country, day) leads the rollup’s ORDER BY, the dimension filters hit the primary index directly and the scan touches only the relevant granules.
Verifying it works
Confirm the pipeline end-to-end, in order:
- Beacon lands. Fire a synthetic beacon and expect HTTP 204:
curl -s -o /dev/null -w "%{http_code}\n" -X POST \ -H "Content-Type: application/json" \ -d '{"project_id":"prod-web","session_id":"t1","page_path":"/home","device_tier":"mid","lcp_ms":1900,"inp_ms":120,"cls":0.04}' \ http://localhost:8787/rum/collect - Row is queryable after the async-insert window (≤ ~1 s):
SELECT count(), max(timestamp) FROM rum_events WHERE project_id = 'prod-web'; - Rollup populated. The materialized view should have a matching daily row:
SELECT day, countMerge(sessions) FROM rum_rollup_daily WHERE project_id = 'prod-web' GROUP BY day; - Merges are keeping up. Active part count per partition should stay low (tens, not hundreds):
SELECT partition, count() AS parts FROM system.parts WHERE table = 'rum_events' AND active GROUP BY partition ORDER BY parts DESC; - The index is used. Confirm dashboard queries prune granules instead of full-scanning:
TheEXPLAIN indexes = 1 SELECT quantileTDigestMerge(0.75)(lcp_p75) FROM rum_rollup_daily WHERE project_id = 'prod-web' AND day >= today() - 7;ReadFromMergeTreenode should showGranules: <selected>/<total>with selected far below total.
In a live dashboard, a healthy pipeline shows a p75 LCP line that tracks the 2.5 s Good boundary, p75 INP under 200 ms, and p75 CLS under 0.1 — the current Google field-data thresholds.
Edge cases & gotchas
- “Too many parts” under async inserts. If you still see this, your async-insert window is too small or
wait_for_async_insert=0is being overridden per query. Raiseasync_insert_busy_timeout_ms, and never wrap async inserts in a per-row loop that also forces a sync flush. - Zero-value metrics skew percentiles. Beacons from bfcache restores or aborted loads report
0for uncaptured metrics. The rollup’sWHERE lcp_ms > 0filter handles LCP, but apply the same guard per metric if you split them into separate views — otherwise zeros drag p75 below reality. LowCardinalityon truly high-cardinality columns backfires.page_pathisLowCardinalityonly because query strings are stripped. If you store full URLs with parameters the dictionary explodes and both inserts and reads slow down. Always normalize the path server-side.- Materialized views are insert-time only. Creating the view does not transform existing rows, and altering the view’s
SELECTdoes not re-process history — you must backfill manually. Test view logic on a sampled subset before pointing it at production inserts. - TTL deletes raw rows the rollup still implies. Raw
rum_eventsexpires at 90 days butrum_rollup_dailyhas no TTL by design. If you add one, make it strictly longer than any dashboard’s time window, or trend charts will silently truncate. - Clock trust. Never
ORDER BYor partition on a client-supplied timestamp. A single device with a wrong clock can scatter rows across partitions and wreck merge locality; always stamp time at the endpoint.
FAQ
Why a materialized view instead of querying raw events directly?
Interactive dashboards re-run their query on every refresh and time-range change. Scanning raw rows for percentiles each time burns CPU proportional to traffic. The materialized view pre-aggregates t-digest sketches at insert time, so reads scan a small rollup table whose size depends on dimension count, not beacon volume.
Should I use quantileTDigest or quantileExact for p75?
Use quantileTDigest for rollups and dashboards: its sketch is mergeable across parts and memory-bounded, which is what makes the AggregatingMergeTree rollup viable. Reserve quantileExact for occasional audits where you need the precise percentile and can afford to scan and hold all values in memory.
How do async inserts prevent the “too many parts” error?
Each beacon arrives as its own request, but with async_insert=1 ClickHouse buffers rows server-side and flushes them as one part when the size or time threshold trips. That turns thousands of tiny single-row parts into a few large ones, keeping background merges ahead of ingestion.
Related
- Self-Hosted Beacon Collection — the parent collector layer that produces the beacons this pipeline ingests.
- Designing a BigQuery Schema for RUM Events — the same modelling problem solved on a serverless columnar warehouse.
- Building a Core Web Vitals Grafana Dashboard — wiring the p75 rollup query into panels and alerts.