Designing a BigQuery Schema for RUM Events
You are landing millions of Core Web Vitals beacons a day and you want them in BigQuery so analysts can answer “what is our p75 LCP for /checkout on mobile last week?” without standing up servers — but a naive SELECT over an unpartitioned table scans the whole history and bills you for every byte. This page designs a schema that makes those queries cheap: DAY time partitioning so a date filter prunes everything outside the window, clustering on metric_name and route so the dominant filters read only the relevant blocks, typed columns for the hot fields plus one JSON column for variable attribution detail, and APPROX_QUANTILES to compute p75 without sorting the universe. It is the warehouse alternative to a row-store collector and sits directly under self-hosted beacon collection; if you have not yet shipped a collector that emits one row per metric, read that first and return here once beacons are landing.
The expensive mistake in BigQuery is not schema design in the relational sense — it is letting queries scan partitions and columns they do not need. Partitioning controls which days a query reads, clustering controls which blocks within a day it reads, and column typing controls how many bytes per row a projection touches. Get those three right and a dashboard query over a year of data scans megabytes, not terabytes.
metric_name/route predicate prunes clustered blocks, so on-demand billing charges for a tiny slice of the table. See Grafana dashboards for web performance for the read side.Prerequisites
Before you run any DDL below, confirm the following are in place:
- A Google Cloud project with the BigQuery API enabled, the
bqCLI authenticated (gcloud auth login), and a dataset to own these tables. Pick a single region (for exampleUSorEU) and keep the dataset, table, and any load jobs in it — cross-region queries are not free and add latency. - A collector emitting one row per metric per page view, captured with the web-vitals library and PerformanceObserver and flushed on
visibilitychange. Modelling one row per metric (rather than one wide row per page view) keepsmetric_nameas a clean clustering key and lets each metric carry its own attribution payload. - A decision on sampling. BigQuery on-demand billing is per byte scanned, so for high-volume sites apply deterministic session sampling that preserves the p75 aggregation at the collector. Sampling shrinks both storage and every scan, and
APPROX_QUANTILESstays unbiased as long as sampling is per-session, not per-metric. - A loader path: either the streaming Storage Write API for near-real-time rows, or batched newline-delimited JSON / Parquet loads from Cloud Storage for cheaper bulk ingestion. We cover both below.
The beacon contract this schema assumes:
| Field | BigQuery type | Source | Notes |
|---|---|---|---|
event_time |
TIMESTAMP |
server clock | partition column; stamped at the endpoint, never the client clock |
metric_name |
STRING |
web-vitals | one of LCP / INP / CLS / FCP / TTFB; a clustering key |
value |
FLOAT64 |
web-vitals | ms for timing metrics, unitless for CLS |
route |
STRING |
normalized path | template route (/product/:id), not raw URL; a clustering key |
device_type |
STRING |
client hints | mobile / tablet / desktop |
country |
STRING |
edge geo header | derived server-side, never from the payload |
session_id |
STRING |
first-party token | for sampling and dedup, not long-term identity |
attribution |
JSON |
web-vitals attribution | variable, per-metric detail (element selector, LoAF, TTFB phases) |
How to design and build the schema
1. Create the partitioned, clustered table
Create the table with PARTITION BY on a DATE-truncated event_time and CLUSTER BY on the two columns your dashboards filter on most. Typed columns carry the hot fields; one JSON column absorbs the variable attribution detail so you never reshape the schema when web-vitals adds an attribution field.
CREATE TABLE rum.rum_events
(
event_time TIMESTAMP NOT NULL OPTIONS(description="Server-stamped event time"),
metric_name STRING NOT NULL OPTIONS(description="LCP|INP|CLS|FCP|TTFB"),
value FLOAT64 NOT NULL OPTIONS(description="ms for timings, unitless for CLS"),
route STRING OPTIONS(description="Normalized template route"),
device_type STRING,
country STRING,
session_id STRING,
rating STRING OPTIONS(description="good|needs-improvement|poor from web-vitals"),
navigation_type STRING OPTIONS(description="navigate|reload|back-forward|prerender"),
attribution JSON OPTIONS(description="Per-metric attribution detail")
)
PARTITION BY DATE(event_time)
CLUSTER BY metric_name, route
OPTIONS(
partition_expiration_days = 400,
require_partition_filter = TRUE,
description = "Raw RUM beacons, one row per metric per page view"
);
Why each choice matters:
PARTITION BY DATE(event_time)— BigQuery stores each calendar day as a separate physical partition. A query with a date predicate prunes every partition outside the range before it scans a byte, which is the single biggest lever on scan cost.DATE(one partition per day) gives ~400 partitions over a year, comfortably under the 4,000-partition limit and well-suited to RUM’s daily-trend reads.CLUSTER BY metric_name, route— within each daily partition, rows are sorted and co-located by these columns. A filter onmetric_name = 'LCP'(and oftenroute) lets BigQuery skip the blocks that hold other metrics. Order matters: put the most-filtered, lowest-cardinality column first.metric_namehas a handful of values and appears in nearly every query, so it leads;routerefines further.require_partition_filter = TRUE— this guardrail rejects any query that omits a partition filter, so an analyst’s accidental full-tableSELECTfails fast instead of scanning a year and billing for it. It is the cheapest insurance you can buy on a RUM table.partition_expiration_days = 400— partitions self-delete after ~13 months, capping storage and keeping the partition count bounded. Long-term trends live in a rollup table (step 5), so raw retention only needs to cover ad-hoc drill-downs.- Typed columns + one
JSONcolumn — the hot, always-queried fields are typed scalars so projections read minimal bytes and predicates stay sargable. The variable attribution payload (which differs per metric and evolves as the library adds fields) goes inattribution JSON, queried withJSON_VALUEonly when you actually drill in, so it never inflates the bytes scanned by a routine p75 query.
2. Stream rows for near-real-time ingestion
For operational dashboards that need rows within seconds, write through the Storage Write API. The default stream commits rows immediately and is the recommended path over the legacy tabledata.insertAll. Stamp event_time and geo server-side before the row is ever appended.
// stream.js — append RUM rows via the BigQuery Storage Write API (Node 18+)
import { v1 } from "@google-cloud/bigquery-storage";
const { BigQueryWriteClient } = v1;
const writer = new BigQueryWriteClient();
const PARENT =
"projects/my-proj/datasets/rum/tables/rum_events/streams/_default";
// Build one row per metric from a validated beacon.
export async function appendBeacon(beacon, geoCountry) {
const now = Date.now() * 1000; // BigQuery TIMESTAMP is micros since epoch
const rows = beacon.metrics.map((m) => ({
event_time: now,
metric_name: String(m.name).slice(0, 8),
value: Number(m.value) || 0,
route: String(beacon.route ?? "/").slice(0, 256),
device_type: String(beacon.device_type ?? "unknown").slice(0, 16),
country: geoCountry ?? "ZZ", // from the trusted edge header
session_id: String(beacon.session_id ?? "").slice(0, 64),
rating: String(m.rating ?? "").slice(0, 24),
navigation_type: String(m.navigationType ?? "navigate").slice(0, 16),
attribution: JSON.stringify(m.attribution ?? {}),
}));
const [stream] = await writer.appendRows();
stream.write({
writeStream: PARENT,
protoRows: { rows: { serializedRows: rows.map(encodeRow) } },
});
stream.end();
return new Promise((resolve, reject) => {
stream.on("data", resolve);
stream.on("error", reject);
});
}
Why streaming: rows in the _default stream are queryable within seconds, which is what an operational RUM view needs. Streaming costs more per GB ingested than batch loads, so reserve it for the slice of traffic that must be live — and route the rest through batch loads (step 3). Note attribution is serialized to a JSON string here because the row is JSON-typed in the schema; BigQuery parses it on ingest.
3. Batch-load the bulk of traffic cheaply
Loading is free (you pay only for storage of the result), so push the majority of beacons through batched loads from Cloud Storage. Buffer beacons into newline-delimited JSON (or Parquet) files in GCS, then load them with bq load.
# Load a day's buffered beacons from GCS. Loads do not incur query cost.
bq load \
--source_format=NEWLINE_DELIMITED_JSON \
--time_partitioning_field=event_time \
--time_partitioning_type=DAY \
--clustering_fields=metric_name,route \
--ignore_unknown_values \
rum.rum_events \
'gs://my-rum-bucket/beacons/dt=2026-06-18/*.json'
Why batch: load jobs are not billed as scanned bytes, so moving bulk traffic off streaming and onto hourly or daily GCS loads can cut ingestion cost dramatically at high volume. The --time_partitioning_field and --clustering_fields flags must match the table definition so loaded rows land in the right partition and cluster layout. --ignore_unknown_values makes the load tolerant of extra collector fields without a schema migration.
4. Compute p75 with APPROX_QUANTILES and pruned partitions
Percentiles are the headline RUM stat, and computing an exact percentile means sorting every value — expensive at scale. APPROX_QUANTILES uses a bounded-memory sketch and is the right tool for p75 aggregation over field data. Always lead with a partition filter so the scan prunes to the date window, and a metric_name filter so clustering prunes blocks.
-- p75 per route for LCP, last 7 days. Partition + cluster filters keep the scan tiny.
SELECT
route,
COUNT(*) AS samples,
APPROX_QUANTILES(value, 100)[OFFSET(75)] AS p75_lcp_ms
FROM rum.rum_events
WHERE DATE(event_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
AND CURRENT_DATE()
AND metric_name = 'LCP'
AND device_type = 'mobile'
GROUP BY route
HAVING samples >= 100 -- suppress thin routes whose p75 is noise
ORDER BY p75_lcp_ms DESC;
Why this works:
APPROX_QUANTILES(value, 100)[OFFSET(75)]divides the distribution into 100 buckets and reads the boundary at the 75th — that is p75. The approximate algorithm runs in bounded memory and is dramatically cheaper thanPERCENTILE_CONT, which sorts the full input. For Core Web Vitals reporting the approximation error is well inside the noise of field data.WHERE DATE(event_time) BETWEEN ...is what makes the query cheap: it prunes every partition outside the 7-day window before any data is read. Because the table hasrequire_partition_filter = TRUE, omitting this clause is a hard error, not a silent full scan.metric_name = 'LCP'lets clustering skip the blocks holding INP, CLS, FCP, and TTFB within each partition. Adding aroutepredicate narrows it further still.HAVING samples >= 100keeps p75 meaningful — a route with a dozen samples produces a percentile that swings wildly day to day. The current Google field-data thresholds (LCP ≤ 2.5 s Good, INP ≤ 200 ms Good, CLS ≤ 0.1 Good) only mean something over a stable sample.
5. Roll up to a daily table to bound dashboard cost
A dashboard that re-reads raw rows on every refresh scans the same partitions repeatedly. Schedule a daily query that materializes per-day, per-dimension p75 into a small rollup table, and point dashboards at that instead.
-- Scheduled daily: append yesterday's p75 rollup. Reads one partition, writes a few thousand rows.
INSERT INTO rum.rum_daily_p75 (day, metric_name, route, device_type, samples, p75)
SELECT
DATE(event_time) AS day,
metric_name,
route,
device_type,
COUNT(*) AS samples,
APPROX_QUANTILES(value, 100)[OFFSET(75)] AS p75
FROM rum.rum_events
WHERE DATE(event_time) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY day, metric_name, route, device_type;
Why: the rollup query scans exactly one partition per run, and the resulting rum_daily_p75 table is a few thousand rows per day regardless of beacon volume. Dashboards then read that tiny table for trend charts, so a 90-day p75 line scans kilobytes. Partition rum_daily_p75 by day too, so even rollup reads prune. Build the rollup as an idempotent INSERT (delete-then-insert the target day on reruns) so a retried schedule never double-counts.
Verifying it works
Confirm the schema and its cost behaviour, in order:
- A streamed row is queryable within seconds:
SELECT COUNT(*), MAX(event_time) FROM rum.rum_events WHERE DATE(event_time) = CURRENT_DATE(); - Partition pruning is real. Run the p75 query with a dry run and read the bytes it would scan; a 7-day query over a year of data should report a small fraction of the table:
The reportedbq query --dry_run --use_legacy_sql=false \ "SELECT APPROX_QUANTILES(value,100)[OFFSET(75)] FROM rum.rum_events WHERE DATE(event_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND metric_name='LCP'"Estimated bytesshould be roughly 7/365 of a full-table scan, scaled down further by themetric_nameclustering. - The partition guardrail fires. A query without a date filter must error, not run:
SELECT COUNT(*) FROM rum.rum_events WHERE metric_name = 'LCP'; -- Expected: "Cannot query over table ... without a filter over column(s) 'event_time'" - Attribution survives the round trip. Drill into the JSON column for one slow metric:
SELECT JSON_VALUE(attribution, '$.element') AS lcp_element, value FROM rum.rum_events WHERE DATE(event_time) = CURRENT_DATE() AND metric_name = 'LCP' ORDER BY value DESC LIMIT 10; - The rollup matches raw. p75 from
rum_daily_p75for a day should equal the raw p75 for the same slice within approximation tolerance — if it diverges, the rollup’sGROUP BYdoes not match the dashboard’s.
A healthy table shows dashboard queries scanning megabytes, a p75 LCP line that tracks the 2.5 s Good boundary, p75 INP under 200 ms, and p75 CLS under 0.1.
Edge cases & gotchas
require_partition_filterand_PARTITIONTIME. If you partition on an ingestion-time pseudocolumn instead ofevent_time, your filters must reference_PARTITIONTIME(or_PARTITIONDATE), and a filter on a regularevent_timecolumn will not prune. Prefer an explicitevent_timecolumn so the partition key and the natural query predicate are the same thing.- Clustering only helps when the filter leads with the clustering key. A query that filters on
countrybut notmetric_namecannot use themetric_name, routeclustering and scans the whole partition. Choose cluster columns to match your dominant predicates, and reorder them if your access pattern shifts. - Streaming buffer and partition expiration. Recently streamed rows sit in a buffer that can briefly resist some operations and is not yet subject to partition expiration. Do not assume a row is immediately in its final partition for DML; for delete-then-insert rollups, run them on settled (prior-day) partitions only.
APPROX_QUANTILESwith too few rows. On a thin slice the approximate percentile is unstable. Always pair it with aCOUNT(*)and aHAVINGfloor, and never publish a per-route p75 built from a handful of samples.- JSON column cost.
attribution JSONis cheap as long as routine queries do not project it. The moment a dashboardSELECT *s the table, every byte of attribution is scanned and billed. Project explicit columns and reach into the JSON only on drill-down. - Zero and missing metrics. Beacons from bfcache restores can omit a metric or report
0. Model one row per captured metric so absent metrics are simply absent rows, rather than0rows that drag p75 below reality. If you must store zeros, filtervalue > 0in the percentile query. - Cross-region queries. Keep the dataset, GCS bucket, and load jobs in one region. A query that joins across regions, or a load from a bucket in another region, adds cost and latency that no partitioning will recover.
FAQ
Why partition on event_time instead of using _PARTITIONTIME?
An explicit event_time column makes the partition key the same value your queries naturally filter on, so a DATE(event_time) predicate prunes partitions directly and reads cleanly. _PARTITIONTIME reflects ingestion time, which drifts from event time and forces queries to filter on a pseudocolumn — fine for pure append-and-scan, but awkward when event time is what analysts actually ask about.
How does APPROX_QUANTILES give me p75?
APPROX_QUANTILES(value, 100) returns the 101 bucket boundaries of an approximate distribution; [OFFSET(75)] reads the boundary at the 75th percentile. It uses a bounded-memory sketch instead of sorting every value, so it is far cheaper than PERCENTILE_CONT at RUM scale, and its error is well within the noise of field data.
Will clustering by metric_name and route control my scan cost on its own?
No — clustering prunes blocks within the partitions a query already reads, so it amplifies partition pruning rather than replacing it. You still need a partition (date) filter to prune days; clustering then skips the metric and route blocks you do not need inside those days. Both together are what make a year-spanning RUM table cheap to query.
Related
- Self-Hosted Beacon Collection — the parent collector layer that produces the beacons this schema ingests.
- Building a RUM Ingestion Endpoint on Cloudflare Workers — the edge endpoint that stamps and forwards beacons into BigQuery.
- RUM Data Sampling Strategies — shrink both storage and scan cost while keeping p75 unbiased.