Privacy-Compliant Tracking
Real-user monitoring puts a measurement endpoint on every page load, which makes it one of the most pervasive collection surfaces a frontend operates — and therefore one regulators scrutinize most. This page, part of RUM Architecture, Tooling & Self-Hosting, shows how to keep field-data fidelity for Core Web Vitals while collecting no personal data: cookieless ephemeral session ids, payloads that carry zero PII, IP truncation at the edge, explicit consent handling, and a CI gate that fails the build if a payload regresses into collecting something it should not.
The constraint is not “anonymize after the fact.” Under GDPR a value that can re-identify a person — a precise IP, a high-entropy user-agent, a stable client id — is personal data the moment it is transmitted, not the moment it is stored. So the architecture pushes every reduction as close to the client as possible and strips whatever survives at the first edge hop, before the payload reaches durable storage. Two narrower implementations build on this foundation: GDPR-Compliant RUM Without Cookies covers the cookieless session model end to end, and Integrating Consent Mode with RUM Beacons covers wiring a CMP/TCF signal into the beacon path.
What “personal data” means for a beacon
Before any field is added to a payload, classify it. The distinction that matters is whether a value, alone or combined with others already in the row, can single out a natural person. A single LCP measurement cannot. An LCP measurement plus a precise IP, a full user-agent, a viewport size, a timezone, and a stable id can — that combination is a browser fingerprint with enough entropy to re-identify across requests. This is why fingerprinting risk is a property of the whole row, not any one column.
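The sketch below is a rough back-of-the-envelope way to see why risk belongs to the whole row. For each field value it computes the surprisal, -log2 of that value's share of traffic. It sums those bits and compares the total with the roughly 33 bits needed to single out one person among about 8 billion. It assumes fields are independent, which real browser attributes are not. The shares are invented illustrative numbers, not measurements.

```javascript
// Surprisal of one field value: bits of identifying information it carries,
// given the share of traffic with that same value (e.g. 0.25 = one in four).
function surprisalBits(share) {
  if (!(share > 0 && share <= 1)) throw new RangeError('share must be in (0, 1]');
  return -Math.log2(share);
}

// Rough row-level estimate: sum per-field surprisal, ignoring correlation between fields.
function rowSurprisalBits(fieldShares) {
  return Object.values(fieldShares).reduce((sum, share) => sum + surprisalBits(share), 0);
}

// log2(8e9) is about 32.9: roughly the bits needed to single out one person on Earth.
const UNIQUE_PERSON_BITS = Math.log2(8e9);

// Illustrative (invented) shares for one row.
const coarseRow = { device_tier: 1 / 2, ect: 1 / 4, country: 1 / 8 };
const rawRow = { ...coarseRow, viewport_px: 1 / 512, timezone: 1 / 64, full_ua: 1 / 4096 };

console.log(rowSurprisalBits(coarseRow).toFixed(1)); // 6.0
console.log(rowSurprisalBits(rawRow).toFixed(1), UNIQUE_PERSON_BITS.toFixed(1)); // 33.0 32.9
```

The coarse columns alone stay around 6 bits. Adding raw viewport, timezone, and full UA pushes the same row past the uniqueness line, even though no single field names anyone.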
The legal basis for collecting performance telemetry without a consent banner is usually legitimate interest under GDPR Article 6(1)(f): you have a genuine operational need (keeping the site fast), the processing is minimal, and a reasonable user would not be surprised by it. That basis only holds while the data stays non-identifying. The instant you attach a cross-page identifier or an un-truncated IP, you lose the legitimate-interest argument and need consent. The architecture exists to keep you on the right side of that line by construction.
Field classification table
Classify every field you intend to send. The handling column is the rule the edge handler and the client enforce; if a field has no safe handling, it does not go in the payload.
| Field | PII risk | Handling |
|---|---|---|
| LCP / INP / CLS / FCP / TTFB value | None | Send raw; numeric metric only |
| Metric rating (good/NI/poor) | None | Send raw |
| Cookieless session id (per page load) | Low if ephemeral | Generate client-side, never persist, rotate per load |
| URL path | Medium | Strip query string and path segments that encode ids; keep route template |
| Referrer | Medium–High | Drop at edge; keep only same-origin/cross-origin boolean if needed |
| Full User-Agent string | High (entropy) | Reduce to browser family + major version + OS family |
| Client IP address | High | Truncate to /24 (IPv4) or /48 (IPv6) at edge, or drop |
| Viewport / screen dimensions | Medium (entropy) | Bucket to coarse device-tier classes, never raw px |
| Timezone / locale | Medium (entropy) | Bucket to region code; avoid combining with other entropy |
| Country / region | Low | Derive at the edge from the full IP before truncation (or read the CDN's geo), store 2-letter code only |
| Effective connection type | Low | Send raw enum (4g/3g/etc.) |
| DOM selector of INP target | Medium | Reduce to tag + first class, strip ids and text content |
The right column is also your audit checklist. Every field that ships should map to exactly one handling rule, and the CI gate later in this page enforces that the payload schema contains nothing outside this set.
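Three of the entropy-reduction rules in the table (user-agent, viewport/device tier, timezone) can be sketched as small client-side functions. The function names, regexes, tier cut-offs and region list are illustrative assumptions, not a standard. In a page you would pass in these values:

- navigator.userAgent
- innerWidth
- navigator.hardwareConcurrency
- navigator.deviceMemory (Chromium-only)
- Intl.DateTimeFormat().resolvedOptions().timeZone

```javascript
// Hypothetical reducers for high-entropy fields. Patterns and cut-offs are assumptions.

// Full UA -> "family major / os". Order matters: Edge and Opera UAs also contain "Chrome",
// and iOS UAs contain "Mac OS X", Android UAs contain "Linux".
function reduceUserAgent(ua) {
  const browsers = [
    ['Edge', /Edg\/(\d+)/],
    ['Opera', /OPR\/(\d+)/],
    ['Firefox', /Firefox\/(\d+)/],
    ['Chrome', /Chrome\/(\d+)/],
    ['Safari', /Version\/(\d+).*Safari\//],
  ];
  const oses = [
    ['Android', /Android/],
    ['iOS', /iPhone|iPad|iPod/],
    ['Windows', /Windows NT/],
    ['macOS', /Mac OS X/],
    ['Linux', /Linux/],
  ];
  const [family, major] = browsers
    .map(([name, re]) => [name, ua.match(re)?.[1]])
    .find(([, m]) => m) ?? ['Other', '0'];
  const os = oses.find(([, re]) => re.test(ua))?.[0] ?? 'Other';
  return `${family} ${major} / ${os}`;
}

// Viewport width + hardware hints -> coarse tier. The raw pixel values never leave here.
function deviceTier({ viewportWidth, cores = 0, memoryGb = 0 }) {
  if (cores >= 8 && memoryGb >= 8 && viewportWidth >= 1024) return 'high';
  if (cores >= 4 && memoryGb >= 4) return 'mid';
  return 'low';
}

// IANA zone -> continent-level area, e.g. "Europe/Berlin" -> "Europe".
const TZ_AREAS = ['Africa', 'America', 'Antarctica', 'Asia', 'Atlantic', 'Australia', 'Europe', 'Indian', 'Pacific'];
function timezoneRegion(tz) {
  const area = String(tz).split('/')[0];
  return TZ_AREAS.includes(area) ? area : 'Other';
}

console.log(reduceUserAgent(
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
)); // Chrome 124 / Windows
```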
Cookieless ephemeral session ids
A “session” in privacy-safe RUM is a single page lifetime, not a cross-page user journey. You generate a random id when the page loads, hold it in a module variable (never localStorage, never a cookie), use it to deduplicate the beacons from that one page, and discard it on unload. Because it never persists and never travels between navigations, it cannot link two page views to the same person and therefore is not an identifier in the regulatory sense.
```javascript
// Ephemeral, per-page-load session id. Lives only in memory for this page's lifetime.
// crypto.randomUUID gives 122 random bits with no device-derived input,
// so it carries zero information about the user or device.
const SESSION_ID = crypto.randomUUID();

// Page-load nonce lets the collector drop duplicate beacons (sendBeacon can fire
// from both visibilitychange and pagehide) without any cross-page correlation.
const PAGE_NONCE = crypto.randomUUID();

// Exported so the CI compliance test can inspect the payload shape.
export function basePayload() {
  return {
    sid: SESSION_ID, // discarded when this page unloads
    nonce: PAGE_NONCE, // dedup only
    ts: Date.now(),
    // Route template, not the live URL: location.pathname already excludes the
    // query string and fragment, and id-looking segments are collapsed below.
    route: toRouteTemplate(location.pathname),
  };
}

function toRouteTemplate(pathname) {
  return pathname
    .split('/')
    .map((seg) => (/^[0-9a-f-]{6,}$/i.test(seg) || /^\d+$/.test(seg) ? ':id' : seg))
    .join('/');
}
```
The toRouteTemplate step matters more than it looks: a path like /orders/8841/invoice collapses to /orders/:id/invoice, removing a value that, combined with a timestamp, could pinpoint a specific transaction and through it a specific person.
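The hex pattern above has two blind spots. It also collapses ordinary words made only of the letters a to f, such as /blog/facade. It misses mixed-case opaque tokens that contain non-hex letters. The stricter variant below is a hypothetical sketch: the function names and the 16-character token cut-off are assumptions to tune against your real routes. Its own trade-off is that long slugs containing a digit, such as core-web-vitals-2024-guide, will collapse to :id.

```javascript
// Hypothetical stricter route templating. A segment becomes ":id" when it is all digits,
// a canonical UUID, or a long URL-safe token (>= 16 chars) that contains at least one digit.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const TOKEN_RE = /^[A-Za-z0-9_-]{16,}$/;

function isIdSegment(seg) {
  return /^\d+$/.test(seg) || UUID_RE.test(seg) || (TOKEN_RE.test(seg) && /\d/.test(seg));
}

function toRouteTemplateStrict(pathname) {
  return pathname
    .split('/')
    .map((seg) => (isIdSegment(seg) ? ':id' : seg))
    .join('/');
}

console.log(toRouteTemplateStrict('/orders/8841/invoice')); // /orders/:id/invoice
console.log(toRouteTemplateStrict('/blog/facade')); // /blog/facade
```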
Capturing vitals without leaking identity
Collection uses the standard web-vitals library and PerformanceObserver buffering so that LCP, INP, CLS, and the FCP and TTFB loading metrics are captured against buffered entries. The privacy work happens in the mapping function that turns a metric object into a payload field — it copies the numeric value and rating and nothing else, and it reduces the INP attribution target to a low-entropy selector.
```javascript
// Attribution data (used for the INP target below) is only populated by the
// attribution build; the plain 'web-vitals' entry point leaves metric.attribution unset.
import { onLCP, onINP, onCLS, onFCP, onTTFB } from 'web-vitals/attribution';

// Reduce a DOM element to tag + first class. No ids, no text, no nth-child paths,
// which keeps the selector from encoding user-specific content.
function safeSelector(el) {
  if (!(el instanceof Element)) return 'unknown';
  const tag = el.tagName.toLowerCase();
  const cls = el.classList[0];
  return cls ? `${tag}.${cls}` : tag;
}

const queue = [];

function record(metric) {
  const field = {
    name: metric.name, // 'LCP' | 'INP' | 'CLS' | 'FCP' | 'TTFB'
    // CLS is a unitless score such as 0.08; rounding it to an integer would erase it.
    value: metric.name === 'CLS' ? Number(metric.value.toFixed(3)) : Math.round(metric.value),
    rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
  };
  if (metric.name === 'INP' && metric.attribution) {
    // interactionTargetElement is the web-vitals v4 attribution field; check the
    // INPAttribution type of the version you install, since these fields change across majors.
    field.target = safeSelector(metric.attribution.interactionTargetElement);
  }
  queue.push(field);
}

onLCP(record);
onINP(record);
onCLS(record);
onFCP(record);
onTTFB(record);

// Finalize on the page-lifecycle terminal events. visibilitychange covers the
// common case (tab hidden / backgrounded); pagehide is the safety net for bfcache
// and Safari, which historically does not fire a reliable final visibilitychange.
// splice(0) empties the queue, so back-to-back events cannot double-send the same metrics.
function finalize() {
  if (queue.length === 0) return;
  dispatch({ ...basePayload(), metrics: queue.splice(0) });
}

document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') finalize();
});
addEventListener('pagehide', finalize);
```
The dispatch function below is where the consent gate sits, so that no metric leaves the device until a decision is known.
Consent signals: CMP/TCF and Google Consent Mode
There are two consent worlds you may have to interoperate with. The IAB TCF v2 model exposes a __tcfapi function and emits a consent string that encodes per-purpose and per-vendor permissions. Google Consent Mode exposes a gtag('consent', ...) state with named signals such as analytics_storage. Performance RUM under legitimate interest may not strictly need either, but if your organization has decided to gate all telemetry behind consent — or if your RUM rides alongside analytics that does — you wire the beacon to wait for the relevant signal.
The hard part is the consent race on the first beacon. The TCF/CMP API is itself async and may resolve after your first vitals are ready, since TTFB and FCP arrive early. If you fire immediately, you risk sending before consent. If you wait for consent before collecting anything, you miss those early metrics entirely. The fix is to collect and buffer locally, then flush only once a consent decision resolves. If no decision arrives before the page unloads, the buffer is discarded with the page and nothing is sent.
```javascript
let consentState = 'pending'; // 'pending' | 'granted' | 'denied'
const pending = []; // payloads held until consent resolves; lost with the page if it never does

function resolveConsent(granted) {
  consentState = granted ? 'granted' : 'denied';
  if (consentState === 'granted') {
    while (pending.length) sendBeaconPayload(pending.shift());
  } else {
    pending.length = 0; // denied: drop everything, send nothing
  }
}

// IAB TCF v2: purpose 7 = measure ad performance, purpose 8 = measure content performance.
// RUM measures content performance, so the gate checks purpose 8.
if (typeof window.__tcfapi === 'function') {
  window.__tcfapi('addEventListener', 2, (data, success) => {
    // 'cmpuishown' means the banner is up with no decision yet; wait for
    // 'tcloaded' (stored decision) or 'useractioncomplete' (fresh decision).
    if (!success || data.eventStatus === 'cmpuishown') return;
    const ok = data.gdprApplies === false || data.purpose?.consents?.[8] === true;
    resolveConsent(ok);
  });
} else if (typeof window.gtag === 'function') {
  // Google Consent Mode: treat analytics_storage as the gate for RUM.
  // Consent defaults belong to the site's own gtag/CMP setup, so this script does not
  // call gtag('consent', 'default', ...). 'consent-updated' is a hypothetical custom
  // event: have your CMP integration dispatch it alongside gtag('consent', 'update', {...}),
  // with the new consent state in event.detail.
  window.addEventListener('consent-updated', (e) => {
    resolveConsent(e.detail?.analytics_storage === 'granted');
  });
} else {
  // No CMP present and policy says performance RUM runs on legitimate interest.
  resolveConsent(true);
}

function dispatch(payload) {
  if (consentState === 'granted') sendBeaconPayload(payload);
  else if (consentState === 'pending') pending.push(payload);
  // denied: silently drop
}

function sendBeaconPayload(payload) {
  const endpoint = '/rum/collect';
  const blob = new Blob([JSON.stringify(payload)], { type: 'application/json' });
  // sendBeacon returns false when the browser refuses to queue the data
  // (for example over its size limit); fall back to a keepalive fetch in that case.
  if (navigator.sendBeacon && navigator.sendBeacon(endpoint, blob)) return;
  fetch(endpoint, { method: 'POST', body: blob, keepalive: true }).catch(() => {});
}
```
This pairs with the dedicated walkthrough in Integrating Consent Mode with RUM Beacons, which covers the TCF string decoding and Consent Mode v2 signal set in full. The delivery path itself — sendBeacon, the ingestion endpoint, retries — is the subject of Self-Hosted Beacon Collection.
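To exercise the race in a unit test, it helps to lift the gate logic into a factory that takes the sender as a parameter. The sketch below is a hypothetical refactor of the pending/dispatch/resolveConsent code above; createConsentGate is not part of any library. It replaces real CMP timing with explicit ordering: metrics are dispatched first and the decision arrives later.

```javascript
// Hypothetical testable form of the consent gate: same states and buffering rules,
// with the network send injected so no real CMP or beacon is needed.
function createConsentGate(send) {
  let state = 'pending'; // 'pending' | 'granted' | 'denied'
  const pending = [];
  return {
    dispatch(payload) {
      if (state === 'granted') send(payload);
      else if (state === 'pending') pending.push(payload);
      // denied: drop silently
    },
    resolve(granted) {
      state = granted ? 'granted' : 'denied';
      if (granted) {
        while (pending.length) send(pending.shift());
      } else {
        pending.length = 0;
      }
    },
    get state() {
      return state;
    },
  };
}

// Simulated slow CMP: TTFB and FCP are dispatched before the decision arrives.
const sent = [];
const gate = createConsentGate((p) => sent.push(p));
gate.dispatch({ metrics: [{ name: 'TTFB', value: 420 }] });
gate.dispatch({ metrics: [{ name: 'FCP', value: 1300 }] });
console.log(sent.length); // 0
gate.resolve(true);
console.log(sent.length); // 2
```

The same shape works for the denied path: resolve(false) must leave sent empty, and any later dispatch must also send nothing.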
The PII-stripping edge handler
The browser does its best, but it cannot truncate its own IP and cannot see the headers the network attaches. The edge handler is the last hop where the raw client IP and request headers exist before the payload becomes a stored row, so it is the mandatory enforcement point. It truncates the IP, derives a coarse country, drops the referrer and user-agent, validates against the field schema, and forwards only what survives.
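```javascript
// Cloudflare Workers edge handler: truncate IP, drop identifying headers,
// schema-validate, then forward a privacy-safe row to internal ingestion.
const ALLOWED_FIELDS = new Set(['sid', 'nonce', 'ts', 'route', 'metrics', 'ect', 'device_tier']);
const ALLOWED_METRIC_FIELDS = ['name', 'value', 'rating', 'target'];
const METRIC_NAMES = ['LCP', 'INP', 'CLS', 'FCP', 'TTFB'];

// Expand an IPv6 address to 8 groups so "::" compression ("2001:db8::1")
// cannot shift which groups are kept. Returns null if it cannot be parsed.
function ipv6Groups(ip) {
  const [head, tail, extra] = ip.split('::');
  if (extra !== undefined) return null; // more than one "::" is invalid
  const left = head ? head.split(':') : [];
  const right = tail ? tail.split(':') : [];
  const missing = 8 - left.length - right.length;
  if (tail === undefined ? missing !== 0 : missing < 1) return null;
  return [...left, ...Array(tail === undefined ? 0 : missing).fill('0'), ...right];
}

function truncateIp(ip) {
  if (!ip) return null;
  if (ip.includes(':')) {
    // IPv6: keep the first three groups (/48), zero the rest.
    const groups = ipv6Groups(ip);
    return groups ? `${groups.slice(0, 3).join(':')}::` : null;
  }
  // IPv4: zero the last octet (/24).
  const o = ip.split('.');
  return o.length === 4 ? `${o[0]}.${o[1]}.${o[2]}.0` : null;
}

// Keep allow-listed top-level fields only, and project each metric onto its own
// allow-list so nothing can ride in nested inside metrics[].
function stripToAllowed(payload) {
  const out = {};
  for (const k of Object.keys(payload)) {
    if (ALLOWED_FIELDS.has(k)) out[k] = payload[k];
  }
  if (Array.isArray(out.metrics)) {
    out.metrics = out.metrics.map((m) =>
      Object.fromEntries(
        ALLOWED_METRIC_FIELDS
          .filter((f) => m !== null && typeof m === 'object' && f in m)
          .map((f) => [f, m[f]]),
      ),
    );
  }
  return out;
}

function isValidRumRow(row) {
  if (typeof row.ts !== 'number' || !Array.isArray(row.metrics)) return false;
  return row.metrics.every((m) => typeof m.value === 'number' && METRIC_NAMES.includes(m.name));
}

export default {
  async fetch(request) {
    if (request.method !== 'POST') {
      return new Response('Method Not Allowed', { status: 405 });
    }

    let payload;
    try {
      payload = await request.json();
    } catch {
      return new Response('Invalid JSON', { status: 400 });
    }
    if (payload === null || typeof payload !== 'object' || Array.isArray(payload)) {
      return new Response('Invalid payload', { status: 400 });
    }

    // Drop everything not on the allow-list (defends against client regressions).
    const safe = stripToAllowed(payload);

    // Derive coarse geo BEFORE discarding the IP; never store the raw IP.
    const rawIp = request.headers.get('CF-Connecting-IP');
    safe.ip_net = truncateIp(rawIp); // /24 or /48, not the full address
    safe.country = request.cf?.country ?? 'XX'; // 2-letter code only

    if (!isValidRumRow(safe)) {
      return new Response('Schema rejected', { status: 422 });
    }

    // Forward with NO original headers: referrer, user-agent, cookies all dropped.
    await fetch('https://ingest.internal/rum/store', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(safe),
    });

    return new Response(null, { status: 204 });
  },
};
```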
The order is deliberate: stripToAllowed runs first so a client that accidentally starts sending a new field cannot get it into storage, and truncateIp runs before any forwarding so the raw IP exists only in the worker’s request scope and is never written anywhere.
Threshold configuration
Privacy constraints do not change the rating bands — they change how you query them, since you segment on coarse buckets rather than user-level fields. Aggregate at p75 over the privacy-safe device_tier, ect, and country columns and compare against Google’s current thresholds.
| Metric | Good | Needs Improvement | Poor | Engineering action when p75 breaches |
|---|---|---|---|---|
| LCP | ≤ 2.5 s | ≤ 4.0 s | > 4.0 s | Prioritize hero resource; segment by device_tier to find low-end regressions |
| INP | ≤ 200 ms | ≤ 500 ms | > 500 ms | Break long tasks; segment by route template to localize the interaction |
| CLS | ≤ 0.1 | ≤ 0.25 | > 0.25 | Reserve space for late content; check ad/font injection by country |
| FCP | ≤ 1.8 s | ≤ 3.0 s | > 3.0 s | Inspect render-blocking resources; segment by ect |
| TTFB | ≤ 800 ms | ≤ 1.8 s | > 1.8 s | Check edge caching and origin latency by country |
Because no row carries a user id, every breach is investigated through these coarse buckets. That is a feature: it forces analysis toward systemic causes (a slow region, a device class, a route) rather than chasing individuals.
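In production this aggregation normally runs as a warehouse query. The JavaScript sketch below just makes the logic explicit: rate a value against the bands in the table, and compute p75 per privacy-safe bucket. It uses nearest-rank p75, which is one of several percentile definitions, so results can differ slightly from a warehouse's quantile function. The row shape mirrors the edge handler's output; the function names are hypothetical.

```javascript
// Bands from the threshold table: [good upper bound, needs-improvement upper bound].
// LCP, INP, FCP, TTFB in milliseconds; CLS is unitless.
const THRESHOLDS = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
  FCP: [1800, 3000],
  TTFB: [800, 1800],
};

function rate(name, value) {
  const [good, needsImprovement] = THRESHOLDS[name];
  if (value <= good) return 'good';
  if (value <= needsImprovement) return 'needs-improvement';
  return 'poor';
}

// Nearest-rank 75th percentile of a non-empty array.
function p75(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(0.75 * sorted.length) - 1];
}

// Group one metric's values by a privacy-safe column (device_tier, ect, country, route).
function p75ByBucket(rows, metric, column) {
  const groups = new Map();
  for (const row of rows) {
    for (const m of row.metrics) {
      if (m.name !== metric) continue;
      const key = row[column] ?? 'unknown';
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(m.value);
    }
  }
  const out = {};
  for (const [key, values] of groups) {
    const v = p75(values);
    out[key] = { p75: v, rating: rate(metric, v), n: values.length };
  }
  return out;
}

const toRows = (tier, values) => values.map((v) => ({ device_tier: tier, metrics: [{ name: 'INP', value: v }] }));
const rows = [...toRows('low', [150, 300, 450, 600]), ...toRows('high', [80, 120, 160, 190])];
console.log(p75ByBucket(rows, 'INP', 'device_tier'));
// low: p75 450, needs-improvement; high: p75 160, good
```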
Debugging workflow
When a privacy-safe pipeline misbehaves, the failure is usually either “no data arriving” (consent gate or beacon path) or “wrong data arriving” (a field that should have been stripped). Work the pipeline left to right.
- Identify the symptom. Is the metric volume zero, low, or carrying an unexpected field? Zero volume points at the consent gate; a leaked field points at the edge allow-list or client mapping.
- Trace the beacon in DevTools. Filter the Network tab to /rum/collect, fire a visibilitychange by switching tabs, and inspect the request payload. Confirm that it contains only allow-listed fields and that sid differs on every reload, which proves it is ephemeral.
- Correlate the consent timing. In the console, log consentState at the moment the first metric is captured. If TTFB and FCP are buffered as pending and never flush, the CMP callback is not resolving. Verify that __tcfapi or gtag is present before your script runs.
- Validate at the edge. Hit the worker with a crafted payload containing a forbidden field (e.g. a raw referrer) and confirm the stored row does not contain it. Confirm that ip_net is truncated and that country is a 2-letter code.
- Deploy the fix behind the same beacon endpoint so the change is observable in production without a separate rollout surface.
- Monitor the delta. Watch beacon volume and the p75 of each metric for 24 hours after the change; a sudden volume drop means you tightened the consent gate too far, a volume spike means you loosened it.
Field-data segmentation patterns
Segment on the low-entropy columns the schema permits, and watch for divergences that signal a real problem rather than a tracking opportunity.
- Device tier. Bucket viewport and hardware hints into low/mid/high classes. A p75 INP that is good on high but poor on low points to a main-thread budget problem on cheap CPUs, not a content problem.
- Effective connection type. Split LCP and TTFB by ect. A 3G cohort dragging the global p75 is a payload-weight issue; a 4G cohort regressing is more likely an origin or edge-cache issue.
- Country / region. This column is derived at the edge before the IP is truncated, and it surfaces edge-cache misses and regional origin latency. Divergence here usually maps to TTFB.
- Route template. Because URLs are reduced to templates, you can still localize an INP regression to /checkout/:id without knowing which order or which user.
The divergence to watch for is a single bucket moving while the global p75 stays flat — that is exactly the regression an unsegmented dashboard hides, and the reason segmentation is non-negotiable even under strict minimization.
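The check described above can be sketched as a comparison of two periods. It flags buckets whose p75 rose by more than a bucket tolerance while the global p75 moved by less than a global tolerance. The tolerances, the input shape ({ bucket: values[] } per period) and the function names are illustrative assumptions.

```javascript
// Nearest-rank 75th percentile of a non-empty array.
function p75(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(0.75 * sorted.length) - 1];
}

function relChange(before, after) {
  if (before === 0) return after === 0 ? 0 : Infinity; // e.g. a CLS bucket that was exactly 0
  return (after - before) / before;
}

// before/after: { bucketKey: number[] } of one metric's values for two periods.
// Returns bucket keys that regressed while the global p75 stayed roughly flat.
function hiddenRegressions(before, after, { bucketTolerance = 0.2, globalTolerance = 0.05 } = {}) {
  const globalBefore = p75(Object.values(before).flat());
  const globalAfter = p75(Object.values(after).flat());
  if (Math.abs(relChange(globalBefore, globalAfter)) >= globalTolerance) return []; // visible globally
  return Object.keys(after)
    .filter((k) => before[k]?.length && after[k].length)
    .filter((k) => relChange(p75(before[k]), p75(after[k])) > bucketTolerance);
}

// A small low-tier cohort regresses from 200 to 300 ms; the large high tier hides it globally.
const before = { high: Array(20).fill(100), low: Array(4).fill(200) };
const after = { high: Array(20).fill(100), low: Array(4).fill(300) };
console.log(hiddenRegressions(before, after)); // [ 'low' ]
```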
Failure modes and gotchas
- Fingerprinting risk. The danger is cumulative entropy, not any single field. Raw viewport + timezone + full UA + a stable id together can re-identify even with no cookie. Defend by bucketing every high-entropy field (device tier, region) and never combining them with a persistent id — which the ephemeral session model already forbids.
- Referrer leakage. The Referer header and document.referrer can carry a full URL with query parameters from another site, sometimes including tokens. The edge handler drops the header, and the client never reads document.referrer into a field. If you need same-origin context, send a boolean, not the URL.
- Consent race on the first beacon. Early metrics (TTFB, FCP) can be ready before the CMP resolves. The fix is to buffer in pending and flush on resolveConsent. The trap is firing a “default deny” beacon or, worse, an immediate send before the gate. Test by throttling the CMP callback by a few seconds and confirming nothing leaves until it resolves.
- Safari lifecycle gaps. Safari has historically not fired a dependable final visibilitychange, so pagehide is required as the finalizer of record. Verify on a real iOS device, not just the simulator.
- Background-tab suspension. A backgrounded tab can be suspended before a timer-based flush runs. Never rely on setInterval alone to flush. The terminal lifecycle events are the source of truth, with the timer as a best-effort supplement.
- Deriving geo after truncation. Always read request.cf.country (or your CDN’s pre-computed geo). The CDN computes it from the full address before your code truncates anything. Re-deriving country from the stored /24 later in the pipeline loses precision. Doing better would mean carrying the full IP past the edge, which defeats the point.
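For the referrer item, the client can compute the same-origin boolean without the URL ever entering the payload. The sketch below is minimal. referrerContext and any payload field it would feed (say, a hypothetical same_origin_ref) are illustrative names, and such a field would need adding to both the edge and CI allow-lists before it could ship. The page origin is a parameter so the function runs outside a browser; in a page you would pass document.referrer and location.origin.

```javascript
// Reduce a referrer URL to true (same origin), false (cross origin) or null (none/unparsable).
// Only the boolean is returned; the URL, path and query never leave this function.
function referrerContext(referrer, pageOrigin) {
  if (!referrer) return null;
  try {
    return new URL(referrer).origin === pageOrigin;
  } catch {
    return null; // unparsable referrer: send nothing rather than guess
  }
}

console.log(referrerContext('https://example.com/cart?token=abc', 'https://example.com')); // true
console.log(referrerContext('https://other.example/search?q=me', 'https://example.com')); // false
```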
CI and compliance gating
The strongest guarantee is a build gate that fails when a payload regresses into collecting something it should not. Treat the field allow-list as a contract and assert it in CI against the actual schema and a sample beacon, so a well-meaning feature PR cannot quietly add a high-entropy field.
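```javascript
// @vitest-environment jsdom
// compliance.test.js: runs in CI and fails the build on any disallowed field.
// basePayload() reads location.pathname, so this file needs a DOM-like environment
// (jsdom installed as a dev dependency), and crypto.randomUUID must be available there;
// check your Node/jsdom versions.
import { describe, it, expect, vi } from 'vitest';

const ALLOWED = new Set(['sid', 'nonce', 'ts', 'route', 'metrics', 'ect', 'device_tier']);
const BANNED = ['ip', 'ipAddress', 'userAgent', 'ua', 'referrer', 'referer', 'email', 'userId', 'cookie'];
const BANNED_LOWER = new Set(BANNED.map((k) => k.toLowerCase())); // compare case-insensitively

// Collect every object key at any depth (array indices are skipped, their items are walked).
function allKeys(value, keys = []) {
  if (value && typeof value === 'object') {
    for (const [k, v] of Object.entries(value)) {
      if (!Array.isArray(value)) keys.push(k);
      allKeys(v, keys);
    }
  }
  return keys;
}

// Reset the module registry so the payload module's top-level SESSION_ID is regenerated.
async function freshPayloadModule() {
  vi.resetModules();
  return import('../src/rum/payload.js');
}

describe('RUM payload compliance gate', () => {
  it('emits only allow-listed top-level fields', async () => {
    const { basePayload } = await freshPayloadModule();
    const payload = { ...basePayload(), metrics: [], ect: '4g', device_tier: 'mid' };
    for (const key of Object.keys(payload)) {
      expect(ALLOWED.has(key)).toBe(true);
    }
  });

  it('contains no banned identifier-shaped keys at any depth', async () => {
    const { basePayload } = await freshPayloadModule();
    const payload = { ...basePayload(), metrics: [{ name: 'LCP', value: 2100, rating: 'good' }] };
    for (const key of allKeys(payload)) {
      expect(BANNED_LOWER.has(key.toLowerCase())).toBe(false);
    }
  });

  it('produces an ephemeral session id (fresh per page load)', async () => {
    const first = (await freshPayloadModule()).basePayload().sid;
    const second = (await freshPayloadModule()).basePayload().sid;
    expect(first).toMatch(/^[0-9a-f-]{36}$/i);
    expect(second).not.toBe(first);
  });
});
```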
Run this in the same job as your unit tests and mark it required for merge. Pair it with a periodic ClickHouse or BigQuery query in scheduled CI that scans recent rows for two things: any column outside the allow-list, and any ip_net value without the truncated shape. A non-empty result should page someone, because it means a deploy slipped a leak past review. The retention side is enforced declaratively. Set a TTL on the raw table (for example 30 days) and a longer one on the aggregated rollups (for example 13 months), so deletion is a property of the schema rather than a cron job someone can forget.
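The row-level half of that scheduled check can be sketched as a function over the rows the warehouse query returns, here as plain objects. The stored column set is assumed to be the client allow-list plus the two columns the edge handler adds (ip_net, country). The regexes assume the truncation formats described earlier (a.b.c.0 for IPv4, three groups followed by :: for IPv6). The function names are hypothetical.

```javascript
// Hypothetical audit over stored RUM rows; any finding should fail the scheduled job.
const STORED_COLUMNS = new Set([
  'sid', 'nonce', 'ts', 'route', 'metrics', 'ect', 'device_tier', 'ip_net', 'country',
]);
const IPV4_NET = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.0$/; // /24: last octet zeroed
const IPV6_NET = /^[0-9a-f]{1,4}:[0-9a-f]{1,4}:[0-9a-f]{1,4}::$/i; // /48: three groups kept

function auditRow(row) {
  const problems = [];
  for (const key of Object.keys(row)) {
    if (!STORED_COLUMNS.has(key)) problems.push(`unexpected column: ${key}`);
  }
  if (row.ip_net != null && !IPV4_NET.test(row.ip_net) && !IPV6_NET.test(row.ip_net)) {
    problems.push(`ip_net not truncated: ${row.ip_net}`);
  }
  if (row.country != null && !/^[A-Z]{2}$/.test(row.country)) {
    problems.push(`country is not a 2-letter code: ${row.country}`);
  }
  return problems;
}

function auditRows(rows) {
  return rows.flatMap((row, i) => auditRow(row).map((p) => `row ${i}: ${p}`));
}

const findings = auditRows([
  { sid: 'a', ts: 1, metrics: [], ip_net: '203.0.113.0', country: 'DE' },
  { sid: 'b', ts: 2, metrics: [], ip_net: '203.0.113.7', country: 'DE', userAgent: 'Mozilla/5.0' },
]);
console.log(findings);
```

In a CI script, a non-empty findings array would set a failing exit code (for example process.exitCode = 1 under Node), which is what pages someone.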
FAQ
Do I need a consent banner to collect Core Web Vitals?
Often not. Performance RUM that carries no PII — no cookie, no cross-page id, a truncated IP, a reduced user-agent — can usually run on legitimate interest under GDPR Article 6(1)(f). The moment you attach a persistent identifier or store a full IP, that basis falls away and you need consent. Confirm the specifics with your DPO, since enforcement varies by jurisdiction.
How is a cookieless session id not just a cookie by another name?
A cookie persists across page loads and navigations, which is exactly what lets it link views to one person. The ephemeral session id is generated fresh on every page load, lives only in a memory variable, and is discarded on unload. It cannot correlate two page views, so it carries no cross-session identity and is not an identifier in the regulatory sense.
Where should IP truncation happen — client or edge?
The edge, because the browser cannot see or modify its own source IP. The edge handler is the last hop where the raw IP exists before the payload becomes a stored row, so it truncates to a /24 (IPv4) or /48 (IPv6) and derives a coarse country before discarding the full address. The raw IP is never written to durable storage.
What stops the team from accidentally adding a field that leaks PII?
Two layers. The edge handler strips any field not on the server-side allow-list, so a client regression cannot reach storage. A CI test asserts the payload contains only allow-listed keys and no identifier-shaped keys at any depth, failing the build before the change ever ships.
How do I avoid sending a beacon before consent resolves?
Buffer locally. Early metrics like TTFB and FCP can be ready before the CMP callback fires, so hold payloads in a pending queue and flush only once the consent decision resolves to granted. If it resolves to denied, drop the queue and send nothing; the lifecycle terminal events still fire but the gate keeps them from leaving.
Related
- GDPR-Compliant RUM Without Cookies — the full cookieless session model and IP/UA reduction recipe.
- Integrating Consent Mode with RUM Beacons — wiring TCF and Google Consent Mode signals into the beacon path.
- Self-Hosted Beacon Collection — the ingestion endpoint, sendBeacon delivery, and validation you forward to.
- RUM Data Sampling Strategies — sampling and p75 aggregation over privacy-safe columns.
- SpeedCurve vs Custom RUM — when a hosted vendor versus a self-controlled privacy pipeline is the right call.