Custom Metrics & Business Impact Tracking
Standard Core Web Vitals give every site a comparable performance baseline, but they stop at the document boundary: they cannot tell you when a checkout button became clickable, when a hero image specific to your conversion funnel actually painted, or which third-party script stole 180 ms of main-thread time during your most valuable interaction. This domain is about closing that gap — defining application-specific metrics with the browser’s measurement primitives, attributing main-thread cost to concrete code, and statistically tying those numbers to the revenue and engagement KPIs that justify a performance budget.
The work here builds directly on the platform fundamentals covered in Core Web Vitals & Performance Metrics Fundamentals and the ingestion pipeline covered in RUM Architecture, Tooling & Self-Hosting. Where those two domains standardize what every page captures and how beacons reach storage, this one is about what your product specifically needs to measure and why it matters to the business. A senior performance engineer who only reports Largest Contentful Paint and Interaction to Next Paint is reporting a generic proxy; the engineer who also reports “time to interactive cart, p75, segmented by device class, correlated to add-to-cart rate” is reporting a lever the business will fund.
Field Data vs Synthetic: Why Custom Metrics Live in RUM
Synthetic testing — Lighthouse runs, scripted WebPageTest sessions, CI performance gates — executes a fixed scenario on a fixed machine over a fixed network. It is reproducible and excellent for catching regressions before they ship, and the trade-offs are examined in depth in Synthetic vs Field Data Trade-offs. But synthetic data is, by construction, a single point in a distribution you do not control. Your real users are spread across a wide range of CPU speeds, RAM ceilings, radio conditions, and browser versions that no lab profile reproduces faithfully.
Custom metrics are almost meaningless synthetically. A synthetic “time to interactive cart” measured on a wired connection with an unthrottled CPU will report a number your users never experience. The signal only becomes actionable when collected as field data — real users, real devices, real sessions — and aggregated at p75. The p75 percentile is the canonical aggregation throughout this site for one reason: it is the threshold Google uses for Core Web Vitals assessment, and aligning your custom metrics to the same percentile lets you reason about standard and custom signals on one axis. Reporting a mean hides the slow tail where churn actually happens; reporting p75 keeps you honest about the experience three-quarters of your sessions stay under.
Because ranking and user experience are both driven by the distribution of real sessions, the entire value chain of custom metrics depends on capturing them in production. That is why every snippet below uses PerformanceObserver with buffered entries — covered in Web Vitals API Implementation — and ships them through the same beacon path as standard vitals, rather than computing anything in a lab.
Core Metric Reference
Custom metrics do not replace Core Web Vitals; they sit alongside them and inherit the same Good / Needs Improvement / Poor framing. The table below restates the standard thresholds your custom metrics must coexist with, then maps the custom-metric families this domain introduces to the engineering action each one drives.
| Metric | Good (p75) | Needs Improvement | Poor | Engineering action when breached |
|---|---|---|---|---|
| LCP | ≤ 2.5 s | ≤ 4.0 s | > 4.0 s | Preload the LCP resource, set fetchpriority, cut render-blocking CSS |
| INP | ≤ 200 ms | ≤ 500 ms | > 500 ms | Yield long tasks, defer non-critical handlers, shrink hydration |
| CLS | ≤ 0.1 | ≤ 0.25 | > 0.25 | Reserve space with aspect-ratio, size ad/font slots |
| FCP | ≤ 1.8 s | ≤ 3.0 s | > 3.0 s | Reduce TTFB, inline critical CSS, eliminate redirects |
| TTFB | ≤ 800 ms | ≤ 1.8 s | > 1.8 s | Edge-cache, warm origins, trim server work |
| User Timing measure (e.g. cart_interactive) | product-defined | product-defined | product-defined | Trace the mark pair, attribute to hydration or fetch |
| Element Timing (hero render) | product-defined | product-defined | product-defined | Prioritize the hero asset, decode off-thread |
| Long Task total / interaction | < 50 ms blocked | 50–200 ms | > 200 ms | Break up the task, move work to a worker or yield |
The standard rows use the current Google spec exactly. The custom rows are deliberately “product-defined”: there is no universal threshold for your cart-interactive time, so the discipline is to baseline it at p75 over a representative window, set a budget slightly tighter than today’s p75, and gate regressions against that budget.
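The baseline-and-budget discipline can be sketched in a few lines. This is a minimal illustration rather than a prescribed method: the helper names (`percentile`, `deriveBudget`, `checkBudget`), the 5% tightening factor, and the sample values are all assumptions made for the example, and the percentile uses the simple nearest-rank definition.

```javascript
// Nearest-rank percentile over raw samples (no interpolation).
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil(p * sorted.length); // 1-based nearest rank
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical budget rule: tighten today's p75 by a fixed fraction.
function deriveBudget(baselineSamples, tightenBy = 0.05) {
  return percentile(baselineSamples, 0.75) * (1 - tightenBy);
}

// Returns the candidate p75 and whether it breaches the budget.
function checkBudget(candidateSamples, budgetMs) {
  const p75 = percentile(candidateSamples, 0.75);
  return { p75, breached: p75 > budgetMs };
}

// Illustrative cart-interactive samples in milliseconds.
const baseline = [800, 900, 1000, 1100, 1200, 1300, 1400, 1500];
const budgetMs = deriveBudget(baseline);
const result = checkBudget([900, 950, 1000, 1400, 1450, 1500, 1600, 1700], budgetMs);
console.log(budgetMs, result);
```

In a real pipeline the baseline would come from the p75 rollup over a representative window rather than from an in-memory array; the gate logic stays the same.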
User Timing API: Marks & Measures
The User Timing API: Marks & Measures cluster covers the foundational primitive for custom metrics: performance.mark() drops a high-resolution timestamp into the performance buffer, and performance.measure() computes the duration between two marks. Together they let you name and time any application-specific interval — route navigation to first cart paint, click to data-table render, modal open to form-ready — with sub-millisecond resolution from performance.now().
The pattern is to mark at semantic boundaries in your code, then measure once the closing boundary is reached:
// At route entry for the cart view
performance.mark('cart_route_start');
// Once the cart UI is hydrated and the primary button is clickable
function onCartInteractive() {
  performance.mark('cart_interactive');
  performance.measure('custom_cart_interactive', 'cart_route_start', 'cart_interactive');
}
The critical design choice is the custom_ name prefix. A single PerformanceObserver subscribed to measure entries can then filter for your metrics and ignore measures emitted by third-party libraries or the framework itself. Marks are also exposed to DevTools’ Performance panel timeline, so the same instrumentation that feeds RUM doubles as a lab-debugging aid. Single-page applications need extra care because the performance buffer persists across soft navigations and measure names can collide between route visits — the Instrumenting User Timing Marks in an SPA deep dive covers buffer hygiene, clearMarks/clearMeasures discipline, and per-navigation namespacing.
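One way to approach per-navigation namespacing is to give each route visit its own mark names while keeping the measure name stable. The sketch below is an assumption about how that could look, not the approach of the linked deep dive: `createRouteTimer`, the `navId` counter, and the `_nav<N>` suffix are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: one timer per soft navigation.
// `perf` defaults to the global Performance object; injecting it keeps the helper testable.
function createRouteTimer(route, navId, perf = globalThis.performance) {
  // Mark names are unique per visit, so a stale mark from an earlier
  // visit can never be paired with the current one.
  const startMark = `${route}_start_nav${navId}`;
  const endMark = `${route}_interactive_nav${navId}`;
  perf.mark(startMark);
  return {
    // Call once the route's primary control is usable.
    done() {
      perf.mark(endMark);
      // The measure keeps a stable custom_ name so every visit lands in the same series.
      perf.measure(`custom_${route}_interactive`, startMark, endMark);
      // Remove this visit's marks so the buffer does not grow across soft navigations.
      perf.clearMarks(startMark);
      perf.clearMarks(endMark);
    },
  };
}

// Usage (in the router's navigation hook, with an app-level visit counter):
//   const timer = createRouteTimer('cart', visitCount);
//   ...later, when the cart is hydrated...
//   timer.done();
```

Only the marks are cleared here; the measure is left in place for the observer that reports it.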
Element Timing API
User Timing measures intervals you instrument in script, but it cannot tell you when a specific DOM element actually painted to the screen — that requires the browser’s own paint accounting. The Element Timing API cluster covers exactly this: adding the elementtiming attribute to an image or text block makes the browser emit a PerformanceElementTiming entry the moment that element renders, with a renderTime you can attribute to a named element rather than the whole document.
<img src="/hero.avif" elementtiming="hero-image" fetchpriority="high" alt="Product hero">
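new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.identifier === 'hero-image') {
      // renderTime can be 0 for cross-origin images served without Timing-Allow-Origin;
      // fall back to loadTime so the measure is never a zero-length interval.
      const paintTime = entry.renderTime || entry.loadTime;
      performance.measure('custom_hero_render', { start: 0, end: paintTime });
    }
  }
}).observe({ type: 'element', buffered: true });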
This matters because Largest Contentful Paint reports the single largest element on the page, which is frequently not the element that matters to your funnel. On a product page the LCP candidate might be a banner, while the conversion-relevant element is the product image below it. Element Timing lets you measure the render of the element you actually care about, independent of which element the browser happens to score as LCP. The Tracking Hero Render Time with Element Timing deep dive walks through renderTime vs loadTime, the cross-origin Timing-Allow-Origin requirement that otherwise zeroes renderTime, and how to fold the result back into a p75 dashboard.
Conversion Funnel Correlation
Capturing custom latency is only half the job; the payoff is joining it to behavior. The Conversion Funnel Correlation cluster covers the data model that makes that join possible: every timing beacon and every funnel event (view, add-to-cart, checkout, purchase) carries a shared session identifier, so the analytics layer can compute, for each funnel step, the p75 latency of the sessions that converted versus those that abandoned.
The analytical question is not “what is our p75 cart-interactive time” in isolation, but “how does that p75 differ between sessions that completed checkout and sessions that dropped.” When the abandoning cohort’s p75 is materially worse, you have a defensible performance-to-revenue hypothesis to test. This connects tightly to the cross-pillar User Impact Mapping work, which establishes the methodology for tying standard vitals to conversion; here you extend the same overlay technique to product-specific metrics. The Overlaying Core Web Vitals on Conversion Funnels deep dive shows the exact query shape for plotting vitals and custom metrics on the same funnel chart so a product analytics lead can read latency and drop-off in one view.
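The converted-versus-abandoned comparison reduces to splitting one metric's samples by session outcome and taking p75 of each side. The following is a small in-memory sketch under stated assumptions: the `{ sessionId, value }` event shape, the `compareCohorts` helper, and the sample data are hypothetical, and a production version would run as a query in the analytics store instead.

```javascript
// Nearest-rank p75 over raw samples.
function p75(values) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(0.75 * sorted.length) - 1];
}

// timings: [{ sessionId, value }] for a single metric.
// convertedSessionIds: Set of session ids that reached the purchase step.
function compareCohorts(timings, convertedSessionIds) {
  const converted = [];
  const abandoned = [];
  for (const { sessionId, value } of timings) {
    (convertedSessionIds.has(sessionId) ? converted : abandoned).push(value);
  }
  return {
    convertedP75: p75(converted),
    abandonedP75: p75(abandoned),
    convertedSessions: converted.length,
    abandonedSessions: abandoned.length,
  };
}

// Illustrative custom_cart_interactive samples in milliseconds.
const timings = [
  { sessionId: 's1', value: 900 }, { sessionId: 's2', value: 1100 },
  { sessionId: 's3', value: 1000 }, { sessionId: 's4', value: 1200 },
  { sessionId: 's5', value: 1500 }, { sessionId: 's6', value: 2100 },
  { sessionId: 's7', value: 1800 }, { sessionId: 's8', value: 2400 },
];
const cohorts = compareCohorts(timings, new Set(['s1', 's2', 's3', 's4']));
console.log(cohorts);
```

Returning the per-cohort session counts alongside the percentiles makes it obvious when one side is too small to trust.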
Long Task & Main-Thread Attribution
When a custom interval is slow, the next question is why — and the answer is usually that the main thread was blocked. The Long Task & Main-Thread Attribution cluster covers the Long Tasks API, which reports every contiguous block of main-thread work longer than 50 ms, with attribution entries that point at the container (your own script, an iframe, or an embedded third-party) responsible.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const culprit = entry.attribution?.[0]?.containerName || 'self';
    performance.measure(`custom_longtask_${culprit}`, {
      start: entry.startTime,
      end: entry.startTime + entry.duration
    });
  }
}).observe({ type: 'longtask', buffered: true });
Long-task data is the bridge between a slow custom metric and a concrete fix. If cart_interactive regresses and the long-task stream simultaneously shows a 240 ms task attributed to a tag-manager iframe, you have attribution rather than a guess. This overlaps with the interactivity work in Interaction to Next Paint, where breaking up long tasks with scheduler.yield() is the standard remediation; the difference is that here you are attributing the blocking time to a named business interaction rather than to a generic interaction event. The Measuring Long Tasks with the Long Tasks API deep dive details the attribution shape, the long-animation-frame entry type that succeeds it in Chromium browsers, and how to roll per-interaction blocking time into a p75 budget.
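Rolling long tasks into a per-interaction number can be sketched as summing, for each custom measure, the blocking portion of every long task that overlaps it. This is one possible definition, stated as an assumption: `blockingTimeWithin` is a hypothetical helper, the inputs only need the `{ startTime, duration }` shape of performance entries, and a task that overlaps the window only partially is counted in full for simplicity.

```javascript
// Blocking time of one long task is the part of its duration beyond 50 ms.
const LONG_TASK_THRESHOLD_MS = 50;

// Sum the blocking time of long tasks that overlap a measured interval.
function blockingTimeWithin(measure, longTasks) {
  const windowStart = measure.startTime;
  const windowEnd = measure.startTime + measure.duration;
  let total = 0;
  for (const task of longTasks) {
    const taskEnd = task.startTime + task.duration;
    const overlaps = task.startTime < windowEnd && taskEnd > windowStart;
    if (overlaps) total += Math.max(0, task.duration - LONG_TASK_THRESHOLD_MS);
  }
  return total;
}

// Illustrative entries: one custom measure and three long tasks.
const cartMeasure = { startTime: 1000, duration: 800 }; // window is 1000..1800
const tasks = [
  { startTime: 1100, duration: 240 }, // inside the window
  { startTime: 1500, duration: 70 },  // inside the window
  { startTime: 2000, duration: 300 }, // after the window, ignored
];
const blockedMs = blockingTimeWithin(cartMeasure, tasks);
console.log(blockedMs);
```

The resulting value per interaction can be beaconed as its own custom metric and aggregated at p75 like any other.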
Production Instrumentation: The Unified Custom-Metric Collector
In production you do not run four separate observers wired to four separate beacons. You run one collector that subscribes to every relevant entry type, normalizes them into a single event shape, buffers, and flushes through a reliable transport. The collector below subscribes to measure, element, and longtask entries with buffered: true so entries that fired before the script booted are not lost, tags each event with cohort context, and finalizes on visibilitychange/pagehide so nothing is dropped on navigation.
// Unified custom-metric collector: User Timing + Element Timing + Long Tasks
const rumCollector = {
  buffer: [],
  maxBufferSize: 50,
  endpoint: '/rum/ingest',
  sessionId: crypto.randomUUID(),
  init() {
    const observer = new PerformanceObserver((list) => {
      for (const entry of list.getEntries()) this.ingest(entry);
    });
    // buffered: true replays entries recorded before this code ran
    observer.observe({ type: 'measure', buffered: true });
    observer.observe({ type: 'element', buffered: true });
    observer.observe({ type: 'longtask', buffered: true });
    // Flush every time the page is hidden or unloaded. A page can be hidden and
    // shown again many times, and flush() is a no-op when the buffer is empty.
    document.addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden') this.flush(true);
    });
    addEventListener('pagehide', () => this.flush(true));
  },
  ingest(entry) {
    if (entry.entryType === 'measure' && !entry.name.startsWith('custom_')) return;
    const isElement = entry.entryType === 'element';
    this.push({
      type: entry.entryType,
      // Element entries are identified by their elementtiming value;
      // entry.name there is only 'image-paint' or 'text-paint'.
      name: isElement ? entry.identifier : entry.name,
      // Element entries report duration 0, so use the paint timestamp instead.
      value: isElement ? (entry.renderTime || entry.loadTime) : entry.duration,
      start: entry.startTime,
      sessionId: this.sessionId,
      connection: navigator.connection?.effectiveType || 'unknown',
      deviceMemory: navigator.deviceMemory ?? null
    });
  },
  push(data) {
    this.buffer.push(data);
    if (this.buffer.length >= this.maxBufferSize) this.flush(false);
  },
  flush(isFinal) {
    if (this.buffer.length === 0) return;
    const events = this.buffer.splice(0, this.buffer.length);
    const payload = JSON.stringify({ sessionId: this.sessionId, events });
    this.transmit(payload, isFinal);
  },
  transmit(payload, isFinal) {
    // sendBeacon survives unload; keepalive fetch is the fallback.
    if (navigator.sendBeacon && navigator.sendBeacon(this.endpoint, payload)) return;
    fetch(this.endpoint, {
      method: 'POST',
      body: payload,
      keepalive: isFinal,
      headers: { 'Content-Type': 'application/json' }
    }).catch(() => {});
  }
};
rumCollector.init();
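This is the canonical hookup for everything above: instrument your code with custom_-prefixed measures, tag the elements you care about with elementtiming, and the collector picks all of it up plus the long-task stream automatically. The transport uses navigator.sendBeacon first (detailed in self-hosted beacon collection) and falls back to a keepalive fetch only when sendBeacon is unavailable or returns false because the browser could not queue the payload. The sessionId minted with crypto.randomUUID() is the join key the funnel-correlation layer relies on; it is ephemeral, cookie-free, and carries no PII. Because it is generated when the script loads, it changes on every full page load, so a funnel that spans several documents needs the id persisted for the visit (for example in sessionStorage) before the join works across pages.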
Data Pipeline & Sampling Architecture
A high-traffic property emits far more timing events than it is economical to store at full fidelity, and storing raw events forever destroys query performance. The pipeline therefore does three things between the browser and the dashboard: it samples, it aggregates to p75, and it writes to columnar storage tuned for time-series scans.
Sampling
Not every session needs to be recorded. The RUM Data Sampling Strategies cluster covers the decision in depth, but the operative rule for custom metrics is consistent sampling: the sampling decision must be made once per session and applied to all of that session’s beacons, otherwise a session can contribute a cart-interactive measure but not the matching purchase event, silently biasing your correlation. Head-based sampling — decide at session start, propagate the decision — is the default. A simple deterministic gate keyed on the session id keeps the decision stable without server coordination:
// Deterministic 20% head-based sample, stable for the whole session.
function inSample(sessionId, rate = 0.2) {
  let hash = 0;
  for (let i = 0; i < sessionId.length; i++) {
    hash = (hash * 31 + sessionId.charCodeAt(i)) >>> 0;
  }
  return (hash % 1000) / 1000 < rate;
}
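During incidents you escalate to 100% sampling for diagnostic fidelity; during steady state 10–20% is usually enough for stable percentiles, because a p75 estimate is far less sensitive to extreme outliers than a mean. The caveat is segmentation: check the sample count per bucket and cohort before trusting a low-traffic slice.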
Aggregation at p75
The dashboard never queries raw events for trend views. A windowed rollup computes the p75 of each metric per bucket and cohort, which is both cheap to read and the percentile aligned with Core Web Vitals scoring. In ClickHouse the rollup is a single quantile over a time-bucketed group:
SELECT
  toStartOfFiveMinutes(event_time) AS bucket,
  metric_name,
  connection,
  quantile(0.75)(value) AS p75_ms,
  count() AS samples
FROM rum_custom_metrics
WHERE event_time >= now() - INTERVAL 24 HOUR
  AND metric_name = 'custom_cart_interactive'
GROUP BY bucket, metric_name, connection
ORDER BY bucket;
Segmenting the quantile by connection (and, in practice, by device-memory class and geography) is what turns a flat number into an actionable one: a global p75 that looks healthy often hides a slow-4g cohort sitting in the Poor band. Columnar engines such as ClickHouse, TimescaleDB, or BigQuery store each column contiguously, so a scan over value for one metric touches only the bytes it needs — the storage design that makes these rollups fast is covered in the RUM Architecture, Tooling & Self-Hosting domain.
Business & UX Impact: From Milliseconds to ROI
The reason to do any of this is to make latency a budget line the business defends. That requires moving from “this metric is slow” to “this metric costs us X in revenue,” which is a statistical exercise, not an assertion.
Correlation
The first step is establishing that a custom metric and a KPI move together. The Pearson correlation coefficient quantifies the strength and direction of the linear relationship between a metric distribution and a KPI distribution across cohorts or time buckets:
// Pearson correlation between custom-metric values and a KPI series.
function calculatePearsonCorrelation(metricValues, kpiValues) {
  const n = metricValues.length;
  if (n !== kpiValues.length || n === 0) return 0;
  const sumX = metricValues.reduce((a, b) => a + b, 0);
  const sumY = kpiValues.reduce((a, b) => a + b, 0);
  const sumXY = metricValues.reduce((acc, x, i) => acc + x * kpiValues[i], 0);
  const sumX2 = metricValues.reduce((acc, x) => acc + x * x, 0);
  const sumY2 = kpiValues.reduce((acc, y) => acc + y * y, 0);
  const numerator = (n * sumXY) - (sumX * sumY);
  const denominator = Math.sqrt(
    ((n * sumX2) - (sumX * sumX)) * ((n * sumY2) - (sumY * sumY))
  );
  return denominator === 0 ? 0 : numerator / denominator;
}
// r > 0.5 indicates a meaningful positive relationship between latency and abandonment.
const r = calculatePearsonCorrelation(checkoutLatencies, cartAbandonmentRates);
A high r is a hypothesis, not proof: latency and abandonment can co-move because both rise during a marketing campaign that brings slower-device traffic. To isolate the performance effect from confounders such as seasonality, marketing spend, or a concurrent UI change, segment by cohort and validate with a controlled A/B test that toggles a single optimization behind a feature flag and measures the conversion delta with a 95% confidence interval.
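The confidence interval on the conversion delta can be approximated with the standard normal (Wald) interval for a difference of two proportions. The sketch below assumes that approximation is acceptable, which it generally is at these sample sizes but not for very small cohorts; `conversionDeltaCI` and the session counts are hypothetical values chosen for illustration.

```javascript
// 95% interval (z = 1.96) for the difference in conversion rate, variant minus control.
// Uses the normal approximation for a difference of two independent proportions.
function conversionDeltaCI(control, variant, z = 1.96) {
  const p1 = control.conversions / control.sessions;
  const p2 = variant.conversions / variant.sessions;
  const delta = p2 - p1;
  const standardError = Math.sqrt(
    (p1 * (1 - p1)) / control.sessions + (p2 * (1 - p2)) / variant.sessions
  );
  const lower = delta - z * standardError;
  const upper = delta + z * standardError;
  // The effect is distinguishable from zero only if the interval excludes 0.
  return { delta, lower, upper, significant: lower > 0 || upper < 0 };
}

// Illustrative experiment: 3.0% control vs 3.4% with the optimization enabled.
const abResult = conversionDeltaCI(
  { sessions: 50000, conversions: 1500 },
  { sessions: 50000, conversions: 1700 }
);
console.log(abResult);
```

If the interval straddles zero, the honest conclusion is that the test has not yet shown an effect, not that the optimization has none.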
Metric-to-KPI Mapping
Once correlations are established, the mapping table becomes the artifact you take to product leadership. The values below are illustrative of the relationships an e-commerce property typically observes:
| Custom Metric | Business KPI | Correlation Strength | Optimization Target |
|---|---|---|---|
| custom_cart_interactive | Add-to-Cart Rate | High (r ≈ 0.72) | JS hydration, critical CSS |
| custom_checkout_api | Payment Success | Very High (r ≈ 0.85) | API caching, retry logic |
| custom_modal_shift | Support Ticket Volume | Moderate (r ≈ 0.45) | Layout-shift prevention |
| custom_hero_render | Bounce Rate | High (r ≈ 0.68) | Image optimization, preload |
A Worked ROI Example
Suppose checkout-page sessions run at 1.0M per month, the current conversion rate is 3.0%, and average order value is $80, giving baseline revenue of 1,000,000 × 0.03 × $80 = $2,400,000 per month. Field correlation, validated by an A/B test, shows that a 200 ms reduction in custom_checkout_api p75 lifts conversion by 0.4 percentage points (from 3.0% to 3.4%). The new monthly revenue is 1,000,000 × 0.034 × $80 = $2,720,000 — an incremental $320,000 per month, or $3.84M annualized, attributable to a single 200 ms improvement. Framed that way, the engineering time to break up the responsible long task and add an edge cache is trivially justified, and the performance budget stops being a cost center.
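The arithmetic above is easy to keep as a small reusable function so the same calculation can be rerun with different inputs. The function name and parameter names below are hypothetical, and the inputs are the figures from the worked example.

```javascript
// Incremental revenue from a validated conversion-rate lift.
// Rates are fractions (0.03 means 3.0%); results are rounded to cents.
function incrementalRevenue({ sessionsPerMonth, baselineRate, improvedRate, averageOrderValue }) {
  const raw = sessionsPerMonth * (improvedRate - baselineRate) * averageOrderValue;
  const monthly = Math.round(raw * 100) / 100;
  return { monthly, annual: monthly * 12 };
}

const roi = incrementalRevenue({
  sessionsPerMonth: 1_000_000,
  baselineRate: 0.03,
  improvedRate: 0.034,
  averageOrderValue: 80,
});
console.log(roi);
```

Feeding it the lower and upper bounds of the A/B confidence interval instead of the point estimate gives a revenue range rather than a single figure.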
FAQ
Why use p75 instead of the mean or median for custom metrics?
p75 is the percentile Google uses to assess Core Web Vitals, so aggregating custom metrics the same way lets you reason about standard and custom signals on one axis. It also keeps the slow tail visible: a mean is dragged around by outliers and a median ignores the upper quartile where churn concentrates, while p75 reflects the experience three-quarters of sessions stay under.
How are custom metrics different from Core Web Vitals?
Core Web Vitals are standardized, page-level proxies the browser computes for every site identically. Custom metrics are application-specific intervals you define and instrument — time to interactive cart, hero render, checkout API latency — that map to your funnel rather than to a generic document model. They complement, not replace, the standard vitals.
Do I need a separate observer for each timing API?
No. A single PerformanceObserver can subscribe to measure, element, and longtask entry types, and the unified collector shown above normalizes all three into one event shape and one beacon. Using buffered: true ensures entries that fired before the script booted are still captured.
How do I correlate a slow metric to a specific cause?
Run the Long Tasks API alongside your custom measures. When a custom interval regresses, the long-task stream’s attribution field names the container responsible — your script, an iframe, or a third-party tag — turning a vague slowdown into concrete attribution you can act on.
How do I tie a latency improvement to revenue?
Establish a correlation with the Pearson coefficient, confirm it is causal with a feature-flagged A/B test that isolates a single optimization, then translate the validated conversion-rate delta into incremental revenue using your traffic volume and average order value, as in the worked example above.
Related
- User Timing API: Marks & Measures — define custom intervals with performance.mark() and performance.measure().
- Element Timing API — measure when a specific named element actually paints, independent of LCP.
- Conversion Funnel Correlation — join timing beacons to funnel events on a shared session id.
- Long Task & Main-Thread Attribution — attribute main-thread blocking time to the responsible code.
- User Impact Mapping — the methodology for tying standard vitals to conversion and UX outcomes.
- RUM Data Sampling Strategies — consistent head-based sampling that keeps session joins intact.