User Impact Mapping

User impact mapping is the discipline of turning Real-User Monitoring field data into business numbers: bounce rate, conversion rate, session depth, and revenue per session, segmented by the cohorts that actually move them. It is the analytical layer that sits on top of the raw metrics established in Core Web Vitals & Performance Metrics Fundamentals, and it is where most performance programs either earn their budget or quietly get defunded. A dashboard that shows LCP at the 75th percentile is a health signal; a table that shows “moving LCP p75 from 3.4 s to 2.4 s on mobile recovered an estimated $41k/month in checkout revenue” is a mandate. This page covers how to join beacons to analytics events, the statistical methods that survive scrutiny, a worked impact table, an ROI formula, and the triage workflow for the most common failure — lab looks great, the business does not.

The mapping pipeline: join beacons to events by session id, bucket at p75 per cohort, then validate the correlation with an experiment before quoting an ROI figure tied to funnel stages.

Why field data, not lab data, drives this work

Impact mapping only works on field data because the business outcome is generated by real users on real devices. A synthetic run on a fast CI machine tells you whether a regression exists; it cannot tell you that the regression cost you conversions among 3G users in São Paulo. The vitals you map come from the same beacon stream the rest of your program emits — captured via the web-vitals library and PerformanceObserver — but the analysis is fundamentally about distributions, not point estimates. Use p75 as the headline aggregation everywhere, because that is the percentile Google rates and the one your users at the painful end of the distribution actually live in. A p50 that looks “Good” while p75 is “Poor” is the exact distribution that bleeds revenue silently.
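
To make the p50-versus-p75 point concrete, here is a minimal sketch using a nearest-rank percentile over made-up LCP samples. The `percentile` helper and the sample values are illustrative assumptions, not part of any library or of your data.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values, p) {
  if (!values.length) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical LCP samples in milliseconds for one cohort.
const lcpSamples = [1200, 1500, 1800, 2100, 2300, 2400, 4200, 4500, 5200, 6000];

const p50 = percentile(lcpSamples, 50); // 2300, inside the Good band (<= 2500)
const p75 = percentile(lcpSamples, 75); // 4500, inside the Poor band (> 4000)
console.log({ p50, p75 });
```

The median reads as healthy while a quarter of the sessions sit past the Poor threshold, which is why the headline number should be p75.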

The thresholds you bucket against are fixed by the current Google spec. Map every session into a Good / Needs Improvement / Poor band per metric before you join anything, because the relationship between performance and conversion is almost never linear — it has cliffs at the band boundaries.

Metric	Good (p75)	Needs Improvement (p75)	Poor (p75)	Mapping action
LCP	≤ 2.5 s	≤ 4.0 s	> 4.0 s	Bucket sessions by LCP band; correlate band → bounce and band → conversion
INP	≤ 200 ms	≤ 500 ms	> 500 ms	Join INP band to interaction-depth and checkout-abandon events
CLS	≤ 0.1	≤ 0.25	> 0.25	Correlate CLS band to misclick / rage-click and form-abandon rates
FCP	≤ 1.8 s	—	—	Use as an early proxy for perceived-speed bounce on entry pages
TTFB	≤ 800 ms	—	—	Control variable; isolates backend latency from front-end render cost
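
The banding step can be sketched as a small lookup. The `THRESHOLDS` object and `band` function are hypothetical names; the numbers are copied from the table above, and only the three metrics with a full Good / Needs Improvement / Poor row are included.

```javascript
// [goodMax, needsImprovementMax] per metric, taken from the threshold table.
// LCP and INP are in milliseconds; CLS is unitless.
const THRESHOLDS = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
};

// Returns 'good', 'ni', or 'poor'; undefined for metrics without a full band set here.
function band(name, value) {
  const t = THRESHOLDS[name];
  if (!t || !Number.isFinite(value)) return undefined;
  if (value <= t[0]) return 'good';
  if (value <= t[1]) return 'ni';
  return 'poor';
}

console.log(band('LCP', 3400)); // 'ni'
console.log(band('INP', 190));  // 'good'
console.log(band('CLS', 0.28)); // 'poor'
```

The web-vitals library already reports a `rating` field per metric, so a helper like this is mainly useful server-side or in the warehouse layer, where you re-band raw values.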

Joining beacons to analytics events by session id

The entire method rests on one shared key: a stable session_id written into both the RUM beacon and every analytics event. Generate it once per session, persist it in sessionStorage, and attach it to both streams. Without it you are reduced to comparing aggregate trends, which cannot survive a confounder challenge.

import { onCLS, onINP, onLCP, onFCP, onTTFB } from 'web-vitals/attribution';

function getSessionId() {
  let id = sessionStorage.getItem('rum_sid');
  if (!id) {
    id = (crypto.randomUUID ? crypto.randomUUID() : String(Date.now()) + Math.random());
    sessionStorage.setItem('rum_sid', id);
  }
  return id;
}

const SESSION_ID = getSessionId();

function cohort() {
  const c = navigator.connection || {};
  return {
    device: matchMedia('(pointer: coarse)').matches ? 'mobile' : 'desktop',
    network: c.effectiveType || 'unknown',
    saveData: !!c.saveData,
    geo: undefined // filled server-side from the request IP, never client-side
  };
}

const buffer = [];
function record(metric) {
  buffer.push({
    session_id: SESSION_ID,
    name: metric.name,
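    // CLS is a unitless fraction (e.g. 0.08); rounding it to an integer would zero it out.
    value: metric.name === 'CLS' ? Number(metric.value.toFixed(4)) : Math.round(metric.value),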
    rating: metric.rating,
    route: location.pathname,
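    metric_id: metric.id,            // unique per metric instance; dedupes repeated reports
    nav_type: metric.navigationType, // e.g. 'navigate', 'reload', 'back-forward'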
    ...cohort()
  });
}

onLCP(record);
onINP(record);
onCLS(record);
onFCP(record);
onTTFB(record);

// Finalize once, when the page is actually being unloaded or backgrounded.
function flush() {
  if (!buffer.length) return;
  const body = JSON.stringify({ batch: buffer.splice(0) });
  // sendBeacon is queued by the browser and survives the unload that a plain fetch() does not.
  navigator.sendBeacon('/rum/collect', body);
}
// visibilitychange is dispatched on document, so listen there rather than on window.
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') flush();
});
addEventListener('pagehide', flush);

The same SESSION_ID must be threaded into your analytics calls so the warehouse join has a key on both sides. Note that the beacon is flushed on visibilitychange/pagehide, not unload: INP in particular is only final at page hide, and sendBeacon (or fetch with keepalive: true) is the transport that reliably survives a backgrounding tab, which a plain fetch() does not. The volume of these beacons is controlled by sampling and p75 aggregation strategy; for impact mapping specifically, sample at the session level (keep or drop whole sessions) rather than the beacon level, or you will join half a session’s vitals to its conversion and corrupt the cohort.
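
One way to get session-level sampling and a shared key on the analytics side is sketched below. The `hash32`, `keepSession`, and `tagEvent` helpers are hypothetical names, and FNV-1a is just one reasonable choice of deterministic hash; any stable hash of the session id would do.

```javascript
// FNV-1a 32-bit hash: deterministic, so a given session id always lands in the same bucket.
function hash32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Keep or drop a whole session: every beacon in that session gets the same answer.
function keepSession(sessionId, sampleRate) {
  return hash32(sessionId) / 0x100000000 < sampleRate;
}

// Attach the shared key to an analytics payload without mutating the caller's object.
function tagEvent(sessionId, eventName, props = {}) {
  return { ...props, event_name: eventName, session_id: sessionId };
}

console.log(keepSession('example-session-id', 0.1));
console.log(tagEvent('example-session-id', 'purchase', { value: 82 }));
```

In the beacon script you would wrap the `onLCP(record)` style registrations in `if (keepSession(SESSION_ID, rate))`, and pass every analytics payload through `tagEvent(SESSION_ID, ...)` before handing it to your analytics SDK. Because the decision depends only on the id, the warehouse can recompute it and confirm the sample is made of whole sessions.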

Statistical methods that survive scrutiny

A correlation between LCP and conversion is trivial to produce and trivial to discredit. The four methods below are what make an impact claim defensible.

p75 bucketing, not averages

Average LCP is dominated by the tail and hides the cliff. Bucket each session into its Good/NI/Poor band, compute conversion rate per band, and report the gap. This is the unit you will defend, and it is robust to the extreme outliers that wreck a mean.

-- Conversion rate by LCP band, with p75 of LCP inside each band for context.
WITH sessions AS (
  SELECT
    r.session_id,
    APPROX_QUANTILES(r.lcp_ms, 100)[OFFSET(75)] AS lcp_p75_ms,
    CASE
      WHEN APPROX_QUANTILES(r.lcp_ms, 100)[OFFSET(75)] <= 2500 THEN 'good'
      WHEN APPROX_QUANTILES(r.lcp_ms, 100)[OFFSET(75)] <= 4000 THEN 'ni'
      ELSE 'poor'
    END AS lcp_band,
    ANY_VALUE(r.device)  AS device,
    ANY_VALUE(r.network) AS network
  FROM `project.rum.beacons` r
  WHERE r.name = 'LCP'
    AND r.timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 28 DAY)
  GROUP BY r.session_id
)
SELECT
  s.lcp_band,
  s.device,
  COUNT(*)                                            AS sessions,
  APPROX_QUANTILES(s.lcp_p75_ms, 100)[OFFSET(75)]     AS band_lcp_p75_ms,
  COUNTIF(a.session_id IS NOT NULL)                   AS converters,
  SAFE_DIVIDE(COUNTIF(a.session_id IS NOT NULL), COUNT(*)) AS conversion_rate
FROM sessions s
LEFT JOIN (
  SELECT DISTINCT session_id
  FROM `project.analytics.events`
  WHERE event_name = 'purchase'
) a USING (session_id)
GROUP BY s.lcp_band, s.device
ORDER BY s.device, s.lcp_band;

Correlation and the confounder problem

A raw Pearson or Spearman correlation between session LCP and a binary “converted” flag overstates the effect because slow sessions are not random — they skew toward cheaper devices, worse networks, and emerging-market geos that already convert less for reasons unrelated to speed. Always report correlation within a cohort, never pooled across cohorts, or you will measure the device mix and call it performance.
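
A small sketch of the pooled-versus-within-cohort difference, using fabricated sessions in which conversion does not depend on LCP at all inside either device cohort. The data, the `pearson` helper, and `lcpVsConversion` are illustrative assumptions.

```javascript
// Pearson correlation; with a 0/1 outcome this is the point-biserial coefficient.
function pearson(xs, ys) {
  const mean = (a) => a.reduce((s, v) => s + v, 0) / a.length;
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy); // NaN if either series has no variance
}

const lcpVsConversion = (rows) =>
  pearson(rows.map((r) => r.lcp_ms), rows.map((r) => r.converted));

// Made-up sessions: desktop is fast and converts at 50%, mobile is slow and converts at 25%,
// but within each device the conversion rate is identical at every LCP value.
const sessions = [
  ...[[1000, 1], [1000, 0], [2000, 1], [2000, 0]]
    .map(([lcp_ms, converted]) => ({ device: 'desktop', lcp_ms, converted })),
  ...[[4000, 1], [4000, 0], [4000, 0], [4000, 0], [5000, 1], [5000, 0], [5000, 0], [5000, 0]]
    .map(([lcp_ms, converted]) => ({ device: 'mobile', lcp_ms, converted })),
];

const pooled = lcpVsConversion(sessions);
const within = {};
for (const device of ['desktop', 'mobile']) {
  within[device] = lcpVsConversion(sessions.filter((s) => s.device === device));
}

console.log({ pooled, within }); // pooled is negative; both within-cohort values are 0
```

The pooled coefficient is negative purely because the slower cohort has a lower baseline conversion rate, which is the device mix being mistaken for a performance effect.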

Regression to control confounders

To make a causal-flavored claim, fit a logistic regression of conversion on the metric band plus the cohort variables as controls. The coefficient on the band, holding device, network, geo, and traffic source constant, is the closest you get to an effect estimate from observational data.

-- One row per session, ready for a logistic model (e.g. BigQuery ML).
CREATE OR REPLACE MODEL `project.rum.conv_model`
OPTIONS(model_type = 'logistic_reg', input_label_cols = ['converted']) AS
SELECT
  IF(EXISTS(SELECT 1 FROM `project.analytics.events` e
            WHERE e.session_id = s.session_id AND e.event_name = 'purchase'), 1, 0) AS converted,
  s.lcp_band,
  s.inp_band,
  s.device,
  s.network,
  s.geo,
  s.traffic_source
FROM `project.rum.session_features` s;

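The fitted weight on the band (readable with ML.WEIGHTS) is a log-odds coefficient, not a percentage-point change. Exponentiate it for an odds ratio, or compare predicted conversion with the band switched from Poor to Good while the other features are held fixed, to get the effect of moving a session out of the Poor LCP band after the device/network/geo skew is removed. That is the number you put in front of finance.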
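
Logistic regression weights are on the log-odds scale, so a conversion step is needed before the number means anything to finance. The sketch below assumes you have already extracted a log-odds coefficient for leaving the Poor band and a baseline conversion rate for that cohort; `effectFromLogOdds` and the example inputs are hypothetical.

```javascript
const logit = (p) => Math.log(p / (1 - p));
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// coef: log-odds coefficient for leaving the Poor band (from your fitted model).
// baselineRate: observed conversion rate of Poor-band sessions in the cohort (0..1).
function effectFromLogOdds(coef, baselineRate) {
  const oddsRatio = Math.exp(coef);
  // Shift the baseline on the log-odds scale, then map back to a probability.
  const newRate = sigmoid(logit(baselineRate) + coef);
  return { oddsRatio, newRate, deltaPoints: (newRate - baselineRate) * 100 };
}

// Hypothetical inputs: coefficient 0.25 on a 2% baseline conversion rate.
console.log(effectFromLogOdds(0.25, 0.02));
```

This treats every other covariate as held at the baseline session's values, so read the result as an approximation for that cohort rather than a population-wide lift.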

A/B testing with confidence intervals

Regression controls confounders you measured; an experiment controls the ones you did not. The gold standard is to ship the performance fix to a randomized fraction of traffic and measure the conversion delta with a confidence interval. If the 95% interval on the lift excludes zero, you have causal evidence. This is also the only way to break ties when correlation and regression disagree.

-- A/B lift on conversion with a Wald 95% interval on the difference of proportions.
WITH arms AS (
  SELECT
    variant,
    COUNT(*)                         AS n,
    COUNTIF(converted)               AS c,
    SAFE_DIVIDE(COUNTIF(converted), COUNT(*)) AS p
  FROM `project.rum.experiment_sessions`
  GROUP BY variant
)
SELECT
  t.p - ctl.p AS lift,
  1.96 * SQRT(ctl.p*(1-ctl.p)/ctl.n + t.p*(1-t.p)/t.n) AS half_width_95
FROM arms t, arms ctl
WHERE t.variant = 'treatment' AND ctl.variant = 'control';
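
The same Wald interval can be sketched outside the warehouse, for example in a notebook or a reporting script. The `liftInterval` function and the arm counts below are made up for illustration.

```javascript
// Wald interval on the difference of two proportions (treatment minus control).
function liftInterval(control, treatment, z = 1.96) {
  const pc = control.conversions / control.sessions;
  const pt = treatment.conversions / treatment.sessions;
  const se = Math.sqrt(
    (pc * (1 - pc)) / control.sessions + (pt * (1 - pt)) / treatment.sessions
  );
  const lift = pt - pc;
  const lower = lift - z * se;
  const upper = lift + z * se;
  return { lift, lower, upper, excludesZero: lower > 0 || upper < 0 };
}

// Hypothetical arms: 3.00% control vs 3.42% treatment, 50,000 sessions each.
const result = liftInterval(
  { sessions: 50000, conversions: 1500 },
  { sessions: 50000, conversions: 1710 }
);
console.log(result); // lift is about 0.0042 and the interval excludes zero
```

The Wald approximation is reasonable at these sample sizes; with small arms or very low conversion rates, a different interval method may be more appropriate.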

A worked impact table: metric deltas to KPI deltas

Below is the artifact that turns analysis into a roadmap. Each row is a measured cohort, the metric improvement you achieved or are targeting, and the KPI delta from the regression plus A/B validation. These figures are illustrative of the shape — your numbers come from your own joins — but the structure is exactly what an impact review should produce.

Cohort	Metric move (p75)	Bounce Δ	Conversion Δ	Session-depth Δ	Notes
Mobile / 4G	LCP 3.4 s → 2.4 s	−6.1 pts	+0.42 pts	+0.9 pages	A/B validated, 95% CI excludes 0
Mobile / 3G	LCP 5.1 s → 3.6 s	−9.4 pts	+0.71 pts	+1.3 pages	Largest revenue lever; survivorship-bias checked
Desktop	INP 480 ms → 190 ms	−1.2 pts	+0.18 pts	+0.4 pages	Effect concentrated on search/filter routes
Mobile / all	CLS 0.28 → 0.08	−3.0 pts	+0.25 pts	+0.2 pages	Ad-slot reservation; misclick rate −22%
Tablet	LCP 2.9 s → 2.4 s	−1.0 pts	+0.05 pts	~0	Below the cliff already; deprioritize

The tablet row is the discipline working correctly: the cohort started at 2.9 s, already well below the 4.0 s Poor cliff and within half a second of the Good boundary, and the model showed no meaningful lift, so it does not get engineering time. Impact mapping is as much about saying no as saying yes.

The ROI formula

To translate a conversion delta into money, you need only four inputs per cohort: monthly sessions in the cohort, the conversion-rate delta as a proportion, average order value, and gross margin if you want profit rather than revenue.

monthly_revenue_lift = sessions_per_month
                     × conversion_rate_delta   (as a proportion, e.g. 0.0042)
                     × average_order_value

monthly_profit_lift  = monthly_revenue_lift × gross_margin

annualised_profit_lift = monthly_profit_lift × 12

ROI                  = (annualised_profit_lift − engineering_cost) / engineering_cost

Worked example for the Mobile/4G row: 120,000 monthly sessions × 0.0042 × $82 AOV = $41,328/month in incremental revenue. At a 60% margin that is ~$24.8k/month profit, or ~$298k/year; against an estimated 3 engineer-weeks (~$24k) of work the first-year ROI is roughly 11×. Always quote the cohort, the metric move, and the confidence basis (A/B vs regression-only) alongside the dollar figure, because a number without those three caveats will be challenged and will lose.
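
The formula is easy to encode so that every cohort row is computed the same way. This is a minimal sketch; `impactRoi` and its parameter names are hypothetical, and the inputs are the ones that produce the $41,328/month figure (120,000 monthly sessions, a 0.0042 conversion delta, $82 AOV, 60% margin, $24k cost).

```javascript
function impactRoi({ sessionsPerMonth, conversionRateDelta, averageOrderValue, grossMargin, engineeringCost }) {
  const monthlyRevenueLift = sessionsPerMonth * conversionRateDelta * averageOrderValue;
  const monthlyProfitLift = monthlyRevenueLift * grossMargin;
  const annualisedProfitLift = monthlyProfitLift * 12;
  const roi = (annualisedProfitLift - engineeringCost) / engineeringCost;
  return { monthlyRevenueLift, monthlyProfitLift, annualisedProfitLift, roi };
}

const mobile4g = impactRoi({
  sessionsPerMonth: 120000,
  conversionRateDelta: 0.0042, // +0.42 percentage points, expressed as a proportion
  averageOrderValue: 82,
  grossMargin: 0.6,
  engineeringCost: 24000,
});

// Roughly $41,328/month revenue, $24.8k/month profit, $298k/year, ROI near 11.4.
console.log(mobile4g);
```

Passing the delta as a proportion rather than as percentage points is the detail most often gotten wrong; a 0.42-point lift is 0.0042, not 0.42.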

Triage workflow: “good lab, bad business”

The most common and most damaging scenario is a green synthetic dashboard sitting next to flat or falling business metrics. Work it as a numbered sequence; do not jump to “the metrics are wrong.”

1. Confirm the join is honest. Count beacons that have a matching analytics event. If a large fraction of beacons never join, your conversion rates per band are computed on a non-representative subset: the classic survivorship trap, where sessions that bounced before the beacon flushed are simply missing. Inspect the unjoined population before trusting any rate.
2. Re-segment from field data, not lab. Lab runs on one device/network; the business is generated across the whole cohort distribution captured in the field. Re-cut every KPI by device, network, and geo. “Good lab, bad business” almost always resolves to one cohort, typically slow-network mobile, that the lab never represented.
3. Check the percentile you are reading. A green p50 hiding a Poor p75 is the textbook case. Switch the headline to p75 and the divergence usually appears.
4. Look for a metric the lab does not capture. INP and CLS are interaction- and lifecycle-dependent and frequently look fine in a scripted lab run while being Poor in production on real interactions. Field-only Poor bands point straight at these.
5. Test for sampling skew. If sampling is beacon-level rather than session-level, or if a sampling change shipped recently, your cohort proportions can drift independent of any real change. Validate the sample composition against known traffic mix.
6. Run the experiment. When correlation says one thing and the business says another, stop arguing from observational data and ship an A/B test. The confidence interval ends the debate.

Failure modes and gotchas

Survivorship bias. Sessions that bounced fastest often bounced because the page was slow and left before the beacon flushed. Your slow band is therefore under-counted for its worst outcomes, which understates the impact of performance. Mitigate by emitting an early, minimal beacon on first interaction or on a short timer, separate from the final vitals flush.
Sampling skew. Sample whole sessions, not individual beacons, and keep the sample rate stable across the comparison window. A rate change mid-analysis is indistinguishable from a real effect.
Cohort confounding. Never pool correlations across device or geo. Slow correlates with cheap-device-emerging-market, which correlates with lower baseline conversion for non-performance reasons.
Unstable session ids. A session_id regenerated on every navigation (common in SPAs that clear storage on route change) silently halves your join rate. Verify the id persists across the full session.
Reverse causation. Engaged users who scroll and interact more also accumulate more layout shift and more interactions — so heavy CLS/INP can be a symptom of engagement, not a cause of abandonment. Regression with engagement controls, then an experiment, disentangles this.
Geo from the client. Do not trust client-reported geo or timezone for cohorting; resolve it server-side from the request to avoid VPN and clock skew.
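
The early-beacon mitigation for survivorship bias can be sketched as a send-once helper. Everything here is an assumption for illustration: the `/rum/early` endpoint, the 2-second timer, the `pointerdown` trigger, and the `createEarlyBeacon` name. Only the `rum_sid` storage key comes from the beacon script earlier on this page.

```javascript
// Returns a function that sends one minimal beacon and ignores every later call.
function createEarlyBeacon(send, sessionId) {
  let sent = false;
  return function fire(reason) {
    if (sent) return false;
    sent = true;
    send({ session_id: sessionId, type: 'early', reason, ts: Date.now() });
    return true;
  };
}

// Browser wiring; skipped outside a browser so the helper stays testable.
if (typeof window !== 'undefined' && navigator.sendBeacon) {
  const fire = createEarlyBeacon(
    (payload) => navigator.sendBeacon('/rum/early', JSON.stringify(payload)), // assumed endpoint
    sessionStorage.getItem('rum_sid') || 'unknown'
  );
  setTimeout(() => fire('timer'), 2000); // assumed delay
  addEventListener('pointerdown', () => fire('interaction'), { once: true });
}
```

Whichever trigger fires first wins, so each session contributes at most one early row, and a session that bounces before the final vitals flush still has a record to join against.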

CI and reporting cadence

Impact mapping decays if it is a one-off study, so wire it into the same regression gate as the raw metrics. Gate the metric in CI; report the impact on a fixed cadence.

# Performance budgets enforced in CI, with the impact review cadence pinned.
budgets_p75:
  lcp_ms: 2500
  inp_ms: 200
  cls:    0.1
  ttfb_ms: 800

gates:
  fail_build_on_breach: true
  tolerance_pct: 5

impact_review:
  cadence: weekly          # re-run the cohort joins and refresh the impact table
  segment_by: [device, network, geo]
  require_session_join_rate_above: 0.85   # alert if the join rate degrades
  ab_required_for_roi_claim: true         # no dollar figure without an experiment
  divergence_alert:
    field_vs_lab: "> 15% -> open triage"

The require_session_join_rate_above guard is the most important line: it catches the silent failure where a tracking change breaks the session_id and quietly invalidates every impact number you publish.
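
A guard like this needs a join-rate number to compare against. The sketch below shows one plausible definition, the share of beacon sessions that have at least one analytics event, computed from two lists of session ids. `sessionJoinRate` and `checkJoinRate` are hypothetical helpers, and in practice the same ratio would more likely be computed in the warehouse.

```javascript
// Fraction of distinct beacon sessions that also appear in the analytics stream.
function sessionJoinRate(beaconSessionIds, analyticsSessionIds) {
  const beacons = new Set(beaconSessionIds);
  if (beacons.size === 0) return 0;
  const analytics = new Set(analyticsSessionIds);
  let joined = 0;
  for (const id of beacons) {
    if (analytics.has(id)) joined++;
  }
  return joined / beacons.size;
}

// Mirrors require_session_join_rate_above: the rate must be strictly above the threshold.
function checkJoinRate(rate, threshold = 0.85) {
  return rate > threshold
    ? { ok: true, rate }
    : { ok: false, rate, message: `join rate ${rate.toFixed(2)} is not above ${threshold}` };
}

const rate = sessionJoinRate(['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'x']);
console.log(checkJoinRate(rate)); // 3 of 4 beacon sessions joined, so the guard fails
```

A sudden drop in this ratio after a deploy is the signature of a broken or regenerated session_id, and it should block the impact report rather than merely annotate it.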

FAQ

Why use p75 instead of the average when mapping to revenue?

The relationship between performance and conversion is non-linear and cliff-shaped, and the average is dominated by the tail. p75 is the percentile Google rates and the one your at-risk users experience, so bucketing sessions into Good/NI/Poor bands at p75 exposes the cliff where conversion actually drops, which an average hides.

How do I join RUM beacons to analytics events reliably?

Generate one stable session_id per session, persist it in sessionStorage, and attach it to both the RUM beacon and every analytics event. Then LEFT JOIN the two streams on that key in your warehouse. The single most common failure is an id that regenerates on SPA route changes, which silently collapses the join rate — monitor the join rate as a first-class metric.

Correlation or A/B testing — which should I trust for an ROI claim?

Both, in order. Use within-cohort correlation to find candidates, logistic regression to control measured confounders like device and network, and an A/B test to control the ones you did not measure. Only quote a dollar figure when a randomized experiment’s 95% confidence interval on the lift excludes zero.

What causes a “good lab, bad business” mismatch?

Almost always a cohort the lab never represented — typically slow-network mobile — combined with reading p50 instead of p75, or an interaction-dependent metric like INP that scripted lab runs miss. Re-segment the field data by device, network, and geo and switch the headline to p75 before concluding the metrics are wrong.

How do I stop survivorship bias from hiding performance impact?

Fast bounces often leave before the final vitals beacon flushes, so your slow band under-counts its worst outcomes and understates impact. Emit an early, minimal beacon on first interaction or a short timer — separate from the lifecycle flush on pagehide — so abandoned sessions still contribute a row to the join.

Mapping Core Web Vitals to Conversion Rates — the step-by-step join and lift-curve walkthrough for this method.
Conversion Funnel Correlation — overlaying vitals on funnel stages to find the abandoning step.
Custom Metrics & Business Impact Tracking — the broader practice of instrumenting business-relevant timings.
INP Tracking & Debugging — isolating the interaction latency that drives checkout abandonment.
LCP Measurement & Optimization — reducing the load metric most tightly coupled to bounce on entry pages.