User Timing API: Marks & Measures

The browser ships rich timings for navigation, paint, and resources, but it has no idea that your application considers a session “ready” only when the first search result paints, or that a cart is “interactive” only after the checkout button binds its handler. The User Timing API is how you teach the browser your application’s own milestones — performance.mark() drops a high-resolution timestamp, performance.measure() turns a pair of marks into a named duration, and a PerformanceObserver streams both into your collection pipeline. This page, part of Custom Metrics & Business Impact Tracking, covers how to define those metrics so they are durable, observable, and safe to aggregate at p75 alongside the standard vitals.

Custom timings are where RUM stops being a generic vitals dashboard and starts answering product questions. A standard LCP value tells you when the largest element painted; a time-to-first-result measure tells you when the useful content arrived, which is often a different — and slower — moment that no off-the-shelf metric captures. The discipline is the same as for any field metric: name it consistently, observe it with the buffered PerformanceObserver pattern from the web-vitals API implementation, sample it with the p75 aggregation and field-sampling strategy, and flush it over the self-hosted beacon collection path you already run for vitals.

[Figure: User Timing marks to measure to RUM. On a performance.now() timeline, mark("search:start") and mark("search:firstResult") define measure("time-to-first-result"). A PerformanceObserver (type: "measure", buffered: true) captures the entry, and a beacon sends name, duration, and detail to the collector for RUM p75 aggregation. Duration is the end mark time minus the start mark time, in milliseconds, from one clock origin; buffered: true replays marks and measures created before the observer attached.]
A start and end mark define a measure; a buffered observer captures it with its detail payload and ships the duration to RUM, where it is aggregated at p75 like any vital. See instrumenting User Timing marks in an SPA for the route-transition variant.

What the API actually gives you

There are two primitives and one observer. performance.mark(name, options) records a PerformanceMark entry — a single point in time, defaulting to the moment of the call but optionally pinned to an explicit startTime. performance.measure(name, startMarkOrOptions, endMark) records a PerformanceMeasure entry whose duration is the gap between two marks (or between a mark and “now”). Both entry types live in the same performance buffer the standard vitals use, and both surface through PerformanceObserver.
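
As a rough illustration of those call shapes, the sketch below creates a mark at the current time, a mark pinned to an earlier timestamp, a measure between two marks, and a measure from a mark to "now". The demo:* and demo-* names are placeholders invented for this example, not part of any real instrumentation.

```javascript
// Capture a timestamp first so a mark can be pinned to it afterwards.
const capturedAt = performance.now();

performance.mark('demo:start');                              // point in time: now
performance.mark('demo:pinned', { startTime: capturedAt });  // pinned to the earlier timestamp

// Positional form: measure between two named marks.
performance.measure('demo-between-marks', 'demo:pinned', 'demo:start');

// End mark omitted: measure from the start mark to the moment of the call.
performance.measure('demo-mark-to-now', 'demo:start');

// Read the entries back from the timeline rather than relying on the
// return value, which older engines did not provide.
const [betweenMarks] = performance.getEntriesByName('demo-between-marks', 'measure');
const [markToNow] = performance.getEntriesByName('demo-mark-to-now', 'measure');
```

Both measures are ordinary PerformanceMeasure entries; only the way their endpoints were chosen differs.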

The critical, often-missed detail is that all of these timestamps share one clock origin: performance.timeOrigin. A mark’s startTime is milliseconds since that origin, with sub-millisecond resolution (subject to cross-origin-isolation throttling). Because every mark on the page is measured from the same origin, a measure’s duration is a clean, monotonic difference — there is no risk of the wall-clock jumping under you the way Date.now() deltas can. This is why you should never compute custom durations with Date.now(); always anchor them to marks or to performance.now().
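
A minimal sketch of the shared-origin idea, assuming nothing beyond the standard performance global: durations come from differences on the monotonic clock, and an approximate wall-clock timestamp is derived only at the end by adding performance.timeOrigin. The toEpochMs helper and the clock:demo mark name are hypothetical.

```javascript
// Convert an origin-relative entry time into an approximate epoch timestamp.
// Useful for joining with server logs; never use it to compute durations.
function toEpochMs(entry) {
  return performance.timeOrigin + entry.startTime;
}

performance.mark('clock:demo');
const [demoMark] = performance.getEntriesByName('clock:demo', 'mark');
const epochMs = toEpochMs(demoMark);

// Duration taken on the monotonic clock: unaffected by system clock changes.
const t0 = performance.now();
let sum = 0;
for (let i = 0; i < 100000; i++) sum += i; // stand-in for real work
const elapsedMs = performance.now() - t0;
```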

The detail field is what makes User Timing a real metric system rather than a stopwatch. Both mark() and measure() accept an options object with a structured-cloneable detail payload, which rides along on the entry and is visible to your observer and to DevTools. Use it to carry the dimensions you will later segment by — route, result count, cache hit/miss, feature-flag bucket — so a single time-to-first-result measure arrives at the collector already annotated.

// A measure with structured context attached via detail.
performance.mark('search:start');
// … user query resolves, first result paints …
performance.mark('search:firstResult');

performance.measure('time-to-first-result', {
  start: 'search:start',
  end: 'search:firstResult',
  detail: { route: '/search', resultCount: 18, cache: 'miss', flag: 'ranker-v3' },
});

Threshold and parameter reference

Custom metrics have no Google-blessed Good/NI/Poor bands — you own the thresholds. But you should still publish them, because an unbudgeted custom metric is one nobody defends in review. Anchor your bands to user-perceptible intent: borrow the spirit of the standard vitals (the FCP Good band of ≤ 1.8 s and TTFB Good band of ≤ 800 ms are good reference points) and set the Poor edge where the experience visibly degrades. The table below is the config contract between the instrumentation and the warehouse.

Parameter | Value / type | Purpose
mark name | feature:phase string | Namespaced point-in-time; e.g. search:start
measure name | kebab-case metric id | Stable key the warehouse groups on; e.g. time-to-first-result
detail | structured-cloneable object | Segmentation dimensions (route, cohort, flag) carried on the entry
Observer type | "measure" (and "mark") | Which entry stream to receive
buffered | true | Replay entries created before the observer attached
Aggregation | p75 over duration | Headline stat, matching the standard vitals
Unit | milliseconds (float) | entry.duration; never seconds
Reported example budget | time-to-first-result p75 ≤ 1200 ms | App-specific Good edge
Reported example budget | time-to-interactive-for-cart p75 ≤ 2500 ms | App-specific Good edge
Cardinality cap | bounded name set | Prevents per-id/per-query name explosion

A worked example for three app-specific metrics, with the bands you would defend in a perf review:

Custom metric | Good (p75) | Needs improvement (p75) | Poor (p75) | Engineering action
time-to-first-result | ≤ 1200 ms | ≤ 2500 ms | > 2500 ms | Prefetch ranker, stream first result, cache warm queries
time-to-interactive-for-cart | ≤ 2500 ms | ≤ 4000 ms | > 4000 ms | Defer non-critical hydration, split the checkout bundle
config-ready (flags fetched) | ≤ 400 ms | ≤ 900 ms | > 900 ms | Edge-cache config, inline the critical flag set
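
One way to keep those bands in code is a small lookup that both the dashboard and the CI gate can share. This is a sketch under the assumption that the bands live in a single module; BUDGETS and rate are hypothetical names, and the numbers are copied from the table above.

```javascript
// Good and Poor edges in milliseconds, keyed by measure name.
const BUDGETS = new Map([
  ['time-to-first-result', { good: 1200, poor: 2500 }],
  ['time-to-interactive-for-cart', { good: 2500, poor: 4000 }],
  ['config-ready', { good: 400, poor: 900 }],
]);

// Classify a p75 value (ms) against the published bands.
function rate(metric, p75Ms) {
  const band = BUDGETS.get(metric);
  if (!band) return 'unbudgeted';            // no published band for this name
  if (p75Ms <= band.good) return 'good';
  if (p75Ms <= band.poor) return 'needs-improvement';
  return 'poor';
}
```

Returning 'unbudgeted' rather than throwing makes a missing band visible in reports without breaking the aggregation job.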

Measurement implementation

The production shape is a small collector that owns the observer, buffers entries, and flushes them on the page-lifecycle events. Two rules dominate the design. First, register the observer with buffered: true so that any mark or measure created before the observer attached — for example a nav:start mark dropped in a synchronous head script — is still replayed to your callback. Second, finalize on visibilitychange/pagehide with sendBeacon, because a page that is backgrounded or unloaded will never run a later setTimeout flush.

// user-timing-collector.js
// Captures performance measures (and selected marks), annotates them with
// session context, and flushes to the RUM endpoint on page lifecycle events.

const ENDPOINT = '/rum/usertiming';
const SESSION_ID = crypto.randomUUID();

// Only metric names we have explicitly budgeted are shipped. This bounds
// cardinality: a stray measure('debug-xyz') never reaches the warehouse.
const ALLOWED_MEASURES = new Set([
  'time-to-first-result',
  'time-to-interactive-for-cart',
  'config-ready',
]);

const buffer = [];

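// Marks are shipped only for features we own ('search:start' -> 'search');
// framework and third-party marks share the same timeline and are skipped.
// This feature list is an assumption; replace it with your own namespaces.
const ALLOWED_MARK_FEATURES = new Set(['search', 'cart', 'config', 'nav']);

function record(entry) {
  if (entry.entryType === 'measure' && !ALLOWED_MEASURES.has(entry.name)) return;
  if (entry.entryType === 'mark' && !ALLOWED_MARK_FEATURES.has(entry.name.split(':')[0])) return;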
  buffer.push({
    name: entry.name,
    type: entry.entryType,           // "measure" or "mark"
    start: Math.round(entry.startTime),
    duration: Math.round(entry.duration || 0),
    detail: entry.detail || null,    // structured context attached at mark/measure time
  });
}

// buffered:true replays entries created before this observer attached,
// so marks dropped in the document head are not lost.
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) record(entry);
});
observer.observe({ type: 'measure', buffered: true });
observer.observe({ type: 'mark', buffered: true });

function flush() {
  if (buffer.length === 0) return;
  const payload = JSON.stringify({
    session_id: SESSION_ID,
    page: location.pathname,
    sent_at: Date.now(),
    entries: buffer.splice(0, buffer.length), // drain; never double-send
  });
  // sendBeacon survives the unload that a fetch() would lose.
  navigator.sendBeacon(ENDPOINT, new Blob([payload], { type: 'application/json' }));
}

// Flush on every terminal lifecycle signal. flush() drains the buffer, so a
// repeated call never double-sends, and measures recorded after the user
// returns to the tab are still shipped on the next hide. pagehide covers
// unload and bfcache entry; visibilitychange to hidden covers tab switches
// and app backgrounding on mobile.
function finalize() {
  observer.takeRecords().forEach(record); // drain entries pending in the queue
  flush();
}
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') finalize();
});
addEventListener('pagehide', finalize);

The helper that application code calls is deliberately thin: mark the start, mark the end, measure, and let the observer do the shipping. Keeping measure() calls in app code (not the collector) means the detail is authored where the context lives.

// usertiming.js — the surface application code imports.
export function markStart(feature) {
  performance.mark(`${feature}:start`);
}

export function markEndAndMeasure(feature, metricName, detail) {
  const end = `${feature}:end`;
  performance.mark(end);
  // Guard: measuring against a missing start mark throws SyntaxError.
  if (!performance.getEntriesByName(`${feature}:start`, 'mark').length) return;
  performance.measure(metricName, { start: `${feature}:start`, end, detail });
}

// In a search component:
//   markStart('search');
//   const results = await fetchResults(query);
//   // … render the results first, so the end mark lands after the DOM update …
//   markEndAndMeasure('search', 'time-to-first-result',
//     { route: '/search', resultCount: results.length, cache: results.fromCache ? 'hit' : 'miss' });

Clearing marks and bounding the buffer

Unlike resource timing, whose buffer defaults to 250 entries, the timeline places no cap on mark and measure entries: they accumulate for the life of the page. A long-lived single-page app that marks on every interaction therefore grows the timeline without bound, which costs memory and slows every getEntries* lookup. Clear what you no longer need, but clear it after the observer has been handed the entry, not before.

// Clear a feature's marks once its measure has been recorded.
// Observer callbacks are delivered asynchronously, but a new entry is queued
// to every already-registered observer at creation time, so clearing the
// timeline afterwards does not remove it from that observer's pending
// records. Entries cleared before an observer attaches cannot be replayed by
// buffered: true. clearMarks(name) removes only the named mark; clearMeasures
// likewise. Calling with no argument clears ALL, which is almost never what you want.
function clearFeature(feature, metricName) {
  performance.clearMarks(`${feature}:start`);
  performance.clearMarks(`${feature}:end`);
  performance.clearMeasures(metricName);
}

// setResourceTimingBufferSize() sizes the resource timing buffer only. It has
// no effect on marks or measures, so clearing by name is the only bound here.
performance.setResourceTimingBufferSize?.(500);

A subtle trap: clearMarks() with no argument wipes every mark on the page, including ones other libraries (analytics SDKs, frameworks, React’s own profiler) rely on. Always clear by name. The route-transition lifecycle that makes this matter most — clearing per-view so a long session does not leak entries — is covered in depth in instrumenting User Timing marks in an SPA.
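
A per-view variant of clearing by name can be sketched as a small tracker that remembers which names the current view created and clears exactly those on a route change. The trackedMark, trackedMeasure, and clearView helpers are hypothetical names for this illustration, and the sketch assumes the observer is already registered before any of them run.

```javascript
// Names created by the current view; everything else on the timeline is left alone.
const viewMarks = new Set();
const viewMeasures = new Set();

function trackedMark(name, options) {
  viewMarks.add(name);
  return performance.mark(name, options);
}

function trackedMeasure(name, options) {
  viewMeasures.add(name);
  return performance.measure(name, options);
}

// Call on route transition, after the view's measures have been created.
function clearView() {
  for (const name of viewMarks) performance.clearMarks(name);
  for (const name of viewMeasures) performance.clearMeasures(name);
  viewMarks.clear();
  viewMeasures.clear();
}
```

Because only tracked names are cleared, marks written by other libraries survive the transition.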

Naming conventions that survive a warehouse

Names are your join keys, so treat them as a schema, not as free text. Two conventions pay for themselves immediately:

  • Marks use feature:phase. A colon-namespaced point — search:start, cart:hydrated, config:fetched — reads cleanly in DevTools and sorts predictably. Phases are a small closed vocabulary: start, end, plus domain phases like firstResult or interactive.
  • Measures use a stable kebab-case metric id. time-to-first-result, not Search TTFR (v2). The measure name is what your warehouse GROUP BYs on; if it drifts between releases, your p75 history fractures into orphaned series. Version through detail ({ ver: 'ranker-v3' }), never through the name.

The non-negotiable rule is bounded cardinality. Never interpolate unbounded values into a measure name: measure(`search-${query}`) produces a new metric per query and detonates your warehouse's group-by. Keep the name set small and fixed; push everything variable into detail, which is a payload column, not a grouping key.
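
These conventions are easy to enforce mechanically. The sketch below is one possible check, suitable for a unit test or a development-only assertion; the regular expressions encode this page's conventions as I read them, and MARK_NAME, MEASURE_NAME, KNOWN_MEASURES, isValidMarkName, and isShippableMeasure are hypothetical names.

```javascript
// feature:phase, e.g. search:start or search:firstResult
const MARK_NAME = /^[a-z][a-z0-9-]*:[a-zA-Z][a-zA-Z0-9]*$/;
// kebab-case metric id, e.g. time-to-first-result
const MEASURE_NAME = /^[a-z][a-z0-9]*(-[a-z0-9]+)*$/;

// The closed vocabulary of budgeted measures.
const KNOWN_MEASURES = new Set([
  'time-to-first-result',
  'time-to-interactive-for-cart',
  'config-ready',
]);

function isValidMarkName(name) {
  return MARK_NAME.test(name);
}

// Well-formed is not enough: 'search-shoes' is valid kebab-case but is not a
// budgeted metric, so it must not reach the warehouse.
function isShippableMeasure(name) {
  return MEASURE_NAME.test(name) && KNOWN_MEASURES.has(name);
}
```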

Field-data analysis patterns

Custom measures earn their keep through segmentation, and the dimensions you stashed in detail are exactly the cut points. Aggregate every custom metric at p75, then slice:

  • Device class. A time-to-interactive-for-cart that is fine on desktop but Poor on low-end mobile is the common shape — main-thread contention during hydration scales with CPU. If the mobile p75 is more than ~2× the desktop p75, the metric is gated on main-thread work, not network.
  • Network type. config-ready and time-to-first-result are network-bound; their p75 should track TTFB by effectiveType. A flat distribution across 4g and slow-2g is suspicious — usually a beacon-loss problem on slow networks, not genuinely uniform timing.
  • Geography. Edge-cacheable phases (config, static results) should converge across regions; origin-bound phases diverge. A region whose config-ready p75 spikes points at a cold edge POP or a missing cache rule there.
  • Cohort / flag. With the feature-flag bucket in detail, a custom metric becomes an experiment readout: compare ranker-v3 vs ranker-v2 time-to-first-result p75 directly, segmented by device, without a separate analytics pipeline.
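
The slicing above can be sketched as a nearest-rank p75 grouped by one detail dimension. This assumes rows shaped like the collector's entries ({ name, duration, detail }) and a hypothetical deviceClass field in detail; a real warehouse would do the same thing in SQL over a reweighted sample.

```javascript
// Nearest-rank 75th percentile of an array of numbers.
function p75(values) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Group rows by a detail dimension and compute p75 of duration per group.
function p75BySegment(rows, key) {
  const groups = new Map();
  for (const row of rows) {
    const segment = row.detail?.[key] ?? 'unknown';
    if (!groups.has(segment)) groups.set(segment, []);
    groups.get(segment).push(row.duration);
  }
  return new Map([...groups].map(([segment, durations]) => [segment, p75(durations)]));
}

// Illustrative data only, not real field measurements.
const sampleRows = [
  ...[100, 200, 300, 400].map((duration) => ({ duration, detail: { deviceClass: 'desktop' } })),
  ...[500, 700, 900, 1100].map((duration) => ({ duration, detail: { deviceClass: 'mobile' } })),
];
const byDevice = p75BySegment(sampleRows, 'deviceClass');
const mobileToDesktop = byDevice.get('mobile') / byDevice.get('desktop');
```

In this made-up sample the mobile p75 is three times the desktop p75, which under the ~2× heuristic above would point at main-thread work rather than network.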

The divergence to watch is a custom metric and its underlying vital disagreeing. If LCP is Good but time-to-first-result is Poor, the largest element is painting early while the useful content lags — a classic skeleton-screen artifact that vitals alone will never surface, and exactly the gap custom timing exists to close.

Debugging workflow

When a custom measure looks wrong — missing in the warehouse, implausibly fast, or wildly bimodal — work it in this order:

  1. Confirm the entry exists in the page. In DevTools, run performance.getEntriesByName('time-to-first-result', 'measure'). No entry means the measure() call never ran (the end mark was missing, or the code path was not exercised), not a pipeline problem.
  2. Trace it on the Performance panel. Custom marks and measures render in the Timings track of a recorded profile. A measure whose duration straddles a long task tells you the metric is main-thread-bound; correlate with Long Task & Main-Thread Attribution.
  3. Check observer timing. If the entry exists in getEntries() but never reached your callback, the observer attached after the entry was created and buffered was not set. Add buffered: true.
  4. Validate the marks bracket the right work. A suspiciously small p75 usually means the start and end marks are too close — the end mark fired on the promise resolving, not on the result painting. Move the end mark into a requestAnimationFrame after the DOM mutation.
  5. Inspect the detail payload at the collector. Confirm the segmentation fields arrived. A null detail in the warehouse means the options-object form of measure() was not used or no detail was passed. A value that is not structured-cloneable throws at the measure() call instead, so in that case the entry is missing altogether.
  6. Diff lab against field. Reproduce in a throttled lab profile, then compare the lab duration to the field p75. A large gap is real population variance (slow devices, cold caches) that the lab cannot see — the point of measuring in the field at all.
  7. Monitor the p75 delta after the fix ships, segmented by the cohort you changed.
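
For step 4, a minimal sketch of deferring the end mark to the next animation frame. markAfterFrame is a hypothetical helper, and the scheduler is injectable so the function can be exercised outside a browser. Note that a requestAnimationFrame callback runs just before the frame is rendered, so this approximates the paint rather than observing it.

```javascript
// Drop a mark in the next animation frame instead of at promise resolution.
// Falls back to marking immediately where no frame scheduler is available.
function markAfterFrame(name, schedule = globalThis.requestAnimationFrame) {
  if (typeof schedule !== 'function') {
    performance.mark(name);
    return;
  }
  schedule(() => performance.mark(name));
}

// Usage, after the DOM mutation that renders the first result:
//   list.append(firstResultNode);
//   markAfterFrame('search:firstResult');
```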

Failure modes and gotchas

  • Marks created before the observer attaches are lost without buffered: true. This is the headline failure. A mark dropped in the document head, or by a synchronous boot script, exists in the buffer but never reaches an observer that attaches later — unless you observe with buffered: true, which replays the buffer. Always set it for User Timing.
  • Measuring against a missing start mark throws. performance.measure(name, 'missing-start') throws a SyntaxError and aborts the surrounding code path. Guard with getEntriesByName(...).length before measuring, as the helper above does.
  • High-cardinality names detonate the warehouse. Interpolating query strings, ids, or timestamps into measure names creates an unbounded metric set that wrecks group-by performance and storage. Names are a closed vocabulary; variability goes in detail.
  • Unbounded growth in long sessions. mark and measure entries have no buffer cap (the 250-entry default applies to resource timing only), so a mark-heavy SPA accumulates entries for the life of the page, costing memory and slowing getEntries* lookups. Clear by name per view, once the observer has recorded the measure.
  • Clock-origin confusion. Custom durations computed with Date.now() instead of marks can go negative or jump when the system clock adjusts. Every timestamp must share performance.timeOrigin; use marks or performance.now() exclusively.
  • Cross-origin-isolation clamping. Without COOP/COEP headers, mark resolution is clamped (typically to 100 µs or coarser) to mitigate Spectre. Sub-millisecond measures will look quantized; it is the platform, not your code.
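  • detail must be structured-cloneable. Functions and DOM nodes throw a DataCloneError on mark() and measure(). Circular references survive structured cloning but break JSON.stringify() when the collector flushes. Keep detail to plain JSON-shaped values.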
  • bfcache restores skip your start marks. A page restored from the back/forward cache does not re-run head scripts, so a nav:start mark may be absent. Re-anchor on the pageshow event’s persisted flag for restored navigations.
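
The bfcache case can be handled with a pageshow listener along these lines. This is a sketch under the assumption that nav:start is the anchor mark your measures start from; handlePageShow and the restored flag in detail are hypothetical.

```javascript
// Re-anchor the navigation start mark when the page is restored from bfcache.
// Returns true when a restore was handled, false for a normal load.
function handlePageShow(event) {
  if (!event.persisted) return false;   // normal load: the head script already marked
  performance.clearMarks('nav:start');  // drop any stale mark from the original load
  performance.mark('nav:start', { detail: { restored: true } });
  return true;
}

// Register only where a window exists, so the module can be imported elsewhere.
if (typeof window !== 'undefined') {
  window.addEventListener('pageshow', handlePageShow);
}
```

Tagging the restored mark in detail lets the warehouse keep bfcache restores as their own segment instead of mixing them into cold-load timings.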

CI/CD gating

A custom metric you do not gate is a metric that silently regresses. Wire two guards into the pipeline:

  • Instrumentation presence test. In a headless browser run (Playwright/Puppeteer), drive the flow and assert the expected measures exist with sane durations: await page.evaluate(() => performance.getEntriesByName('time-to-first-result', 'measure').length) must be ≥ 1. This catches a refactor that deletes a mark — the failure mode that produces a metric silently absent from the field, which no field alert can fire on because there is no data.
  • Budget gate against field p75. In the nightly aggregation, fail the job if a custom metric’s reweighted p75 crosses its published Poor edge or regresses beyond a tolerance versus the prior window. Feed the same series into your performance dashboards so a regression is loud, not archaeology. Pair this with the p75 sampling and reweighting strategy so the gate compares like with like.
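
The budget gate reduces to two comparisons once the nightly job has the current and prior p75. A minimal sketch, assuming those values are already computed and reweighted upstream; gate, its parameter names, and the 10% default tolerance are illustrative choices rather than a recommendation.

```javascript
// Fail when the p75 crosses the Poor edge or regresses beyond a tolerance
// relative to the prior window. All durations are in milliseconds.
function gate({ metric, currentP75, previousP75, poorEdge, tolerance = 0.1 }) {
  const failures = [];
  if (currentP75 > poorEdge) {
    failures.push(`${metric}: p75 ${currentP75} ms exceeds Poor edge ${poorEdge} ms`);
  }
  if (previousP75 > 0 && currentP75 > previousP75 * (1 + tolerance)) {
    failures.push(`${metric}: p75 regressed from ${previousP75} ms to ${currentP75} ms`);
  }
  return { metric, ok: failures.length === 0, failures };
}

// In the nightly job (sketch): exit non-zero when any gate fails.
//   const results = metrics.map(gate);
//   if (results.some((r) => !r.ok)) process.exitCode = 1;
```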

FAQ

When should I use a custom User Timing measure instead of a standard Web Vital?

Use a custom measure when the moment that matters to users has no standard metric — time-to-first-result, time-to-interactive-for-cart, or config-ready. Standard vitals like LCP and INP describe generic loading and responsiveness; they cannot know that your app is only “useful” once a specific element paints. Define the milestone with marks, measure the duration, and aggregate it at p75 next to the vitals.

Why are my marks missing from the PerformanceObserver callback?

Almost always because the observer attached after the marks were created and you did not pass buffered: true. The entries exist in the performance buffer but are only delivered to observers that opt into the buffer replay. Register with observer.observe({ type: 'measure', buffered: true }) and the same for mark, and pre-observer entries are replayed to your callback.

How do I attach custom context like route or cohort to a measure?

Use the options-object form: performance.measure(name, { start, end, detail }). The detail field accepts any structured-cloneable value and rides along on the entry, visible to your observer and to DevTools. Carry segmentation dimensions there — route, result count, cache state, feature-flag bucket — so the measure arrives at the collector already annotated, and keep them out of the metric name to bound cardinality.

Will high-cardinality mark names hurt my RUM pipeline?

Yes — interpolating variable values into measure names (time-to-first-result-${query}) creates a new metric per value and detonates the warehouse’s group-by and storage. Keep measure names a small, fixed vocabulary in kebab-case, and push everything variable into detail, which is a payload column rather than a grouping key. An explicit allow-list of shippable names at the collector enforces the bound.

Should I clear marks, and when?

Yes, on long-lived single-page apps: mark and measure entries are not capped, so they otherwise accumulate for the life of the page. Clear by name (clearMarks('search:start')), never argument-less (which wipes every library's marks), and only after the observer has recorded the corresponding measure. Per-route clearing on transition is the standard SPA pattern.