Configuring OpenTelemetry for Frontend Performance

SDK Initialization & Context Propagation

Establishing a predictable telemetry pipeline requires following the Web Tracer SDK lifecycle. When implementing OpenTelemetry for Web RUM, initialize @opentelemetry/sdk-trace-web with WebTracerProvider and attach DocumentLoadInstrumentation, FetchInstrumentation, and UserInteractionInstrumentation. Register the provider with a context manager that can follow asynchronous callbacks (for example ZoneContextManager from @opentelemetry/context-zone) before any asynchronous network requests fire. Without an active context manager, spans started inside promise or timer callbacks lose their parent, which is a common cause of orphaned spans during SPA hydration phases.

import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { DocumentLoadInstrumentation } from '@opentelemetry/instrumentation-document-load';
import { FetchInstrumentation } from '@opentelemetry/instrumentation-fetch';
import { UserInteractionInstrumentation } from '@opentelemetry/instrumentation-user-interaction';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { ZoneContextManager } from '@opentelemetry/context-zone';

// 1. Initialize provider (sampler/exporter attached later)
const provider = new WebTracerProvider();

// 2. Register instrumentations
registerInstrumentations({
  instrumentations: [
    new DocumentLoadInstrumentation(),
    new FetchInstrumentation({
      propagateTraceHeaderCorsUrls: [/.*/], // Scope to your API domains in production
      clearTimingResources: true,
    }),
    new UserInteractionInstrumentation({
      eventNames: ['click', 'input', 'keydown'],
      // Return true to skip span creation; the third argument is the span, not the DOM event
      shouldPreventSpanCreation: (eventType, element, _span) => {
        // Filter out low-value interactions to reduce noise
        return element?.tagName === 'INPUT' && eventType === 'keydown';
      },
    }),
  ],
});

// 3. Register the provider globally BEFORE app bootstrap. The context manager
//    keeps the active span available across async callbacks (promises, timers).
provider.register({
  contextManager: new ZoneContextManager(),
});
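
To make the propagation step more concrete, the following sketch parses the W3C traceparent header that the fetch instrumentation attaches to outgoing requests. The names parseTraceparent and TraceParent are hypothetical helpers introduced for illustration, not part of the OpenTelemetry API, and the sketch only handles version 00 headers.

```typescript
// Hypothetical helper (not part of @opentelemetry/api): parses a W3C
// Trace Context `traceparent` header of the form
//   version-traceId-parentId-flags, e.g. 00-<32 hex>-<16 hex>-01
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const version = match[1];
  const traceId = match[2];
  const parentId = match[3];
  const flags = match[4];
  // Version ff and all-zero IDs are invalid per the specification
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return {
    version,
    traceId,
    parentId,
    // Bit 0 of the flags byte is the "sampled" flag
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}
```

A helper like this could be used in a staging proxy or an integration test to assert that cross-origin requests actually carry a valid header before they reach the backend.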

Core Web Vitals Metric Mapping

Translating browser-native performance signals into OTLP-compatible metrics requires wrapping the web-vitals library callbacks. Feed the values reported by onLCP, onINP, and onCLS into an observable gauge created with Meter.createObservableGauge(), normalizing all timestamps against performance.timeOrigin. This alignment keeps field data statistically comparable when aggregating across RUM architectures and self-hosted deployments. Avoid emitting metrics on every interaction; instead, debounce updates so that only settled values are reported, and keep the attribute set small to prevent metric cardinality explosion.

import { metrics } from '@opentelemetry/api';
import { onLCP, onINP, onCLS, onTTFB, onFCP } from 'web-vitals';

const meter = metrics.getMeter('frontend-cwv');
const cwvGauge = meter.createObservableGauge('web.cwv.value', {
 description: 'Core Web Vitals metric values',
 unit: 'ms',
});

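const cwvBuffer: Record<string, number> = {};   // latest raw values
const cwvSnapshot: Record<string, number> = {}; // settled values exposed to the gauge
let emitTimer: ReturnType<typeof setTimeout> | null = null;

// Register the callback ONCE; the metric reader invokes it on each collection cycle.
// (Calling addCallback inside the timer would stack a new callback on every update.)
cwvGauge.addCallback((observableResult) => {
  Object.entries(cwvSnapshot).forEach(([name, value]) => {
    observableResult.observe(value, { metric_name: name });
  });
});

// Debounce: publish buffered values only after 5 s without further updates
const scheduleEmission = () => {
  if (emitTimer) clearTimeout(emitTimer);
  emitTimer = setTimeout(() => {
    Object.assign(cwvSnapshot, cwvBuffer);
  }, 5000);
};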

const trackMetric = (name: string, value: number) => {
 cwvBuffer[name] = value;
 scheduleEmission();
};

// Attach web-vitals callbacks
onLCP(({ value }) => trackMetric('LCP', value));
onINP(({ value }) => trackMetric('INP', value));
onCLS(({ value }) => trackMetric('CLS', value * 1000)); // CLS is a unitless score; scaled by 1000 to share the 'ms' gauge, so divide by 1000 when querying
onTTFB(({ value }) => trackMetric('TTFB', value));
onFCP(({ value }) => trackMetric('FCP', value));
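
As a complement to the mapping above, this sketch shows one way to normalize a metric's timestamp against the time origin and to tag each value with its own unit, so that CLS does not need to be rescaled. The names VitalSample, unitFor, toEpochMs, and toSample are hypothetical; in a browser, the timeOrigin argument would come from performance.timeOrigin.

```typescript
type VitalName = 'LCP' | 'INP' | 'CLS' | 'TTFB' | 'FCP';

// Hypothetical record shape for one normalized Core Web Vitals reading
interface VitalSample {
  name: VitalName;
  value: number;
  unit: 'ms' | '1'; // '1' denotes a dimensionless value
  epochMs: number;  // absolute Unix time in milliseconds
}

// CLS is a unitless layout-shift score; the other vitals are millisecond durations
const unitFor = (name: VitalName): 'ms' | '1' => (name === 'CLS' ? '1' : 'ms');

// Convert a time relative to the page's time origin into Unix epoch milliseconds
const toEpochMs = (timeOrigin: number, relativeMs: number): number =>
  Math.round(timeOrigin + relativeMs);

function toSample(
  name: VitalName,
  value: number,
  relativeMs: number,
  timeOrigin: number,
): VitalSample {
  return { name, value, unit: unitFor(name), epochMs: toEpochMs(timeOrigin, relativeMs) };
}
```

Carrying the unit alongside the value is a design choice for this sketch; it lets the backend keep millisecond metrics and the CLS score in separate series instead of relying on a scaling convention.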

Beacon Export & Sampling Configuration

Unfiltered frontend telemetry generates unsustainable payload volumes. Implement TraceIdRatioBasedSampler at the provider level, targeting a 10-20% baseline for standard sessions. Head sampling is decided when a span starts, so an error that surfaces later cannot retroactively switch a trace to AlwaysOnSampler; to retain error-boundary traces, flag them with an attribute and apply an always-keep rule in the collector or backend. Use navigator.sendBeacon as the primary transport layer to improve the odds of delivery during page unload events; the browser queues the request, but delivery is not guaranteed. For high-traffic applications, integrate dynamic sampling adjustments based on network effective type and device tier classification to preserve statistical validity without saturating ingestion queues.

import { SimpleSpanProcessor, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-web';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

// Dynamic sampler factory based on connection/device tier
const getDynamicSampler = () => {
 const conn = (navigator as any).connection;
 const effectiveType = conn?.effectiveType || '4g';
 const memory = (navigator as any).deviceMemory || 8;
 
 const isConstrained = effectiveType === '2g' || effectiveType === 'slow-2g' || memory < 4;
 return isConstrained ? new TraceIdRatioBasedSampler(0.05) : new TraceIdRatioBasedSampler(0.15);
};

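// Pass the sampler at construction time; this replaces the bare
// `new WebTracerProvider()` call from the initialization step
const provider = new WebTracerProvider({ sampler: getDynamicSampler() });

// OTLP/HTTP exporter. No custom headers are set on purpose: in current
// opentelemetry-js releases the browser exporter only uses navigator.sendBeacon
// when no headers are configured, and otherwise falls back to XHR/fetch.
const exporter = new OTLPTraceExporter({
  url: '/v1/traces',
});

// Note: SDK 2.x removed addSpanProcessor(); pass `spanProcessors` to the constructor there
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));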

// Error-boundary handling: the head-sampling decision is already made, so flag
// the span and let the collector/backend apply an always-keep rule for it
window.addEventListener('error', (e) => {
  if (e.error?.isCritical) { // isCritical is an application-defined flag
    const span = provider.getTracer('error-override').startSpan('critical_failure', {
      attributes: { 'error.message': e.error.message },
    });
    span.recordException(e.error);
    span.end(); // spans are only exported once they have ended
  }
});
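
When sampling ratios differ by device tier, aggregates must be re-weighted or constrained devices will be under-represented. The sketch below illustrates that idea under the assumption that each exported span carries the ratio it was sampled at (the samplingRatio field is hypothetical, as are the function names); each kept span then stands in for 1/ratio real spans.

```typescript
// Hypothetical tiering rule mirroring the factory above:
// constrained clients are sampled at a lower ratio
function pickSamplingRatio(effectiveType: string, deviceMemoryGb: number): number {
  const constrained =
    effectiveType === '2g' || effectiveType === 'slow-2g' || deviceMemoryGb < 4;
  return constrained ? 0.05 : 0.15;
}

// Assumed shape: the ratio is recorded on the span at sampling time
interface SampledSpan {
  durationMs: number;
  samplingRatio: number;
}

// Estimate how many spans existed before sampling
function estimateTotalSpans(spans: SampledSpan[]): number {
  return spans.reduce((sum, s) => sum + 1 / s.samplingRatio, 0);
}

// Mean duration weighted by inverse sampling probability
function weightedMeanDuration(spans: SampledSpan[]): number {
  let weightSum = 0;
  let weightedDuration = 0;
  for (const s of spans) {
    const weight = 1 / s.samplingRatio;
    weightSum += weight;
    weightedDuration += weight * s.durationMs;
  }
  return weightSum === 0 ? NaN : weightedDuration / weightSum;
}
```

For example, one span kept at ratio 0.5 and one kept at ratio 0.25 represent roughly six original spans, and the second span counts twice as much in the weighted mean.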

Debugging Span Loss & Attribution Gaps

Diagnosis begins with validating connectivity to the OTLP HTTP/JSON endpoint and its CORS preflight responses. Use ConsoleSpanExporter in staging to confirm that each trace_id persists across route transitions. Common failure vectors include premature SDK teardown, missing traceparent headers on cross-origin fetches, and unhandled promise rejections that prevent spans from being ended. Cross-reference dropped spans against geographic routing tables and privacy-compliant tracking flags to isolate whether data loss stems from network filtering or client-side execution constraints.

Rapid Triage Workflow:

  1. Verify Collector Connectivity: Confirm OTLP collector CORS headers (Access-Control-Allow-Origin, Access-Control-Allow-Methods) and payload size limits (max_request_body_size).
  2. Validate Context Propagation: Audit SPA router guards to ensure traceparent headers attach to all fetch/XMLHttpRequest calls.
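  3. Check Teardown Timing: Browsers do not wait for asynchronous work during beforeunload, so flush pending spans with provider.forceFlush() when the page becomes hidden (visibilitychange or pagehide) instead of relying on provider.shutdown() completing during unload.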
  4. Audit Attribute Cardinality: Strip high-variance attributes (exact URLs, session tokens) before emission to prevent backend throttling.
  5. Throttle Testing: Validate metric emission under 3G throttling and 4x CPU slowdown in DevTools to catch race conditions.

import { ConsoleSpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-web';

// Diagnostic setup for non-production builds
// (adjust the check to match how your staging build is flagged)
if (process.env.NODE_ENV !== 'production') {
  provider.addSpanProcessor(new SimpleSpanProcessor(new ConsoleSpanExporter()));
}

// Flush pending spans when the page is hidden. Browsers do not await async
// work in beforeunload, and calling preventDefault() there triggers a
// confirmation prompt, so visibilitychange is the more reliable hook.
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') {
    void provider.forceFlush();
  }
});
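
Step 4 of the triage workflow (stripping high-variance attributes) can be sketched as a small normalizer applied to URLs before they are set as span attributes. The function name normalizeUrlForAttribute and the :id and :uuid placeholders are assumptions made for this example, not an OpenTelemetry convention.

```typescript
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: collapses high-variance URL parts into placeholders
function normalizeUrlForAttribute(rawUrl: string): string {
  // Drop the query string and fragment entirely: they often carry tokens
  const withoutQuery = rawUrl.split(/[?#]/)[0];
  return withoutQuery
    .split('/')
    .map((segment) => {
      if (UUID_RE.test(segment)) return ':uuid'; // resource UUIDs
      if (/^\d+$/.test(segment)) return ':id';   // numeric identifiers
      return segment;
    })
    .join('/');
}
```

Applied to every request URL, this keeps the number of distinct attribute values proportional to the number of routes rather than the number of resources or sessions.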

Statistical Analysis & Dashboard Integration

Transform raw span attributes into actionable performance baselines by calculating p50, p75, and p95 latency percentiles. Map http.status_code, net.host.name, and device.memory attributes to correlate backend routing efficiency with frontend rendering bottlenecks. When visualizing telemetry, structure queries to filter by session duration and interaction depth, enabling precise comparisons between synthetic benchmarks and real-user field data. Proper attribute tagging ensures seamless integration with downstream analytics platforms without vendor lock-in.

Query & Aggregation Strategy:

  • Percentile Calculation: Use histogram_quantile(0.95, sum by (le) (rate(span_duration_seconds_bucket[5m]))) in PromQL or an equivalent aggregation in your backend; the metric name depends on how your pipeline derives span metrics.
  • Attribute Correlation: Filter on http.status_code >= 400 to isolate error-induced latency spikes. Cross-reference with device.memory below 4 GB to identify hardware-constrained rendering delays.
  • Filtering Logic: Exclude bot traffic via a user_agent regex and drop sessions shorter than 2s to remove accidental bounces.
  • Dashboard Layout: Structure panels by geographic region and connection type. Overlay synthetic Lighthouse scores against field p75 values to identify lab-to-field discrepancies.
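
For reference, the percentile baselines described above can be computed directly from raw span durations. The sketch below uses the nearest-rank method on an in-memory array; the function name percentile is a hypothetical helper, and histogram-based backends will return approximations rather than these exact values.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('percentile() needs at least one sample');
  if (p <= 0 || p > 100) throw new RangeError('p must be in the range (0, 100]');
  const sorted = [...values].sort((a, b) => a - b); // numeric sort, input left untouched
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Example: span durations in milliseconds
const durationsMs = [100, 20, 60, 40, 80, 10, 90, 30, 70, 50];
const baseline = {
  p50: percentile(durationsMs, 50),
  p75: percentile(durationsMs, 75),
  p95: percentile(durationsMs, 95),
};
```

With the ten sample durations shown, baseline.p50 is 50, baseline.p75 is 80, and baseline.p95 is 100.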