RUM Vendor Comparison
Choosing a real-user monitoring stack is a multi-year commitment that shapes how reliably you can attribute Core Web Vitals to code, how much of your traffic you actually observe, where user data lives, and what you pay once volume crosses tens of millions of page views a month. This page sits under RUM Architecture, Tooling & Self-Hosting and compares the major managed vendors — Datadog RUM, New Relic Browser, Sentry Performance, SpeedCurve, and Akamai mPulse — against each other and against a self-hosted beacon collection pipeline. The goal is not a winner; it is a defensible decision for your traffic profile, compliance posture, and engineering capacity.
The hard part of vendor selection is that the marketing surface (dashboards, alerting, session replay) is easy to demo, while the load-bearing properties (attribution fidelity, sampling control, data residency, and unit economics at scale) only become visible under production volume. A pilot that looks identical across three vendors on 1% of traffic can diverge by an order of magnitude in cost and a factor of two in INP attribution accuracy once you ship to 100%.
The decision matrix
The table below scores each option across the dimensions that actually differentiate RUM platforms in production. Ratings are relative (Strong / Adequate / Weak) and reflect default plans as of 2026; every vendor sells enterprise tiers that move some cells. Treat this as the skeleton you re-weight for your own context, not a leaderboard.
| Dimension | Datadog RUM | New Relic Browser | Sentry Performance | SpeedCurve | Akamai mPulse | Self-hosted |
|---|---|---|---|---|---|---|
| CWV/INP attribution fidelity | Strong (uses web-vitals attribution) | Adequate (event timing, coarser INP target) | Adequate (INP + LCP element, less phase detail) | Strong (filmstrip + element attribution) | Strong (deep INP + bandwidth) | Strong (you control the schema) |
| Sampling control | Head-based %, configurable | Head-based %, plan-tiered | Dynamic + head-based traces | Beacon rate config | Head + adaptive | Head and tail, fully yours |
| Data ownership / residency | Vendor-held, US/EU regions | Vendor-held, US/EU regions | US/EU regions | US/EU | Akamai grid (global) | Yours, any region |
| Cost model at scale | Per session, steep at 100% | Per data + users | Per event/quota | Flat-ish per beacon | Enterprise contract | Infra + engineering time |
| Alerting | Strong (monitors, anomaly) | Strong | Adequate | Adequate (dashboard-led) | Strong | Build on Grafana/Prometheus |
| Retention | 30 days default, paid extend | 8 days default, tiered | 90 days events | 6–13 months trends | Contract-defined | Unlimited (storage cost) |
| Session replay | Strong | Strong | Strong | None | None | Build/integrate |
| Engineering burden | Low | Low | Low | Low | Low–medium | High (ops + pipeline) |
Two cells deserve emphasis. Attribution fidelity is the difference between a dashboard that says “INP p75 is 240 ms” and one that says “INP p75 is 240 ms, driven by the keydown handler on /checkout, 60% input delay.” Only vendors that ingest the web-vitals library attribution build (or equivalent target-element capture) give you the second sentence, and it is the sentence that fixes regressions. Cost at scale is where the managed/self-hosted line is drawn: per-session pricing is comfortable at 1% sampling and brutal at 100%, which is exactly the regime where rare slow INP interactions live.
Scoring the dimensions
CWV and INP attribution fidelity
The non-negotiable test is whether a vendor surfaces the target element and interaction phase for INP, the LCP element and its sub-part timings for LCP, and the shifting sources for CLS. All managed vendors report the headline numbers; they diverge on attribution depth. Datadog and SpeedCurve expose element-level attribution out of the box; New Relic and Sentry report the metric value with thinner element context on default plans. Self-hosting wins here only if you instrument the attribution build yourself — the platform does not give you fidelity, your beacon schema does.
A concrete capture that any vendor ingestion or your own beacon endpoint can consume:
import { onINP, onLCP, onCLS } from 'web-vitals/attribution';

function send(metric) {
  const body = JSON.stringify({
    name: metric.name,
    value: metric.value,
    rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
    id: metric.id,
    navigationType: metric.navigationType,
    attribution: metric.attribution, // target element, phases, sources
    url: location.pathname,
  });
  // sendBeacon survives the page unload that closes the interaction
  if (!navigator.sendBeacon('/rum', body)) {
    fetch('/rum', { body, method: 'POST', keepalive: true });
  }
}

onINP(send, { reportAllChanges: false });
onLCP(send);
onCLS(send);
When evaluating a vendor, send exactly this shape into their ingestion and confirm the attribution object survives intact in their UI. If it is dropped, you have bought a number, not a diagnosis.
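One way to turn "survives intact" into a number is to export a batch of events back out of the vendor and measure what fraction still carry the diagnostic fields. The sketch below is illustrative only: `REQUIRED_FIELDS`, `hasAttribution`, and `attributionCoverage` are hypothetical names, the export is assumed to be an array of objects in the beacon shape above, and the field names assume the web-vitals v4 attribution build (other versions name some fields differently).

```javascript
// Hypothetical check: which exported events still carry usable attribution?
// Field names assume the web-vitals v4 attribution build; adjust per version.
const REQUIRED_FIELDS = {
  INP: ['interactionTarget', 'inputDelay', 'processingDuration', 'presentationDelay'],
  LCP: ['element', 'timeToFirstByte', 'elementRenderDelay'],
  CLS: ['largestShiftTarget'],
};

function hasAttribution(event) {
  const required = REQUIRED_FIELDS[event.name];
  if (!required || !event.attribution) return false;
  // A field counts as present only if it is neither undefined nor null.
  return required.every((field) => event.attribution[field] != null);
}

// Returns a map of metric name to the fraction (0..1) of events with attribution.
function attributionCoverage(events) {
  const totals = {};
  const kept = {};
  for (const event of events) {
    totals[event.name] = (totals[event.name] || 0) + 1;
    if (hasAttribution(event)) kept[event.name] = (kept[event.name] || 0) + 1;
  }
  const coverage = {};
  for (const name of Object.keys(totals)) {
    coverage[name] = (kept[name] || 0) / totals[name];
  }
  return coverage;
}
```

Run it over the same events from your own collector and from each vendor export; a large gap between the two is the "number, not a diagnosis" case. CLS coverage below 1 can be legitimate, since pages with no layout shifts have no shift target to report.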
Sampling control
Sampling is the single most consequential, and most opaque, lever. A vendor that samples head-based at a fixed plan-tier percentage you cannot raise will starve the slow tail: uniform sampling does not bias the percentiles themselves, but poor INP and poor LCP events are rarer than good ones, so very few of them survive. If you sample 5% head-based, your reported p75 is computed over a thinned sample and your p95/p99 may be noise, especially on lower-traffic routes. The detailed trade-offs live in RUM Data Sampling Strategies, but the vendor-selection rule is simple: demand to know the exact sampling algorithm and whether you can force-sample slow sessions.
// Tail-bias: always keep poor-rated events, sub-sample the good ones.
// Works whether you ship to a vendor endpoint or your own collector.
function shouldSend(metric, baseRate = 0.1) {
  if (metric.rating === 'poor') return true; // keep every poor event
  if (metric.rating === 'needs-improvement') return Math.random() < 0.5;
  return Math.random() < baseRate; // thin the good majority
}
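Vendors that only support uniform head-based sampling cannot express this, so your p75 is honest but your tail is starved. Self-hosting and a few vendors that accept a client-side decision let you keep every poor event while paying for a fraction of the good ones. The saving approaches 10x only when nearly all events are rated good, and it costs no regression signal provided each event records the rate it was sampled at, so the collector can re-weight percentiles instead of computing them over a deliberately skewed sample.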
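Tail-biased sampling skews the raw sample toward slow events, so a p75 computed naively over the kept events reads worse than reality. A minimal sketch of the correction, assuming the same keep probabilities as the sampler above: each kept event carries a weight equal to the inverse of its keep probability, and the percentile is computed over cumulative weight. The names `keepProbability`, `withWeight`, and `weightedPercentile` are hypothetical.

```javascript
// Keep probability per rating; mirrors the tail-biased sampler above.
function keepProbability(rating, baseRate = 0.1) {
  if (rating === 'poor') return 1;
  if (rating === 'needs-improvement') return 0.5;
  return baseRate;
}

// Attach the inverse keep probability so the collector can undo the bias.
function withWeight(metric, baseRate = 0.1) {
  return { ...metric, weight: 1 / keepProbability(metric.rating, baseRate) };
}

// Weighted percentile: walk the sorted values until the cumulative weight
// reaches the requested fraction q of the total weight.
function weightedPercentile(events, q) {
  if (events.length === 0) return NaN;
  const sorted = [...events].sort((a, b) => a.value - b.value);
  const total = sorted.reduce((sum, e) => sum + e.weight, 0);
  let cumulative = 0;
  for (const e of sorted) {
    cumulative += e.weight;
    if (cumulative >= q * total) return e.value;
  }
  return sorted[sorted.length - 1].value;
}

// Four kept events: two good, one needs-improvement, one poor.
const kept = [
  { value: 100, rating: 'good' },
  { value: 150, rating: 'good' },
  { value: 300, rating: 'needs-improvement' },
  { value: 600, rating: 'poor' },
].map((m) => withWeight(m));

console.log(weightedPercentile(kept, 0.75)); // 150
```

With every weight forced to 1, the same four events give a p75 of 300, which is the inflated figure a collector would report if it ignored the sampling bias.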
Data ownership, residency, and cost at scale
Residency is a binary gate for regulated traffic: if EU user data may not leave the EU, a vendor without an EU ingestion region is disqualified before any feature comparison. Akamai mPulse ingests on its global edge grid; Datadog, New Relic, and Sentry offer US and EU regions; self-hosting puts the boundary wherever you provision the collector. Cross-reference your consent and PII handling against Privacy-Compliant Tracking before signing — a residency mistake is a contractual and legal problem, not a tuning problem.
Cost separates the field at volume. A worked example for 100M page views/month, treating each page view as one billable session for simplicity (a session usually spans several page views, so real session counts and managed bills would be lower):
Managed (per-session, ~$1.50 per 1k sessions, 100% sampled):
  100,000,000 sessions / 1,000 * $1.50 = $150,000 / month
Managed with 10% head-based sampling:
  10,000,000 sessions / 1,000 * $1.50 = $15,000 / month (but tail under-sampled)
Self-hosted (ClickHouse + edge collector, 100% ingest):
  3 x ClickHouse nodes + edge workers + storage ≈ $2,500 / month infra
  + ~0.4 FTE ops (the real cost) ≈ $6,000 / month loaded
  = ~$8,500 / month, full fidelity, data owned
The crossover is roughly where managed cost exceeds the loaded cost of an engineer plus infrastructure. Holding the self-hosted figure fixed at the example's ~$8,500, that point falls near 5.7M sessions a month at 100% sampling and near 57M at 10% sampling. Below it, managed is cheaper and lower-risk; above it, self-hosting wins on both cost and the sampling/residency control that managed plans charge a premium for. The self-hosted column assumes you have the pipeline competence covered in Self-Hosted Beacon Collection.
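The crossover arithmetic is easy to re-run with your own quotes. The sketch below uses the illustrative figures from the worked example ($1.50 per 1k sessions, ~$8,500 a month self-hosted), not real vendor pricing, and it treats the self-hosted cost as fixed, which is a simplification since infrastructure grows with volume. `managedMonthlyCost` and `crossoverSessions` are hypothetical helper names.

```javascript
// All rates are the illustrative figures from the worked example, not vendor quotes.
function managedMonthlyCost(sessions, sampleRate, pricePerThousand = 1.5) {
  return ((sessions * sampleRate) / 1000) * pricePerThousand;
}

// Monthly session volume at which managed spend equals a fixed self-hosted cost.
function crossoverSessions(selfHostedMonthly, sampleRate, pricePerThousand = 1.5) {
  return (selfHostedMonthly * 1000) / (pricePerThousand * sampleRate);
}

console.log(managedMonthlyCost(100_000_000, 1)); // 150000
console.log(Math.round(crossoverSessions(8500, 1))); // 5666667
console.log(Math.round(crossoverSessions(8500, 0.1))); // 56666667
```

Replacing the defaults with a negotiated per-session price and your own loaded engineering cost moves the crossover substantially, which is why the figure belongs in the evaluation, not in a rule of thumb.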
Alerting, retention, and session replay
These three rarely flip a decision alone, but they compound. Alerting maturity (anomaly detection, multi-condition monitors, deploy-correlated alerts) is strongest in Datadog and New Relic; SpeedCurve and Akamai are more dashboard-led and expect you to wire alerts externally. Retention defaults are short (8 to 30 days for raw events on most managed plans), which matters because the Chrome UX Report aggregates Core Web Vitals field data over a rolling 28-day window; if your tool discards raw events at day 8 you cannot reconstruct a comparable window without paid extension. Session replay is a genuine debugging accelerator that no self-hosted stack gives you for free; if replay is a hard requirement, it weights heavily toward Datadog, New Relic, or Sentry. SpeedCurve and a custom stack are covered in depth in SpeedCurve vs Custom RUM if your priority is competitive benchmarking over replay.
Evaluation workflow: define, score, pilot
A disciplined selection runs in three stages and resists the temptation to demo-shop.
- Define needs and weight them. List every dimension from the matrix and assign a weight that reflects your context: a fintech with EU users weights residency at 30%; a consumer SPA with a small team weights engineering burden and session replay. Write the weights down before you talk to a vendor so the demo cannot move them.
- Score the shortlist against the weighted matrix. Fill the matrix cells with evidence, not vendor claims. For attribution fidelity, send the real beacon shape above and inspect the result. For sampling, get the algorithm in writing.
- Pilot at full traffic on one route. Deploy the top two candidates in shadow on a single high-traffic, high-interaction route (checkout, search) for two weeks at 100% sampling. This is the only stage that reveals true cost and whether INP attribution holds up under real interaction volume. A 1% pilot tells you nothing about the regime where the bill and the tail live.
// Shadow-pilot two vendors plus your own collector from one client.
// Send to all destinations; compare p75 and attribution coverage offline.
const SINKS = ['/rum', 'https://rum.vendor-a.example/ingest', 'https://rum.vendor-b.example/ingest'];

function fanout(metric) {
  const body = JSON.stringify({
    name: metric.name,
    value: metric.value,
    rating: metric.rating,
    attribution: metric.attribution,
    url: location.pathname,
  });
  for (const sink of SINKS) {
    if (!navigator.sendBeacon(sink, body)) {
      // no-cors: the response is opaque, which is fine for fire-and-forget beacons
      fetch(sink, { method: 'POST', body, keepalive: true, mode: 'no-cors' });
    }
  }
}

// Load the attribution build once, then register every metric against fanout.
import('web-vitals/attribution').then(({ onINP, onLCP, onCLS }) => {
  onINP(fanout);
  onLCP(fanout);
  onCLS(fanout);
});
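The define and score stages above reduce to a weighted sum once the weights are written down. A minimal sketch follows, assuming a 3/2/1 point mapping for Strong/Adequate/Weak (an assumption, not something the matrix prescribes) and placeholder vendor names and ratings; `RATING_POINTS`, `weightedScore`, and `rankVendors` are hypothetical.

```javascript
// Assumed point mapping for the matrix ratings.
const RATING_POINTS = { Strong: 3, Adequate: 2, Weak: 1 };

// Weighted average of a vendor's ratings, normalised to the 1..3 scale.
function weightedScore(ratings, weights) {
  let score = 0;
  let totalWeight = 0;
  for (const [dimension, weight] of Object.entries(weights)) {
    const points = RATING_POINTS[ratings[dimension]];
    if (points === undefined) throw new Error(`No rating for ${dimension}`);
    score += points * weight;
    totalWeight += weight;
  }
  return score / totalWeight;
}

// Returns vendors sorted from highest to lowest weighted score.
function rankVendors(matrix, weights) {
  return Object.entries(matrix)
    .map(([vendor, ratings]) => ({ vendor, score: weightedScore(ratings, weights) }))
    .sort((a, b) => b.score - a.score);
}

// Placeholder weights and ratings, for illustration only.
const weights = { residency: 0.3, attribution: 0.5, burden: 0.2 };
const matrix = {
  'vendor-a': { residency: 'Weak', attribution: 'Strong', burden: 'Strong' },
  'vendor-b': { residency: 'Strong', attribution: 'Adequate', burden: 'Adequate' },
};
console.log(rankVendors(matrix, weights).map((r) => r.vendor)); // [ 'vendor-a', 'vendor-b' ]
```

Where a dimension is a gate, as residency is for regulated traffic, filter out failing vendors before scoring; a weighted sum would otherwise let strong scores elsewhere paper over a disqualifying cell.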
Which teams pick what
Segmentation beats a universal recommendation. The patterns below hold across most organisations.
| Team profile | Typical pick | Why |
|---|---|---|
| Small product team, observability already in Datadog/New Relic | Same vendor’s RUM | One pane of glass, deploy-correlated alerts, low burden |
| Performance/SRE team owning a public site | SpeedCurve + custom RUM | Competitive benchmarking plus deep attribution |
| Regulated (finance, health), EU residency | Self-hosted or EU-region vendor | Residency is a gate, not a preference |
| Very high traffic (>50M pv/mo), strong infra team | Self-hosted (ClickHouse/BigQuery) | 100% fidelity below managed cost crossover |
| Error-centric team already on Sentry | Sentry Performance | Reuses error context, shared session replay |
| CDN already on Akamai | mPulse | Edge-collected RUM with bandwidth attribution |
The cross-vendor head-to-head most teams actually run is detailed in Datadog vs New Relic vs Self-Hosted RUM, which carries the per-event pricing math and attribution screenshots one level deeper than this overview.
Failure modes to underwrite against
Three failure modes recur in post-mortems of RUM decisions, and each maps to a clause you should secure before signing.
Lock-in. Once dashboards, alerts, and SLOs are built against a vendor’s data model, migration is a quarter of work. Mitigate by keeping your client emitting a standard beacon shape (the web-vitals attribution object) and, where possible, dual-writing to a cheap raw store you own. If the vendor relationship ends, you keep the history.
Opaque sampling. A vendor that will not disclose its sampling algorithm can silently change it, moving your reported p75 with no code change on your side. This is indistinguishable from a real regression and burns on-call trust. Demand the algorithm in the contract and validate it during the pilot by comparing the vendor’s event count to your own dual-written count.
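The count comparison can be automated during the pilot. The sketch below assumes you can obtain a daily event count from the vendor and from your own dual-written store; the row shape `{ day, vendorCount, ownCount }` and the names `effectiveSampleRate` and `samplingDrift` are hypothetical, and the 10% relative tolerance is an arbitrary starting point.

```javascript
// Ratio of vendor-reported events to events in your own dual-written store.
function effectiveSampleRate(vendorCount, ownCount) {
  if (ownCount <= 0) throw new Error('ownCount must be positive');
  return vendorCount / ownCount;
}

// Returns the days whose effective rate deviates from the contracted rate
// by more than `tolerance`, expressed as a fraction of the contracted rate.
function samplingDrift(dailyCounts, contractedRate, tolerance = 0.1) {
  return dailyCounts
    .map((row) => ({ day: row.day, rate: effectiveSampleRate(row.vendorCount, row.ownCount) }))
    .filter((row) => Math.abs(row.rate - contractedRate) / contractedRate > tolerance);
}

const drift = samplingDrift(
  [
    { day: 'day-1', vendorCount: 1000, ownCount: 10000 },
    { day: 'day-2', vendorCount: 480, ownCount: 10000 },
  ],
  0.1,
);
console.log(drift.map((row) => row.day)); // [ 'day-2' ]
```

A flagged day is a prompt to ask the vendor what changed, not proof of a silent change; ad blockers and dropped beacons also move the ratio, so compare against the baseline established in the first pilot week.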
PII and residency drift. A field added to the beacon “for debugging” can quietly ship a user identifier into a non-compliant region. Pin an allowlist of beacon fields in code review and align it with Privacy-Compliant Tracking; residency is only as good as the least-disciplined beacon you ship.
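An allowlist is easiest to enforce when it lives in the code path every beacon passes through. A minimal sketch, assuming the field set from the beacon shape shown earlier; `ALLOWED_FIELDS` and `toAllowedBeacon` are hypothetical names, and the filter is shallow, so nested content such as selectors or URLs inside `attribution` still needs its own review.

```javascript
// Fields permitted to leave the browser; mirrors the beacon shape shown earlier.
const ALLOWED_FIELDS = ['name', 'value', 'rating', 'id', 'navigationType', 'attribution', 'url'];

// Copies only allowlisted top-level fields and reports what was dropped,
// so an unexpected field can be logged or fail a test instead of shipping.
function toAllowedBeacon(payload, allowed = ALLOWED_FIELDS) {
  const beacon = {};
  const dropped = [];
  for (const [key, val] of Object.entries(payload)) {
    if (allowed.includes(key)) beacon[key] = val;
    else dropped.push(key);
  }
  return { beacon, dropped };
}

const result = toAllowedBeacon({ name: 'INP', value: 240, rating: 'poor', userEmail: 'a@example.com' });
console.log(result.dropped); // [ 'userEmail' ]
```

Asserting in a unit test that `dropped` is empty for every beacon the client builds turns the code-review rule into a failing check.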
Gating the choice in CI and SLOs
The decision does not end at signature — it ends when the chosen tool is wired into your regression gate. Whichever vendor or self-hosted stack you pick, export its p75 Core Web Vitals into your CI/SLO pipeline so a deploy that pushes INP p75 above 200 ms or LCP p75 above 2.5 s fails the release or burns error budget. Managed vendors expose this through query APIs; self-hosted stacks query the columnar store directly. Many teams render the same gate visually in Grafana Dashboards for Web Performance so the SLO and the alert share one source of truth. A RUM platform you cannot gate on is a reporting tool, not a control.
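The gate itself is a small comparison once the p75 values have been fetched. The sketch below assumes the query step (vendor API or columnar store) has already produced an object of p75 values in milliseconds for INP and LCP; the INP and LCP limits come from the thresholds above, while the CLS limit of 0.1 is added here as the standard "good" threshold. `THRESHOLDS` and `evaluateGate` are hypothetical names.

```javascript
// p75 limits: INP and LCP in milliseconds, CLS unitless.
const THRESHOLDS = { INP: 200, LCP: 2500, CLS: 0.1 };

// Compares observed p75 values against the limits. A missing metric fails
// the gate too, so a broken export cannot pass silently.
function evaluateGate(p75ByMetric, thresholds = THRESHOLDS) {
  const failures = [];
  for (const [metric, limit] of Object.entries(thresholds)) {
    const observed = p75ByMetric[metric];
    if (observed === undefined) {
      failures.push({ metric, reason: 'missing p75' });
    } else if (observed > limit) {
      failures.push({ metric, observed, limit, reason: 'p75 above threshold' });
    }
  }
  return { pass: failures.length === 0, failures };
}

const gate = evaluateGate({ INP: 240, LCP: 2400, CLS: 0.05 });
console.log(gate.pass); // false
```

In a CI job, setting a non-zero exit code when `pass` is false is enough to block the release; an SLO pipeline would record the failures as budget burn instead.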
FAQ
Is self-hosted RUM cheaper than a managed vendor?
Only above a crossover that depends on volume and sampling. At the example rates on this page, a managed plan sampled at around 10% stays below the loaded cost of an engineer running a pipeline until roughly 50M page views a month, and managed is also lower-risk. At 100% sampling, where the slow tail lives, the crossover arrives at far lower volume, and self-hosting on a columnar store typically costs less per event while giving you full data ownership and sampling control.
Which vendor gives the best INP attribution?
Datadog RUM and SpeedCurve expose target-element and interaction-phase attribution out of the box; New Relic and Sentry report the INP value with thinner element context on default plans. A self-hosted stack matches the best vendors only if you ingest the web-vitals attribution build yourself — fidelity comes from your beacon schema, not the platform.
Why pilot at 100% traffic instead of a small sample?
Cost and tail behaviour are invisible at 1%. Per-session pricing is comfortable on a sample and brutal at full volume, and rare poor-rated INP and LCP events — the ones worth fixing — only appear in quantity at high traffic. A 100% shadow pilot on one route is the only stage that reveals real cost and real attribution coverage.
How do I avoid vendor lock-in with RUM?
Keep your client emitting a standard beacon shape (the web-vitals attribution object) rather than a vendor SDK’s proprietary events, dual-write raw events to a cheap store you own, and build alerts on portable queries. Migration then becomes repointing an endpoint instead of re-instrumenting the app.
What’s the single most important dimension to score?
Sampling control, because it silently determines whether every other number is trustworthy. Opaque or fixed head-based sampling under-represents the slow tail and can shift your reported p75 with no code change. Get the algorithm in writing and verify it during the pilot.
Related
- Datadog vs New Relic vs Self-Hosted RUM — the three-way head-to-head with per-event pricing and attribution depth.
- SpeedCurve vs Custom RUM — benchmarking-led managed tooling versus a stack you own.
- Self-Hosted Beacon Collection — the ingestion pipeline a self-hosted choice depends on.
- RUM Data Sampling Strategies — head- vs tail-based sampling and honest p75 aggregation.
- Grafana Dashboards for Web Performance — visualising and SLO-gating the metrics whichever vendor you pick.