Datadog vs New Relic vs Self-Hosted RUM

Picking a Real-User Monitoring platform is a multi-year commitment to a cost curve, a data-residency posture, and a ceiling on how much raw field data you can ever query. This page is a focused head-to-head — Datadog RUM versus New Relic Browser versus a self-hosted stack built from the web-vitals library and PerformanceObserver, Cloudflare Workers, ClickHouse, and Grafana — and it sits inside the broader RUM Vendor Comparison evaluation. Where the parent page surveys the field, this one pins down three concrete options, gives a worked monthly-cost estimate for each at ~10M pageviews, and ends with numbered evaluation steps that include the actual SDK init calls you would ship.

The short version: Datadog and New Relic both buy you near-instant time-to-value and zero operational burden, but bill on session/event volume and gate raw-data export behind their query layer. A self-hosted stack inverts that — higher up-front engineering, near-zero marginal cost per beacon, and unbounded access to raw events for p75 aggregation and ad-hoc analysis. The comparison below makes those trade-offs explicit so the decision is evidence-driven.

[Figure: Datadog vs New Relic vs self-hosted RUM positioning. The same beacons flow to three destinations: Datadog RUM (datadogRum.init(), fast setup, billed per session), New Relic Browser (agent, fast setup, billed per GB ingested), and a self-hosted stack (web-vitals + Workers + ClickHouse + Grafana, flat cost at scale). Decision axes: time-to-value, cost, raw-data export, and residency control (vendor region pick versus full ownership).]
All three options ingest the same web-vitals payload; they differ on cost model, raw-data access, and who runs the pipeline. See the parent RUM Vendor Comparison for the wider field.

Prerequisites

Before you run this evaluation, have the following in place so each option is measured on equal footing:

  • A representative traffic figure. This page assumes ~10M monthly pageviews, roughly 3.3M sessions/month at 3 pageviews per session. Substitute your own numbers — the cost math is linear in sessions/ingest.
  • Agreement on which signals you must collect. At minimum the three Core Web Vitals — Largest Contentful Paint, Interaction to Next Paint, and Cumulative Layout Shift — plus FCP and TTFB as diagnostics.
  • A clear residency requirement (EU-only, US-only, or no constraint). This eliminates options fast.
  • A decision on whether you need raw per-event export for joins against business data, or whether vendor dashboards suffice.
  • For the self-hosted path: a Cloudflare account for the beacon ingestion endpoint, a ClickHouse instance (managed or self-run), and Grafana.

The comparison table

The three options measured against the dimensions that actually move a decision. Costs are list-price order-of-magnitude estimates at ~10M monthly pageviews; always re-price against a current quote.

Dimension | Datadog RUM | New Relic Browser | Self-hosted (web-vitals + Workers + ClickHouse + Grafana)
CWV support (LCP/INP/CLS) | Native, all three incl. INP attribution | Native, all three incl. INP | Native via web-vitals v4 (INP included), full attribution build
Sampling control | Session sessionSampleRate, replay rate separate | Configurable, per-app; less granular | Total: head- or tail-based in your Worker
Raw-data export | Via API/RUM query, rate-limited, retention-bound | NRQL + API export, retention-bound | Unlimited: it is your ClickHouse table
Cost at ~10M PV/mo | ~$4,500–6,000/mo (session-billed) | ~$2,500–4,500/mo (ingest-billed) | ~$300–700/mo infra + engineering time
Data residency | US/EU datacenter pick at signup | US/EU region pick | Anywhere you deploy (full control)
Time-to-value | Hours | Hours | 1–3 weeks initial build
Retention | 30 days RUM events (plan-dependent) | 8–30 days events (plan-dependent) | Unbounded (TTL is your policy)
Operational burden | None | None | You own ingest, storage, dashboards

Worked monthly-cost estimates

The numbers below are illustrative at ~10M pageviews / ~3.3M sessions per month. Re-derive them from a live quote, but the shape of each curve is stable.

Assumptions
  Pageviews/month      = 10,000,000
  Pageviews/session    = 3
  Sessions/month       = 3,333,333
  Beacon payload       = ~1.2 KB/pageview (one batched vitals beacon)
  Raw ingest volume    = 10,000,000 * 1.2 KB = ~12 GB/month
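
The assumptions above can be re-derived for your own traffic with a few lines. This is a minimal sketch: the function name deriveTraffic and its parameter names are illustrative, and it uses decimal units (1 GB = 1,000,000 KB) to match the arithmetic in the block above.

```javascript
// Derive sessions and raw beacon volume from the stated assumptions.
// deriveTraffic and its parameter names are illustrative, not from any SDK.
function deriveTraffic({ pageviewsPerMonth, pageviewsPerSession, beaconKB }) {
  const sessionsPerMonth = Math.round(pageviewsPerMonth / pageviewsPerSession);
  // Decimal units: 1 GB = 1,000,000 KB.
  const rawIngestGB = (pageviewsPerMonth * beaconKB) / 1_000_000;
  return { sessionsPerMonth, rawIngestGB };
}

const traffic = deriveTraffic({
  pageviewsPerMonth: 10_000_000,
  pageviewsPerSession: 3,
  beaconKB: 1.2,
});
console.log(traffic);
```

Swap in your own pageviews, pageviews per session, and measured beacon size before pricing any of the three options.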

Datadog RUM (session-billed). Pricing is per 1,000 sessions. At ~3.33M sessions, a representative list rate lands the RUM line around $4,500–6,000/month before Session Replay, which is billed separately and can double the bill if enabled at a high rate. Lowering sessionSampleRate from 100 to 25 (the option takes a percentage, not a fraction) cuts billed sessions to a quarter; sampling is the primary cost lever here.
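
To see how the sampling lever moves a session-billed line, here is a rough sketch. The rate usdPer1kSessions is a hypothetical placeholder, chosen only so the unsampled result falls inside the range quoted above; it is not a quoted Datadog price, and the function name is illustrative.

```javascript
// Session-billed cost model. usdPer1kSessions is a HYPOTHETICAL placeholder,
// not a quoted vendor price; substitute the rate from your own quote.
function sessionBilledCost({ sessionsPerMonth, sampleRatePct, usdPer1kSessions }) {
  if (sampleRatePct < 0 || sampleRatePct > 100) {
    throw new RangeError('sampleRatePct must be between 0 and 100');
  }
  // Only the sampled share of sessions is collected and billed.
  const billedSessions = sessionsPerMonth * (sampleRatePct / 100);
  return (billedSessions / 1000) * usdPer1kSessions;
}

const base = { sessionsPerMonth: 3_333_333, usdPer1kSessions: 1.5 };
const fullCost = sessionBilledCost({ ...base, sampleRatePct: 100 });
const sampledCost = sessionBilledCost({ ...base, sampleRatePct: 25 });
console.log(fullCost.toFixed(0), sampledCost.toFixed(0));
```

Because the model is linear in the sample rate, a 25% rate costs a quarter of the unsampled figure, whatever the actual per-session price turns out to be.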

New Relic Browser (ingest-billed). New Relic bills on data ingested (per GB) plus user seats. RUM ingest is far larger than 12 GB once you account for full PageView/Interaction events and attributes — call it 150–400 GB/month at this traffic. At list per-GB rates that is roughly $2,500–4,500/month including a small seat count. The lever here is dropping attributes and sampling at the agent.

Self-hosted. Fixed infrastructure dominates:

Cloudflare Workers (ingest)   ~$5  base + ~$5/mo at 10M requests  ≈ $10
ClickHouse (managed, small)   ~$200-400/mo for 12 GB/mo + retention
Grafana (self-run or Cloud free tier)  $0-50/mo
Object storage / backups      ~$10/mo
                              -----------------------------------
Infra subtotal                ~$220-470/mo
Engineering (amortized build) 1-3 weeks once, then ~2-4 hrs/mo upkeep

At ~10M pageviews the itemized infra bill comes to roughly $220–470/month; budget $300–700/month to leave headroom for retention growth, plus the one-time build and light upkeep. The crossover point where self-hosting wins on pure spend is typically a few million sessions per month; below that, the managed time-to-value usually wins.
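
The crossover can be estimated rather than guessed. The sketch below treats the self-hosted bill as flat (infra plus amortized build plus upkeep) and finds the session volume at which a session-billed vendor costs the same. Every input is a hypothetical placeholder: the hourly rate, the amortization window, and the per-1,000-session rate are assumptions to replace with your own figures.

```javascript
// Every input below is a HYPOTHETICAL placeholder; none is a quoted price.
function selfHostedMonthlyUsd({ infraUsd, buildHours, amortizeMonths, upkeepHoursPerMonth, hourlyUsd }) {
  const amortizedBuild = (buildHours * hourlyUsd) / amortizeMonths;
  const upkeep = upkeepHoursPerMonth * hourlyUsd;
  return infraUsd + amortizedBuild + upkeep;
}

// Sessions per month at which a session-billed vendor matches the flat bill.
function breakEvenSessions(flatMonthlyUsd, usdPer1kSessions) {
  return (flatMonthlyUsd / usdPer1kSessions) * 1000;
}

const flat = selfHostedMonthlyUsd({
  infraUsd: 700,          // upper end of the infra estimate
  buildHours: 120,        // 3 weeks at 40 hours
  amortizeMonths: 12,     // assumed amortization window
  upkeepHoursPerMonth: 4, // upper end of the upkeep estimate
  hourlyUsd: 150,         // assumed loaded engineering rate
});
const crossover = breakEvenSessions(flat, 1.5);
console.log(flat, Math.round(crossover));
```

The result is very sensitive to the engineering-cost assumptions, which is why the crossover is best stated as a range rather than a single number.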

Decision guidance

  • Choose Datadog RUM when you already run Datadog APM/logs and want vitals correlated with backend traces in one pane, you value Session Replay, and per-session billing at your volume is acceptable. Its strength is the unified trace-to-RUM story.
  • Choose New Relic Browser when you want a single ingest-priced platform spanning APM, infra, and browser, and you can keep ingest disciplined. NRQL is a genuinely strong ad-hoc query surface short of full raw export.
  • Choose self-hosted when you have a residency mandate that vendor regions don’t satisfy, you need raw per-event data joined to business tables, your session volume makes per-session billing punitive, or you require retention beyond 30 days. The cost is engineering ownership.

How to evaluate the three options

Run these numbered steps as a time-boxed bake-off. Each ships the same web-vitals signals so the dashboards are directly comparable.

  1. Stand up Datadog RUM. Add the SDK and init it with explicit sampling so you control billed sessions from day one.

    import { datadogRum } from '@datadog/browser-rum';
    
    datadogRum.init({
      applicationId: 'YOUR_APP_ID',
      clientToken: 'YOUR_CLIENT_TOKEN',
      site: 'datadoghq.eu',          // pick the residency region here
      service: 'storefront-web',
      env: 'production',
      sessionSampleRate: 25,         // 25% of sessions billed/collected
      sessionReplaySampleRate: 0,    // replay off; it bills separately
      trackResources: true,
      trackLongTasks: true,
      defaultPrivacyLevel: 'mask-user-input',
    });
    // datadogRum.startSessionReplayRecording(); // not called: replay rate is 0

    Why: setting sessionSampleRate at init is the single biggest cost control; site fixes residency and cannot be changed without re-onboarding.

  2. Stand up New Relic Browser. Use the copy-paste browser agent, then trim ingest with sampling and attribute limits.

    // Define this configuration in the head, before the New Relic browser loader script runs:
    window.NREUM = window.NREUM || {};
    NREUM.init = {
      distributed_tracing: { enabled: true },
      privacy: { cookies_enabled: false },   // cookieless RUM
      ajax: { deny_list: ['bam.nr-data.net'] },
    };
    NREUM.loader_config = {
      accountID: 'YOUR_ACCOUNT_ID',
      trustKey: 'YOUR_TRUST_KEY',
      agentID: 'YOUR_AGENT_ID',
      licenseKey: 'YOUR_BROWSER_LICENSE_KEY',
      applicationID: 'YOUR_APP_ID',
    };

    Why: cookies_enabled: false keeps you cookieless, and pruning AJAX/attributes is how you keep the GB-ingest bill predictable.

  3. Build the self-hosted collector. Capture vitals with the web-vitals library and ship them via sendBeacon to your own ingestion endpoint.

    import { onLCP, onINP, onCLS, onFCP, onTTFB } from 'web-vitals';
    
    // Keyed by metric.id so a metric that reports more than once (CLS, INP)
    // keeps only its latest value instead of queueing duplicates.
    const queue = new Map();
    const ENDPOINT = '/rum/collect';
    
    function record(metric) {
      queue.set(metric.id, {
        name: metric.name,
        value: metric.value,
        rating: metric.rating,       // good | needs-improvement | poor
        id: metric.id,
        navigationType: metric.navigationType,
        page: location.pathname,
        ts: Date.now(),
      });
    }
    
    onLCP(record);
    onINP(record);   // INP captured here, same as the managed agents
    onCLS(record);
    onFCP(record);
    onTTFB(record);
    
    function flush() {
      if (queue.size === 0) return;
      const body = JSON.stringify([...queue.values()]);
      queue.clear();
      // sendBeacon survives page unload; keepalive fetch is the fallback
      if (!navigator.sendBeacon(ENDPOINT, body)) {
        fetch(ENDPOINT, { body, method: 'POST', keepalive: true });
      }
    }
    
    addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden') flush();
    });
    addEventListener('pagehide', flush);

    Why: flushing on visibilitychange/pagehide is the only reliable point to capture the final INP and CLS values before the tab is discarded.

  4. Validate and write at the edge. A Cloudflare Worker applies head-based sampling and inserts into ClickHouse. Sampling lives here so you never pay to store noise — see RUM data sampling strategies for head- vs tail-based trade-offs and p75 implications.

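    export default {
      async fetch(request, env) {
        if (request.method !== 'POST') return new Response('', { status: 405 });
    
        // Head-based sampling: keep 25% of beacons, matching Datadog's rate.
        // A 204 response must have a null body; an empty string throws a TypeError.
        if (Math.random() >= 0.25) return new Response(null, { status: 204 });
    
        // Reject malformed bodies instead of letting request.json() throw
        let events;
        try {
          events = await request.json();
        } catch {
          return new Response('', { status: 400 });
        }
        if (!Array.isArray(events)) return new Response('', { status: 400 });
    
        const rows = events
          .filter((e) => e && typeof e.value === 'number')
          .map((e) => ({
            name: e.name,
            value: e.value,
            rating: e.rating,
            page: e.page,
            ts: Math.floor(e.ts / 1000),   // epoch milliseconds -> seconds
          }));
        if (rows.length === 0) return new Response(null, { status: 204 });
    
        const query = encodeURIComponent('INSERT INTO rum_events FORMAT JSONEachRow');
        const insert = await fetch(`${env.CLICKHOUSE_URL}/?query=${query}`, {
          method: 'POST',
          headers: { Authorization: `Basic ${env.CH_AUTH}` },
          body: rows.map((r) => JSON.stringify(r)).join('\n'),
        });
        // Surface storage failures rather than reporting success
        if (!insert.ok) return new Response('', { status: 502 });
    
        return new Response(null, { status: 204 });
      },
    };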

    Why: keeping the sample rate identical across all three options is what makes the bake-off dashboards comparable — otherwise you’re comparing different denominators.

  5. Query p75 in ClickHouse and panel it in Grafana. Validate that your self-hosted p75 lines up with the vendor dashboards.

    SELECT
      name,
      quantile(0.75)(value) AS p75,
      count() AS samples
    FROM rum_events
    WHERE ts >= now() - INTERVAL 7 DAY
    GROUP BY name
    ORDER BY name;

    Why: p75 is the canonical headline aggregation for Core Web Vitals; computing it yourself confirms the self-hosted path produces the same ratings the managed tools report.
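
When the self-hosted p75 and a vendor tile disagree, it can help to recompute p75 by hand on a small export and express the gap as a percentage. The sketch below uses the nearest-rank method on made-up sample values; the function names are illustrative. ClickHouse's quantile() is an approximate aggregate, so small differences from an exact calculation are expected.

```javascript
// Nearest-rank percentile over an array of numbers.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Relative gap between the self-hosted value and a vendor value, in percent.
function relativeDeltaPct(selfHosted, vendor) {
  return (Math.abs(selfHosted - vendor) / vendor) * 100;
}

// Made-up LCP samples in milliseconds, for illustration only.
const lcpSamples = [1800, 2100, 2400, 2600, 3100, 1900, 2200, 4200];
const p75 = percentile(lcpSamples, 75);
const delta = relativeDeltaPct(p75, 2500); // 2500 ms stands in for a vendor tile
console.log(p75, delta.toFixed(1));
```

Run it per metric and per route; a gap of a few percent is noise, while a large gap usually points to sampling or unit differences.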

Verifying it works

  • Datadog: open RUM → Performance and confirm LCP/INP/CLS p75 tiles populate within minutes. Check the billed sessions counter in Plan & Usage matches your sessionSampleRate.
  • New Relic: run SELECT percentile(largestContentfulPaint, 75) FROM PageViewTiming SINCE 1 day ago in the query builder and confirm a non-null result; watch Data Ingest in Administration to verify GB/day tracks your estimate.
  • Self-hosted: confirm rows land with SELECT count() FROM rum_events WHERE ts >= now() - INTERVAL 1 HOUR, then verify the Grafana panel’s p75 for LCP, INP, and CLS sits within a few percent of the two vendor dashboards. Divergence beyond that usually means mismatched sample rates or a unit bug: web-vitals reports CLS as a unitless score and LCP/INP in milliseconds, so confirm the unit each vendor query returns before comparing.

Cross-check the ratings against the Google thresholds: LCP ≤ 2.5 s Good / ≤ 4.0 s Needs Improvement / > 4.0 s Poor; INP ≤ 200 ms Good / ≤ 500 ms NI / > 500 ms Poor; CLS ≤ 0.1 Good / ≤ 0.25 NI / > 0.25 Poor. FCP is Good at ≤ 1.8 s and TTFB at ≤ 800 ms.
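
The thresholds above translate directly into a small rating function for spot-checking p75 values. This is a sketch, not part of any SDK: the name rate is illustrative, and the upper bounds for FCP (3.0 s) and TTFB (1.8 s) are not stated on this page; they are the values the web-vitals library publishes, included here as an assumption.

```javascript
// [good upper bound, needs-improvement upper bound]; ms except CLS (unitless).
const THRESHOLDS = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
  FCP: [1800, 3000],  // upper bound assumed from the web-vitals library
  TTFB: [800, 1800],  // upper bound assumed from the web-vitals library
};

function rate(name, value) {
  const bounds = THRESHOLDS[name];
  if (!bounds) throw new Error(`Unknown metric: ${name}`);
  const [good, needsImprovement] = bounds;
  if (value <= good) return 'good';
  if (value <= needsImprovement) return 'needs-improvement';
  return 'poor';
}

console.log(rate('LCP', 2600), rate('INP', 180), rate('CLS', 0.3));
```

Feeding each dashboard's p75 through the same function makes a rating mismatch between tools easy to spot.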

Migration notes

  • Run a dual-write window. Keep the existing vendor agent live while the self-hosted beacon ships in parallel for at least two full weeks. Reconcile p75 per route before you cut over — a per-route delta under ~5% is your green light.
  • Match sample rates exactly during overlap. If Datadog samples 25% of sessions and your Worker samples 25% of beacons, the denominators still differ (session vs pageview). Normalize before comparing, or you’ll chase phantom regressions.
  • Migrate dashboards, not just data. Recreate each vendor alert as a Grafana alert on the ClickHouse p75 query before decommissioning, so you never have an alerting gap.
  • Watch attribution parity. Vendor INP attribution (target element, input delay) needs the web-vitals attribution build on the self-hosted side; capture metric.attribution or you’ll lose the debugging context the vendor gave you.
  • Plan retention up front. Set a ClickHouse TTL that matches or exceeds the vendor retention you’re replacing, so historical comparisons survive the cutover.
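
One way to align denominators during the overlap is to sample by session in the Worker rather than per beacon. The sketch below makes the keep/drop decision a deterministic function of a session identifier, so every beacon from a session shares the same fate. It assumes each beacon carries a sessionId field, which the collector shown earlier does not send; that field, the function names, and the choice of FNV-1a (picked for brevity, not hash quality) are all assumptions.

```javascript
// 32-bit FNV-1a over UTF-16 code units; adequate for bucketing, not for security.
function fnv1a32(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned 32-bit
}

// HYPOTHETICAL helper: keep a session when its hash falls below the rate.
function keepSession(sessionId, ratePct) {
  return fnv1a32(sessionId) / 2 ** 32 < ratePct / 100;
}

// The same id always gets the same decision, so a session is kept or dropped whole.
console.log(keepSession('session-abc', 25), keepSession('session-abc', 25));
```

In the Worker, this would stand in for the Math.random() check, applied to the sessionId of the incoming beacon, so that 25% here means 25% of sessions, as it does for Datadog.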

Edge cases & gotchas

  • Session-based and pageview-based billing are not interchangeable. Datadog bills sessions; New Relic bills ingest; self-hosted “bills” per beacon. A naive cost comparison that uses one denominator for all three will be wrong by a large factor.
  • Session Replay is the silent budget killer on Datadog. Leave sessionReplaySampleRate at 0 during a cost bake-off, or the RUM line is unrepresentative.
  • New Relic ingest balloons with custom attributes. Each attribute you attach multiplies GB-ingest across every event; audit attributes before pricing.
  • Self-hosted INP can under-report without lifecycle flushing. If you flush only on beforeunload, which does not fire reliably (especially on mobile browsers), instead of visibilitychange/pagehide, you lose the final interactions whenever a tab is backgrounded and then discarded or the page is put into the bfcache.
  • Residency is set at onboarding for both vendors. Moving a Datadog or New Relic account between US and EU regions is effectively a re-implementation, so decide residency before you sign.
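
To keep the denominators straight, it can help to normalize every option to the same unit before comparing. The sketch below takes the monthly ranges from the comparison table on this page and expresses each per 1,000 pageviews and per 1,000 sessions; the helper name perThousand is illustrative, and the self-hosted row covers infra only.

```javascript
// Monthly cost ranges (USD) copied from the comparison table on this page.
const monthlyUsd = {
  datadog: [4500, 6000],
  newRelic: [2500, 4500],
  selfHosted: [300, 700], // infra only; engineering time excluded
};

// Convert a [low, high] monthly range to cost per 1,000 units.
function perThousand(range, unitsPerMonth) {
  return range.map((usd) => usd / (unitsPerMonth / 1000));
}

const PAGEVIEWS = 10_000_000;
const SESSIONS = 3_333_333;

for (const [option, range] of Object.entries(monthlyUsd)) {
  const [pvLow, pvHigh] = perThousand(range, PAGEVIEWS);
  const [sLow, sHigh] = perThousand(range, SESSIONS);
  console.log(
    `${option}: $${pvLow.toFixed(2)}-${pvHigh.toFixed(2)} per 1k pageviews, ` +
      `$${sLow.toFixed(2)}-${sHigh.toFixed(2)} per 1k sessions`
  );
}
```

The per-session figures are about three times the per-pageview ones at 3 pageviews per session, which is the size of the error a mixed-denominator comparison would introduce.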

FAQ

Which option supports INP best?

All three capture INP natively today — Datadog RUM and New Relic Browser via their agents, and the self-hosted stack via the web-vitals library’s onINP. For deep attribution (slow interaction target, input delay, presentation delay), both vendors and the web-vitals attribution build expose comparable detail, so INP support is not a differentiator on its own.
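
On the self-hosted side, that attribution detail has to be copied into the beacon explicitly. The sketch below flattens an INP metric object into a beacon row. The attribution field names (interactionTarget, interactionType, inputDelay, processingDuration, presentationDelay) follow the web-vitals v4 attribution build as best understood here; treat them as an assumption and verify them against the version you install. The function name toInpBeacon is illustrative.

```javascript
// Flatten an INP metric from the web-vitals attribution build into a beacon row.
// Field names under `attribution` are ASSUMED from web-vitals v4; verify locally.
function toInpBeacon(metric) {
  const a = metric.attribution ?? {};
  return {
    name: metric.name,
    value: metric.value,
    rating: metric.rating,
    id: metric.id,
    target: a.interactionTarget ?? null, // selector of the element interacted with
    interactionType: a.interactionType ?? null,
    inputDelay: a.inputDelay ?? null,
    processingDuration: a.processingDuration ?? null,
    presentationDelay: a.presentationDelay ?? null,
  };
}

// Stand-in metric object with made-up values, shaped like an INP callback argument.
const sample = {
  name: 'INP',
  value: 320,
  rating: 'needs-improvement',
  id: 'v4-123',
  attribution: { interactionTarget: '#buy', interactionType: 'pointer', inputDelay: 40 },
};
console.log(toInpBeacon(sample));
```

To wire it in, import onINP from 'web-vitals/attribution' instead of 'web-vitals' and pass the flattened object to your queue; metrics without attribution simply produce null fields.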

At what traffic does self-hosting become cheaper?

The crossover is usually a few million sessions per month. Below that, managed time-to-value typically wins; above it, the flat ~$300–700/month self-hosted infra bill undercuts session- or ingest-based pricing, which scales linearly with traffic.

Can I get raw, per-event RUM data out of Datadog or New Relic?

Partially. Both expose query APIs (Datadog’s RUM search API, New Relic’s NRQL/API export) but they are rate-limited and bounded by retention. Only the self-hosted ClickHouse table gives you unlimited raw events to join against business data.

Do all three satisfy EU data residency?

Datadog and New Relic both offer EU regions chosen at onboarding, which covers most mandates. A self-hosted stack lets you pin storage to any jurisdiction you control, which is the only option when residency requirements exceed what vendor regions provide.