Building a Core Web Vitals Grafana Dashboard
You have RUM beacons landing in ClickHouse and now need a dashboard that answers “is our p75 LCP healthy for mobile users on the checkout route in Germany, right now?” without anyone running ad-hoc SQL. This page is a step-by-step build of exactly that dashboard, expanding on the read side introduced in Grafana dashboards for web performance. You will end with one time-series panel per metric computing the correct percentile, Good/Needs-Improvement/Poor threshold bands, template variables to slice by device, country, and route, a stat panel comparing current p75 against the Google threshold, and a Grafana-managed alert that fires when p75 LCP crosses 2.5 s.
The non-obvious work is not wiring a datasource. It is making every panel aggregate at the same percentile your SLOs are defined on, making the time filter and bucket size follow Grafana’s $__timeFilter and $__timeInterval macros, and making the alert query stable enough that it does not flap. We cover each with real ClickHouse SQL and a JSON model excerpt you can import.
Prerequisites
Before building any panel, confirm the following are in place:
- A populated ClickHouse table of RUM events: one row per metric sample with at least event_time DateTime, metric LowCardinality(String) (values LCP, INP, CLS), value Float64, device LowCardinality(String), country LowCardinality(String), and route LowCardinality(String). If you have not built it, start with ClickHouse storage for RUM beacons.
- The official ClickHouse datasource plugin (grafana-clickhouse-datasource) installed and pointed at that table. Note its datasource UID; you will reference it in the JSON model.
- Grafana 10.4+ with Grafana-managed alerting enabled (the default), so alert rules can target a ClickHouse query directly.
- Agreement that all SLOs are stated as p75, consistent with how Google’s CrUX evaluates the page and with your RUM sampling and p75 aggregation strategy. p75 is the headline statistic for every panel below; never chart the mean.
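To see why the mean is the wrong headline, here is a minimal Python sketch on made-up LCP samples. The sample values and the `percentile` helper are illustrative assumptions, and the nearest-rank definition used here is simpler than ClickHouse's quantile(), so exact figures can differ slightly from what a panel shows.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile: the smallest sample with at least a
    fraction q of all samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

# Hypothetical LCP samples in milliseconds: mostly fast, two very slow loads.
lcp_ms = [1200, 1400, 1500, 1600, 1800, 2100, 2600, 3100, 9000, 15000]

mean_lcp = sum(lcp_ms) / len(lcp_ms)   # dragged upward by the two outliers
p75_lcp = percentile(lcp_ms, 0.75)     # what 75% of page loads stay at or under

print(f"mean = {mean_lcp:.0f} ms, p75 = {p75_lcp} ms")
```

Two outliers move the mean by more than a second, while p75 still describes what three quarters of users experienced.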
The threshold reference
Every band, color step, and alert below is anchored to the current Google thresholds. Keep this table next to the dashboard config so panel thresholds and alert conditions never drift.
| Metric | Good (p75) | Needs Improvement (p75) | Poor (p75) | Unit in panel |
|---|---|---|---|---|
| LCP | ≤ 2.5 s | ≤ 4.0 s | > 4.0 s | seconds |
| INP | ≤ 200 ms | ≤ 500 ms | > 500 ms | milliseconds |
| CLS | ≤ 0.1 | ≤ 0.25 | > 0.25 | unitless |
LCP and INP are stored in milliseconds in most RUM schemas. Decide once whether each panel divides by 1000 in SQL or formats the unit in Grafana — the steps below keep raw milliseconds in SQL and let Grafana own the display unit, so the threshold values you type into Grafana stay in the panel’s display unit.
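The unit decision can be written down once and reused for every panel. The sketch below is one way to do that in Python; `THRESHOLDS_RAW` and `threshold_steps` are hypothetical names, and the numbers are the ones from the threshold table.

```python
# Google thresholds from the table, in the unit the RUM schema stores.
THRESHOLDS_RAW = {
    "LCP": (2500.0, 4000.0),  # milliseconds
    "INP": (200.0, 500.0),    # milliseconds
    "CLS": (0.1, 0.25),       # unitless
}

def threshold_steps(metric, sql_divides_by_1000=False):
    """Return (needs_improvement_start, poor_start) in the unit the query returns.

    If the SQL divides timing metrics by 1000, the steps must be divided too,
    otherwise the bands land 1000x away from the data.
    """
    ni_start, poor_start = THRESHOLDS_RAW[metric]
    if sql_divides_by_1000 and metric in ("LCP", "INP"):
        return (ni_start / 1000.0, poor_start / 1000.0)
    return (ni_start, poor_start)

print(threshold_steps("LCP"))                            # raw milliseconds
print(threshold_steps("LCP", sql_divides_by_1000=True))  # seconds
```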
How to build the dashboard
Step 1 — Write the canonical p75 time-series query
Start with one query shape that every panel reuses. It restricts rows to Grafana’s time window with $__timeFilter, groups them into $__interval-sized buckets via $__timeInterval, and computes the 75th percentile of each bucket with quantile(0.75).
SELECT
$__timeInterval(event_time) AS t,
quantile(0.75)(value) AS p75
FROM rum_events
WHERE $__timeFilter(event_time)
AND metric = 'LCP'
AND device IN ($device)
AND country IN ($country)
AND route IN ($route)
GROUP BY t
ORDER BY t
Why: quantile(0.75) is ClickHouse’s reservoir-sampled percentile — fast and accurate enough for a dashboard. The $__timeInterval() macro emits a toStartOfInterval(...) that aligns buckets to the panel’s resolution, so the series stays smooth as the user zooms. $__timeFilter injects the dashboard’s time range, so you never hardcode a window. The three IN ($var) clauses are where the template variables plug in.
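As a mental model of what the query computes, the following Python sketch filters in-memory rows to a time range, floors each timestamp to a bucket start, and takes a nearest-rank p75 per bucket. It only imitates the roles of $__timeFilter and $__timeInterval; the function names and sample rows are assumptions, and ClickHouse's quantile() is sampled rather than nearest-rank.

```python
import math
from collections import defaultdict

def p75(values):
    """Nearest-rank 75th percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(0.75 * len(ordered))) - 1]

def bucketed_p75(rows, start, end, interval_s):
    """rows: iterable of (epoch_seconds, value) pairs."""
    buckets = defaultdict(list)
    for ts, value in rows:
        if start <= ts <= end:                     # role of $__timeFilter
            bucket_start = ts - (ts % interval_s)  # role of $__timeInterval
            buckets[bucket_start].append(value)
    # GROUP BY t ORDER BY t
    return [(t, p75(vals)) for t, vals in sorted(buckets.items())]

# Hypothetical samples: (seconds since range start, LCP in ms).
rows = [(0, 1000), (10, 2000), (20, 3000), (50, 4000),
        (60, 1500), (70, 2500), (200, 9999)]
series = bucketed_p75(rows, start=0, end=119, interval_s=60)
print(series)
```

The sample at 200 seconds falls outside the range and never reaches a bucket, which is the behavior the time filter provides in the real query.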
Step 2 — Create the template variables
Add three Query variables under Dashboard settings → Variables, each backed by a SELECT DISTINCT against the same table so options reflect real data.
-- $device
SELECT DISTINCT device FROM rum_events ORDER BY device
-- $country
SELECT DISTINCT country FROM rum_events WHERE device IN ($device) ORDER BY country
-- $route
SELECT DISTINCT route FROM rum_events
WHERE device IN ($device) AND country IN ($country) ORDER BY route
Why: chaining each variable’s query off the one above it makes them dependent, so picking mobile narrows the country list to countries that actually have mobile traffic. Enable Multi-value and Include All on each; set the “All” custom value so the IN (...) clause stays valid. With multi-value on, Grafana expands $device to a comma-separated quoted list, which is why the query uses IN, not =.
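The shape of that expansion can be illustrated with a few lines of Python. This is a sketch of the idea only: `to_in_list` is a hypothetical helper, and in a real dashboard Grafana and the datasource plugin perform the quoting, possibly with different spacing or escaping.

```python
def to_in_list(selected):
    """Render selected option values as a comma-separated, single-quoted list."""
    if not selected:
        # "IN ()" is not valid SQL, which is why "All" needs a defined expansion.
        raise ValueError("select at least one value")
    # Double any embedded single quote so the literal stays well formed.
    return ", ".join("'" + value.replace("'", "''") + "'" for value in selected)

clause = f"device IN ({to_in_list(['mobile', 'tablet'])})"
print(clause)
```

A single selected value still produces a valid one-element list, so the same query text works for one device or several.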
Step 3 — Build one time-series panel per metric
Create three time-series panels, one each for LCP, INP, and CLS, each using the Step 1 query with its metric = literal swapped. Because the SQL returns LCP and INP in raw milliseconds, set Standard options → Unit to milliseconds (ms) for both, and to none for CLS. Choose seconds (s) for LCP only if you also divide by 1000 in SQL. Either way, keep the SQL and the unit consistent and type thresholds in that same unit.
Step 4 — Add Good/NI/Poor threshold bands and steps
In each panel’s Thresholds section, define steps at the Needs-Improvement and Poor boundaries, then enable Show thresholds → As filled regions to render the bands. For the LCP panel (display unit milliseconds):
Thresholds (mode: Absolute)
base → green (Good, ≤ 2500)
2500 → orange (Needs Improvement)
4000 → red (Poor)
Show thresholds: As filled regions + lines
Why: filled regions give an at-a-glance “are we in the green band” read without the viewer remembering the numbers. The step colors map exactly to the threshold table — green below 2500 ms, amber from 2500 to 4000 ms, red above. INP uses 200/500; CLS uses 0.1/0.25.
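The way a value picks its color from those steps can be sketched as a small lookup. This assumes each step applies from its value upward, so a reading of exactly 2500 lands in the orange step; `LCP_STEPS` and `active_color` are illustrative names, not Grafana internals.

```python
# Steps mirror the LCP panel: base, then the two boundaries in milliseconds.
LCP_STEPS = [(None, "green"), (2500, "orange"), (4000, "red")]

def active_color(value, steps):
    """Return the color of the last step whose boundary the value has reached."""
    color = steps[0][1]  # the base step has no boundary
    for boundary, step_color in steps[1:]:
        if value >= boundary:
            color = step_color
    return color

for reading in (1800, 2500, 3999, 4000):
    print(reading, active_color(reading, LCP_STEPS))
```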
Step 5 — Add the current-p75 stat panel
Add a Stat panel that shows the single current p75 value for the selected metric and colors itself against the threshold. It reuses the query without time bucketing:
SELECT quantile(0.75)(value) AS p75
FROM rum_events
WHERE $__timeFilter(event_time)
AND metric = 'LCP'
AND device IN ($device)
AND country IN ($country)
AND route IN ($route)
Why: dropping the GROUP BY t collapses the window to one number — “p75 LCP over the current range.” Give the Stat panel the same threshold steps as Step 4 and set Color mode → Value, so the tile turns amber or red the moment the aggregate crosses a boundary. Set Graph mode → None for a clean number, or Area to show the trend behind it.
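One consequence is worth checking by hand: a p75 over the whole range is not the average of the per-bucket p75 points on the time-series panel, because busy buckets contribute more samples. The sketch below uses made-up samples and a nearest-rank helper to show the gap.

```python
import math

def p75(values):
    """Nearest-rank 75th percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(0.75 * len(ordered))) - 1]

# Hypothetical LCP samples (ms): a quiet fast bucket and a busy slow bucket.
buckets = {
    "09:00": [1000, 1100, 1200, 1300],
    "09:05": [2000, 2200, 2400, 2600, 2800, 3000, 3200, 3400],
}

per_bucket = [p75(samples) for samples in buckets.values()]
mean_of_bucket_p75 = sum(per_bucket) / len(per_bucket)

all_samples = [x for samples in buckets.values() for x in samples]
whole_range_p75 = p75(all_samples)  # what the Stat panel query computes

print(per_bucket, mean_of_bucket_p75, whole_range_p75)
```

This is why the Stat panel runs its own query against raw rows instead of reducing the time-series panel's points.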
Step 6 — Define the Grafana-managed alert on p75 LCP
Create an alert rule (Alerting → Alert rules → New) with a ClickHouse query that returns a single instant p75 LCP value, then a threshold condition. The query intentionally widens the window to the last 30 minutes so a few slow samples do not flap the alert:
SELECT quantile(0.75)(value) AS p75_lcp
FROM rum_events
WHERE event_time >= now() - INTERVAL 30 MINUTE
AND metric = 'LCP'
AND device = 'mobile'
Set the alert condition to IS ABOVE 2500 (the LCP Good ceiling in milliseconds), with a pending period of 5 minutes so the breach must persist before firing. Why: the fixed now() - INTERVAL 30 MINUTE window makes the alert query independent of any dashboard time range (alerts evaluate headless), and the pending period plus a 30-minute aggregation window keep the p75 stable against noise. Scope to device = 'mobile' because mobile p75 is where LCP regressions show first.
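The effect of the pending period can be modelled with a small state machine. This is a simplified sketch, not Grafana's implementation: it assumes the rule is evaluated once per minute, so a 5-minute pending period corresponds to five evaluations spent in Pending, and `alert_states` is a hypothetical name.

```python
def alert_states(p75_series_ms, threshold_ms=2500, pending_evals=5):
    """Return the rule state after each evaluation of the alert query."""
    states = []
    consecutive_breaches = 0
    for value in p75_series_ms:
        if value > threshold_ms:  # condition: IS ABOVE 2500
            consecutive_breaches += 1
            # The breach must outlast the pending period before it fires.
            firing = consecutive_breaches > pending_evals
            states.append("Firing" if firing else "Pending")
        else:
            consecutive_breaches = 0  # any healthy evaluation resets the rule
            states.append("Normal")
    return states

# A one-evaluation spike never fires; a sustained regression does.
print(alert_states([2400, 2600, 2400]))
print(alert_states([2400] + [2600] * 6 + [2300]))
```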
Verifying it works
Confirm the build end to end:
- Variables resolve. Open the dashboard, pick device = mobile, country = DE, route = /checkout. The country and route dropdowns should re-query and shrink to options that exist for that selection.
- Bands render. Each time-series panel should show green/amber/red filled regions at the right boundaries; the p75 line should sit inside a band. Temporarily zoom to a known-bad period and confirm the line enters the amber band above 2500 ms for LCP.
- Stat panel matches. The Stat tile’s number should equal a p75 computed over every sample in the same range (not the average of the time-series panel’s per-bucket points); its color should match the band that value falls in.
- Alert query previews. In the alert rule editor, click Preview; it should return one numeric row. Force a breach by lowering the threshold to a value below current p75 and confirm the rule moves to Pending then Firing after the pending period.
- JSON imports. Export the dashboard JSON and confirm it re-imports cleanly into a fresh Grafana, with the datasource UID re-mapped.
A minimal panel excerpt of the exported dashboard JSON model, showing the threshold steps and the targeted ClickHouse query:
{
"type": "timeseries",
"title": "p75 LCP",
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "${DS_CLICKHOUSE}" },
"fieldConfig": { "defaults": {
"unit": "ms",
"thresholds": { "mode": "absolute", "steps": [
{ "value": null, "color": "green" },
{ "value": 2500, "color": "orange" },
{ "value": 4000, "color": "red" }
] },
"custom": { "thresholdsStyle": { "mode": "area" } }
} },
"targets": [ { "rawSql":
"SELECT $__timeInterval(event_time) AS t, quantile(0.75)(value) AS p75 FROM rum_events WHERE $__timeFilter(event_time) AND metric = 'LCP' AND device IN ($device) AND country IN ($country) AND route IN ($route) GROUP BY t ORDER BY t",
"format": "time_series" } ]
}
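Since the three panels differ only in metric, unit, and threshold values, the excerpt can be generated instead of copy-pasted. The Python sketch below builds the same structure for each metric; the `panel` helper and the `STEPS` and `UNITS` tables are assumptions for illustration, and a full dashboard export contains more fields than shown here.

```python
import json

STEPS = {"LCP": (2500, 4000), "INP": (200, 500), "CLS": (0.1, 0.25)}
UNITS = {"LCP": "ms", "INP": "ms", "CLS": "none"}

QUERY = (
    "SELECT $__timeInterval(event_time) AS t, quantile(0.75)(value) AS p75 "
    "FROM rum_events WHERE $__timeFilter(event_time) AND metric = '{metric}' "
    "AND device IN ($device) AND country IN ($country) AND route IN ($route) "
    "GROUP BY t ORDER BY t"
)

def panel(metric):
    """Build a time-series panel dict mirroring the excerpt for one metric."""
    ni_start, poor_start = STEPS[metric]
    return {
        "type": "timeseries",
        "title": f"p75 {metric}",
        "datasource": {"type": "grafana-clickhouse-datasource",
                       "uid": "${DS_CLICKHOUSE}"},
        "fieldConfig": {"defaults": {
            "unit": UNITS[metric],
            "thresholds": {"mode": "absolute", "steps": [
                {"value": None, "color": "green"},
                {"value": ni_start, "color": "orange"},
                {"value": poor_start, "color": "red"},
            ]},
            "custom": {"thresholdsStyle": {"mode": "area"}},
        }},
        "targets": [{"rawSql": QUERY.format(metric=metric),
                     "format": "time_series"}],
    }

panels = [panel(metric) for metric in STEPS]
print(json.dumps(panels[0], indent=2))
```

Generating the panels from one table keeps the thresholds and units from drifting apart between metrics.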
Edge cases & gotchas
- IN ($var) with “All” selected. If a variable’s “All” maps to a literal that does not appear in the column, the IN clause returns nothing. Set the variable’s custom all-value to a regular expression such as .* and use match(device, '$device'), or define “All” to expand to every distinct value rather than a single token.
- CLS stored as a ratio, charted as ms. A copy-pasted panel that inherits LCP’s ms unit will display CLS as milliseconds. Always reset the CLS panel unit to none and its thresholds to 0.1/0.25.
- Sparse routes flap the alert. A low-traffic route can yield a p75 from three samples. Add HAVING count() >= 50 to the alert query (or gate on a minimum sample count) so a thin window cannot trip it. When the gate filters the row out, the query returns no data, so set the rule’s no-data handling accordingly. This ties directly to your sampling rate and how it skews tail percentiles.
- Mismatched percentile semantics. ClickHouse quantile() is approximate; for an alert, where you want reproducibility, switch the alert query to quantileExact(0.75) so the number is deterministic across evaluations, even though it is slightly slower.
- $__timeInterval vs $__interval. The ClickHouse plugin’s macro is $__timeInterval(col), which wraps the column in a bucketing function. Do not paste Grafana’s generic $__interval string variable into the SQL; it is a duration literal, not a bucketing expression.
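The minimum-sample gate from the sparse-routes item can be sketched in Python as follows. The helper name `gated_p75` is an assumption, the default of 50 comes from the HAVING example, and returning None stands in for the query returning no rows.

```python
import math

def gated_p75(values, min_samples=50):
    """Return the nearest-rank p75, or None when the window is too thin to trust."""
    if len(values) < min_samples:
        return None  # analogous to HAVING count() >= 50 filtering the row out
    ordered = sorted(values)
    return ordered[math.ceil(0.75 * len(ordered)) - 1]

thin_window = [9000, 9500, 10000]       # three slow samples on a quiet route
healthy_window = list(range(1, 101))    # 100 hypothetical samples

print(gated_p75(thin_window))
print(gated_p75(healthy_window))
```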
FAQ
Should I chart p75 or p95 for Core Web Vitals?
p75. Google’s CrUX program and the Good/NI/Poor thresholds are all evaluated at the 75th percentile, so your dashboard and SLOs must use the same percentile to be comparable. Track p95 in a secondary panel for tail visibility, but keep p75 as the headline.
Why does my LCP panel show values like 2500 instead of 2.5?
LCP and INP are usually stored in milliseconds. Either divide by 1000 in SQL and set the unit to seconds, or keep milliseconds in SQL and set the panel unit to ms — but type the threshold steps in whichever unit the panel displays, or the bands will land in the wrong place.
Why use a 30-minute window for the alert but the dashboard time range for panels?
Grafana-managed alerts evaluate on a schedule with no dashboard context, so the alert query must define its own fixed window. A 30-minute aggregate plus a 5-minute pending period smooths out a handful of slow samples and prevents flapping while still catching sustained regressions.
Related
- Grafana Dashboards for Web Performance — the parent overview of dashboarding RUM data and the read layer this page builds on.
- Self-Hosted RUM Pipeline with ClickHouse — the storage and p75 rollup layer these panels query.
- RUM Data Sampling Strategies — how sampling shapes the p75 your panels and alerts read.