Lighthouse runs in a lab: a CI runner or datacenter machine emulating a mid-range phone on a throttled 4G profile. Your users are not in a datacenter, and most of them are not on that phone or that network. Real-user monitoring closes the gap between lab numbers and what the people paying you actually experience, and the implementation is shorter than people expect: about thirty lines of browser code, a small POST handler, and one percentile query.
The Performance Observer API in 30 lines
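Every modern browser exposes a single API, PerformanceObserver, that emits the entries behind the three Core Web Vitals: LCP (largest-contentful-paint), CLS (layout-shift), and INP (first-input and event). Support for the individual entry types varies: Chromium-based browsers emit all of them, while coverage in Firefox and Safari depends on the version. The code that wires it up looks like this: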
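// rum.js: load this in <head> with the defer attribute
const endpoint = "/api/rum";
const seen = new Set();
let cls = 0; // running total of layout shifts not caused by user input

function ship(metric, value) {
  // CLS is a small unitless fraction, so keep three decimals; the others are milliseconds
  const rounded = metric === "CLS" ? Math.round(value * 1000) / 1000 : Math.round(value);
  // dedupe identical metric+value combinations within the same page
  const key = metric + ":" + rounded;
  if (seen.has(key)) return;
  seen.add(key);
  const body = JSON.stringify({
    metric,
    value: rounded,
    page: location.pathname,
    // userAgentData is Chromium-only, so fall back to a user-agent check elsewhere
    device: (navigator.userAgentData?.mobile ?? /Mobi/i.test(navigator.userAgent)) ? "mobile" : "desktop",
    connection: navigator.connection?.effectiveType,
    ts: Date.now(),
  });
  // sendBeacon queues the request without blocking and is designed to survive page unload
  navigator.sendBeacon(endpoint, body);
}

const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.entryType === "largest-contentful-paint") {
      ship("LCP", entry.startTime);
    } else if (entry.entryType === "layout-shift" && !entry.hadRecentInput) {
      // simplification: the official metric takes the worst session window, not the page total
      cls += entry.value;
      ship("CLS", cls);
    } else if (
      (entry.entryType === "first-input" || entry.entryType === "event") &&
      entry.interactionId
    ) {
      // INP is based on full interaction latency (duration);
      // processingStart - startTime would be the older FID metric
      ship("INP", entry.duration);
    }
  }
});

obs.observe({ type: "largest-contentful-paint", buffered: true });
obs.observe({ type: "layout-shift", buffered: true });
obs.observe({ type: "first-input", buffered: true });
// event entries are only reported when they last at least durationThreshold ms
obs.observe({ type: "event", durationThreshold: 40, buffered: true });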
The buffered: true flag is important. Without it, the observer only sees entries emitted after it starts listening — which means you miss the LCP entry that fires before your JavaScript is parsed. With it, the browser replays the entries it has already recorded since navigation start.
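Because not every browser reports every entry type, it can help to check what the runtime supports before calling observe. The following is a minimal sketch under that assumption: supportedTypes is a hypothetical helper name, and the only platform API it relies on is the static PerformanceObserver.supportedEntryTypes list.

```javascript
// Hypothetical helper: returns the subset of `wanted` entry types that the
// runtime reports as observable. `supported` is injectable so it can be tested.
function supportedTypes(
  wanted,
  supported = globalThis.PerformanceObserver?.supportedEntryTypes ?? []
) {
  return wanted.filter((type) => supported.includes(type));
}

// Browser wiring, guarded so the snippet does nothing outside a page.
if (typeof document !== "undefined" && typeof PerformanceObserver !== "undefined") {
  const types = supportedTypes([
    "largest-contentful-paint",
    "layout-shift",
    "first-input",
  ]);
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      console.log(entry.entryType, entry.startTime);
    }
  });
  // buffered: true replays entries recorded before this script ran
  for (const type of types) observer.observe({ type, buffered: true });
}
```

Recording which types were unsupported alongside the metrics is one way to explain why a browser family is missing from a chart, rather than assuming it had no traffic.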
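The dedupe set only stops the same value from being shipped twice; it does not collapse the several LCP entries the browser can emit as larger elements appear. You only care about the final one, so most production setups debounce on a short timeout and ship the highest value, or, more simply, ship every entry and take the maximum at query time. The second option only yields a true per-visit maximum if each beacon also carries a per-page-load ID, which the snippet above does not send.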
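One way to ship only the final value is to hold the largest value seen per metric in memory and send it when the page is hidden. This is a sketch, not the article's implementation: createMetricBuffer and its record/flush methods are hypothetical names, and the wiring assumes the same /api/rum endpoint and a reduced payload.

```javascript
// Hypothetical helper: remembers the largest value seen per metric and passes
// each (metric, value) pair to `send` when flush() is called, then clears.
function createMetricBuffer(send) {
  const largest = new Map();
  return {
    record(metric, value) {
      const prev = largest.get(metric);
      if (prev === undefined || value > prev) largest.set(metric, value);
    },
    flush() {
      for (const [metric, value] of largest) send(metric, value);
      largest.clear();
    },
  };
}

// Browser wiring, guarded so the snippet does nothing outside a page.
if (typeof document !== "undefined") {
  const buffer = createMetricBuffer((metric, value) => {
    const body = JSON.stringify({ metric, value, page: location.pathname, ts: Date.now() });
    navigator.sendBeacon("/api/rum", body);
  });
  // In a real setup the PerformanceObserver callback would call buffer.record().
  document.addEventListener("visibilitychange", () => {
    if (document.visibilityState === "hidden") buffer.flush();
  });
}
```

A page can be hidden and shown again, so this may still send more than one beacon per visit, but far fewer than one per entry.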
The endpoint
The server side is tiny: a POST handler that parses the body and writes one row. It can be a Vercel Edge function, a Firebase HTTPS function, a Cloudflare Worker, or anything else that speaks HTTP and can append to storage. For sites under a few million pageviews a month you do not need a real database. A SQLite file with one table is enough, as long as the handler runs somewhere with a persistent disk:
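// app/api/rum/route.js: Next.js App Router example
import { NextResponse } from "next/server";
import { db } from "@/lib/sqlite"; // any storage you trust

const METRICS = new Set(["LCP", "CLS", "INP"]);

export async function POST(req) {
  let body;
  try {
    body = await req.json();
  } catch {
    // malformed JSON should be a 400, not an unhandled 500
    return NextResponse.json({ ok: false }, { status: 400 });
  }
  // accept only known metric names and finite numeric values
  if (!METRICS.has(body?.metric) || !Number.isFinite(body.value)) {
    return NextResponse.json({ ok: false }, { status: 400 });
  }
  db.prepare(
    "INSERT INTO rum (metric, value, page, device, connection, ts) " +
      "VALUES (?, ?, ?, ?, ?, ?)"
  ).run(
    body.metric,
    body.value,
    String(body.page ?? ""),
    String(body.device ?? ""),
    body.connection == null ? null : String(body.connection),
    Number.isFinite(body.ts) ? body.ts : Date.now()
  );
  return NextResponse.json({ ok: true });
}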
Reject anything that does not match the schema. RUM endpoints are discoverable from the network tab and will attract junk traffic within hours of going live — validate the body, rate-limit per IP if you can, and never trust the page field for anything security-relevant.
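Per-IP rate limiting can be sketched as a fixed-window counter. The version below is an illustration under stated assumptions: createRateLimiter is a hypothetical name, the state lives in the memory of a single long-running process (so it would not work as-is across serverless instances), and the map is never pruned.

```javascript
// Hypothetical in-memory fixed-window limiter: allows up to `limit` requests
// per IP in each window of `windowMs` milliseconds.
function createRateLimiter({ limit, windowMs }) {
  const hits = new Map(); // ip -> { count, windowStart }
  return function allow(ip, now = Date.now()) {
    const entry = hits.get(ip);
    if (!entry || now - entry.windowStart >= windowMs) {
      // first request from this IP, or the previous window has expired
      hits.set(ip, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

// Example: at most 60 beacons per IP per minute.
const allowBeacon = createRateLimiter({ limit: 60, windowMs: 60_000 });
```

A handler would call allowBeacon(ip) before touching storage and return a 429 status when it comes back false. How the client IP is obtained depends on the hosting platform and is left out here.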
What to actually compute
Means lie. The 75th percentile is the number Google publishes in the Core Web Vitals report, and it is the right number for the dashboard you check every day. The 95th percentile is the number to watch when you are doing capacity work or triaging tail-latency complaints — it tells you what your worst-served users are experiencing without being dragged around by genuine outliers the way a max would be.
- P75 by route: the headline number, one row per page or page group. By Google's thresholds, LCP over 2500 ms, CLS over 0.1, or INP over 200 ms falls out of the "good" band, and LCP over 4000 ms, CLS over 0.25, or INP over 500 ms is rated poor.
- P95 by route: the tail. If P75 is healthy and P95 is 4x worse, you have a small subset of users on bad networks or low-end devices and the fix is usually different from a broad-spectrum optimisation.
- P75 by device class: mobile and desktop often diverge by a factor of 2 or more on LCP. Track them separately.
- P75 by deploy: tag each entry with your build SHA. Regressions show up as a step change at the moment of deploy, which is exactly the signal you want for fast rollback.
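The percentile cuts in the list above can be computed in a few lines once the rows are loaded. This is a minimal sketch that assumes rows shaped like the beacon payload (metric, value, page) and uses the nearest-rank definition of a percentile; percentile and percentileByRoute are hypothetical helper names, and other percentile definitions interpolate and give slightly different numbers.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of all
// samples at or below it. Returns null for an empty list.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Groups rows by page for one metric and returns { page: percentile }.
function percentileByRoute(rows, metric, p) {
  const byPage = new Map();
  for (const row of rows) {
    if (row.metric !== metric) continue;
    if (!byPage.has(row.page)) byPage.set(row.page, []);
    byPage.get(row.page).push(row.value);
  }
  const result = {};
  for (const [page, values] of byPage) result[page] = percentile(values, p);
  return result;
}
```

With LCP samples of 1000, 2000, 3000, and 4000 ms on one route, the nearest-rank P75 is 3000 and the P95 is 4000. Swapping row.page for row.device, or for a build SHA field if you add one, gives the other cuts.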
What you'll see when you turn it on
The first week of RUM data on a site that has never had it is almost always surprising. A few of the patterns that show up reliably:
- LCP is bad on a route nobody has complained about — usually a secondary landing page or a settings screen with a heavy below-the-fold image that the browser is still pulling at LCP time.
- INP spikes the day after a deploy that shipped a heavier client component — typically a chart library or a date picker that loads synchronously on interaction.
- CLS jumps when an ad slot loads, or when a third-party widget (chat, cookie banner) injects DOM after the page is already painted. Both are fixable with reserved space, but you have to see the spike to know to fix it.
- Mobile LCP is 2-3x desktop, even though synthetic tests look similar. This is normal — real mobile devices are not the high-end phones the CI runner emulates.
Why third-party RUM is usually fine — but you can DIY
Vercel Analytics, SpeedCurve, Calibre, and the analytics products from every observability vendor (Datadog, New Relic, Dynatrace) all ship a RUM beacon and a dashboard. They cost between a few dollars and a few hundred dollars a month and they all use exactly the same Performance Observer API under the hood. The convenience is real: dashboards, alerts, change-tracking, and percentile math without touching a database.
The reasons to DIY anyway are simple. The data is yours: no third party gets to see your URLs and user agents. The bill is predictable: a few cents of storage per million entries. And the implementation is short enough that you can keep all of it in your head, which means you understand what every number means instead of taking the vendor's definition on faith.
What to gate CI on
RUM and synthetic monitoring answer different questions. Use Lighthouse or WebPageTest in CI to catch regressions before they ship — a synthetic test runs on every PR, the numbers are reproducible, and failing the build on a 20% LCP increase prevents the bad deploy from ever reaching users.
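The 20% gate itself is a one-line comparison once you have a baseline number and the PR's number. The sketch below assumes both values have already been extracted from your synthetic tool's report; checkLcpBudget is a hypothetical helper, not part of Lighthouse or WebPageTest.

```javascript
// Hypothetical CI check: compares a PR's lab LCP against a stored baseline.
// maxIncrease is a fraction, so 0.2 means "fail above a 20% regression".
function checkLcpBudget(baselineMs, currentMs, maxIncrease = 0.2) {
  if (!(baselineMs > 0)) throw new RangeError("baselineMs must be positive");
  const increase = (currentMs - baselineMs) / baselineMs;
  return { increase, pass: increase <= maxIncrease };
}

// Example: a 2000 ms baseline and a 2500 ms PR run is a 25% regression.
const verdict = checkLcpBudget(2000, 2500);
if (!verdict.pass) {
  console.error(`LCP regressed by ${(verdict.increase * 100).toFixed(0)}%`);
}
```

In a real pipeline the failing branch would also set a non-zero exit code so the build fails. Lab runs are noisy, so taking the median of several runs before comparing usually gives fewer false alarms.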
Use RUM to know what your users actually experience. The two do not replace each other. A site that passes Lighthouse on every PR can still have terrible field LCP because real users are on slower devices than the lab profile assumes. A site with good field LCP can still ship a regression that synthetic catches before any user sees it. Run both.
The habit that compounds
The developers who ship faster sites are the ones whose dashboards show real users, not lab numbers. Lab numbers are easy to optimise for because the test conditions are fixed; field numbers force you to think about the long tail of devices, networks, and routes that synthetic tests will never reach. Wire up the thirty lines, watch the P75 for a month, and the conversation about performance changes shape — from "does Lighthouse pass" to "are users actually faster this week than last week".
Related reading
For the metric definitions themselves, start with Core Web Vitals explained. If your RUM dashboard is showing bad LCP, the next two stops are auditing your JavaScript bundle and paying less hydration tax. The full performance silo is on the web performance topic page.
About the writers
Kishan: Founder of ShareCode. Writes the engineering deep-dives on this site, covering WebRTC, Firebase Auth, real-time sync, and the production patterns behind the editor itself.
Kajal: Developer educator at ShareCode. Writes the tutorial track, covering Python, JavaScript debugging, coding-interview prep, and the everyday code-quality habits that hold up in real codebases.