When Cristiano Ronaldo lines up a free kick in a World Cup quarter-final, something quietly enormous is happening away from the pitch. Across six continents, phones come out of pockets, TVs switch on, and laptops get tilted toward a couch. During the biggest fixtures, the broadcasters and streaming partners behind a single match — FIFA+, UEFA TV, regional rights-holders, the big over-the-top platforms — can be serving tens or even hundreds of millions of people watching the same thirty seconds of football at the same moment.
The interesting question for an engineer isn't how the goal gets filmed. It's how that one feed reaches a hundred million screens without melting the internet. Send a video stream directly from the stadium to every viewer and you'd need a single server with hundreds of terabits per second of uplink, which does not exist. The actual answer is a distributed pipeline with about eight distinct stages, each one solving a different engineering problem, and most of the clever parts happen in places nobody watching ever thinks about.
This is a walkthrough of that pipeline, stage by stage — capture, encoding, packaging, origin, CDN, adaptive bitrate, traffic steering, and security. We'll use a football broadcast as the running example because it's the hardest version of the problem: live, unpredictable, latency-sensitive, and watched by more people simultaneously than almost anything else humans produce. If you understand how a match gets to a hundred million phones, you understand how every live stream you've ever watched works.
1 One Feed, a Hundred Million Screens
Start with the constraint, because the whole architecture is a response to it. A football stream watched by 100 million people at an average of 5 Mbps is, in aggregate, roughly 500 terabits per second of video moving across the planet at peak. No single machine, no single data centre, and no single network link can carry that. The design has to spread the load until no individual component is doing anything impossible.
It does that with a pipeline. The match is captured once, compressed into several quality levels, chopped into thousands of tiny files, published to an authoritative origin, and then copied outward through a global network of edge servers that sit close to viewers. By the time the video reaches you, it has travelled through five or six independent systems — and the load at each step has been divided so finely that no part of it is straining.
Figure: The end-to-end live pipeline
2 It Starts in the Stadium
A major football broadcast is filmed with anywhere from thirty to fifty cameras — touchline tracking cameras, the high tactical cam, goal-line and net cams, ultra-slow-motion rigs for replays, a drone or cable-cam for the wide beauty shots. Every one of those feeds runs back to a production control room, usually an outside-broadcast truck parked behind the stand, where a director cuts between angles live.
That mixing is the part most people picture when they think about “the broadcast,” but for our purposes it collapses into a single fact: the dozens of camera feeds become one clean program feed. From the moment the director cuts it, there is exactly one video to deliver — and everything downstream exists to copy that one video to the world. The engineering problem isn't the cameras. It's what happens to that single feed after the truck.
3 Encoding and the Bitrate Ladder
The clean program feed is huge. Uncompressed 4K can run to several gigabits per second, which is fine inside a truck connected by fibre and impossible to send to a phone on a train. So the first real transformation is encoding: a hardware or software encoder compresses the feed using a video codec such as H.264 (AVC), H.265 (HEVC), or, increasingly, AV1. In the broadcast world this runs on dedicated appliances or cloud services from vendors such as Harmonic, Ateme, or AWS Elemental (MediaLive); the underlying compression works the same way as in an ffmpeg pipeline, just hardened for round-the-clock live operation.
The crucial trick is that the encoder doesn't produce one output. It produces several — the same match at a range of resolutions and bitrates, all in lockstep. This set is the bitrate ladder, and it's what lets a fibre-connected TV and a phone on patchy 4G watch the same game at the quality each can actually sustain.
| Rung | Resolution | Bitrate | Who gets it |
|---|---|---|---|
| 2160p | 3840×2160 (4K UHD) | 15–25 Mbps | Smart TV on fast fibre |
| 1080p | 1920×1080 | 6 Mbps | Good home broadband |
| 720p | 1280×720 | 3 Mbps | Average connection |
| 480p | 854×480 | 1.2 Mbps | Phone on mobile data |
| 240p | 426×240 | 0.4 Mbps | Weak or congested signal |
Producing every rung simultaneously is the foundation of adaptive bitrate streaming — the player will later hop between these rungs as your network changes. But the ladder is useless on its own. The video still has to be cut into a shape a CDN can cache and a player can stream. That's packaging.
4 Packaging: One Stream Becomes Thousands of Tiny Files
Here is the idea that makes internet-scale streaming work, and it's almost embarrassingly simple: don't stream a video at all. Cut each rung of the ladder into a long sequence of short files — typically 2 to 6 seconds each — and let the player download them one after another like any other web resource. A two-hour match becomes a few thousand little segments per rung.
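To make the segment arithmetic concrete, the sketch below counts how many segments one rung produces over a two-hour window and estimates the size of each segment. It assumes a constant bitrate per rung and takes the bitrates from the ladder table (using the top of the 4K range); `LADDER_MBPS` and both helper names are illustrative, and real encoders vary segment size with scene complexity.

```python
# Illustrative bitrates (Mbps) taken from the ladder table; 4K uses the top of its range.
LADDER_MBPS = {"2160p": 25.0, "1080p": 6.0, "720p": 3.0, "480p": 1.2, "240p": 0.4}


def segments_per_rung(duration_s: int, segment_s: int) -> int:
    """How many segments one rung is cut into (a final partial segment counts)."""
    return -(-duration_s // segment_s)  # ceiling division without floats


def segment_size_mb(bitrate_mbps: float, segment_s: float) -> float:
    """Approximate size of one segment in megabytes, assuming constant bitrate."""
    return bitrate_mbps * segment_s / 8  # megabits -> megabytes


if __name__ == "__main__":
    two_hours = 2 * 60 * 60
    for seg in (2, 4, 6):
        print(f"{seg} s segments: {segments_per_rung(two_hours, seg)} per rung")
    for rung, mbps in LADDER_MBPS.items():
        print(f"{rung}: ~{segment_size_mb(mbps, 4):.2f} MB per 4 s segment")
```

At 2-second segments a two-hour window yields 3,600 files per rung and at 6 seconds 1,200, while a 4-second 1080p segment comes to roughly 3 MB: an ordinary-sized web object as far as a cache is concerned.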
The two dominant formats for describing those segments are HLS (Apple's HTTP Live Streaming) and MPEG-DASH. They're the same idea expressed two ways. Both hand the player a text manifest that lists the available qualities and the segments inside each. In HLS the top-level manifest — the master playlist — looks like this:
```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=25000000,RESOLUTION=3840x2160,CODECS="hvc1.2.4.L153.B0"
uhd_2160p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080,CODECS="avc1.640028"
hd_1080p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720,CODECS="avc1.4d401f"
hd_720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=854x480,CODECS="avc1.4d401e"
sd_480p/index.m3u8
```

Each line points the player at a media playlist for that rung, which is just a rolling list of the most recent segments. For a live stream it gets rewritten every few seconds as new segments are produced and old ones fall off the end:
```
#EXTM3U
#EXT-X-VERSION:9
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:2680
# init.mp4 is an illustrative name for the CMAF initialization segment
#EXT-X-MAP:URI="init.mp4"
#EXTINF:4.000,
segment_2680.m4s
#EXTINF:4.000,
segment_2681.m4s
#EXTINF:4.000,
segment_2682.m4s
```

The payoff of this design is that there is nothing special about a video segment any more: it's a static file fetched over ordinary HTTP. And static files over HTTP are the one thing the internet already knows how to deliver at planetary scale, because that's exactly what a CDN is built to cache. Modern packaging often uses a shared container called CMAF so the same segment files can serve both HLS and DASH players, roughly halving the storage and cache footprint compared with packaging each format separately.
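The rolling media playlist can be sketched as a small sliding window: each finished segment is appended, the oldest drops off once the window is full, and `EXT-X-MEDIA-SEQUENCE` always names the first segment still listed. The class, the three-segment window, and the `init.mp4` initialization-segment name are assumptions for illustration; a real packager also handles discontinuities, encryption tags, and atomic file writes. The `EXT-X-MAP` line is included because fragmented-MP4/CMAF segments need an initialization section before they can be decoded.

```python
from collections import deque


class LivePlaylist:
    """Sliding-window HLS media playlist for a single rung (illustrative sketch)."""

    def __init__(self, target_duration: int = 4, window: int = 3, init_uri: str = "init.mp4"):
        self.target_duration = target_duration  # rounded EXTINF values must not exceed this
        self.window = window                    # how many segments stay listed
        self.init_uri = init_uri                # hypothetical CMAF init-segment name
        self.segments = deque()                 # (sequence_number, duration) pairs

    def add_segment(self, sequence: int, duration: float) -> None:
        """Publish a newly packaged segment; drop the oldest once the window is full."""
        self.segments.append((sequence, duration))
        while len(self.segments) > self.window:
            self.segments.popleft()

    def render(self) -> str:
        """Serialise the current window as the text a player would download."""
        if not self.segments:
            raise ValueError("no segments published yet")
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:9",
            f"#EXT-X-TARGETDURATION:{self.target_duration}",
            # The media sequence number is the sequence number of the first listed segment.
            f"#EXT-X-MEDIA-SEQUENCE:{self.segments[0][0]}",
            f'#EXT-X-MAP:URI="{self.init_uri}"',
        ]
        for sequence, duration in self.segments:
            lines.append(f"#EXTINF:{duration:.3f},")
            lines.append(f"segment_{sequence}.m4s")
        # No #EXT-X-ENDLIST: its absence tells the player the stream is still live.
        return "\n".join(lines) + "\n"


playlist = LivePlaylist()
for seq in range(2680, 2684):  # four segments arrive; the window keeps three
    playlist.add_segment(seq, 4.0)
print(playlist.render())
```

Each time the packager finishes a segment it calls `add_segment` and rewrites the file; the media sequence number advances with the oldest segment still listed, which is how a player reloading the playlist knows which segments have fallen off the end.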
5 The Origin (and Why It Has a Bodyguard)
The packaged segments and manifests are published to an origin server — the single authoritative source of truth for the stream. If a piece of video exists anywhere in the world, it traces back to here. For a global event the origin is never one box; it's a redundant set, usually mirrored across regions and paired with a disaster-recovery origin in case a whole site goes dark mid-match.
But the origin has a problem it cannot solve alone: it must never be hit directly by the crowd. If even a fraction of 100 million video players asked the origin for segments, it would collapse in seconds. So the origin sits behind a shield, an intermediate caching tier (sometimes called an origin shield or mid-tier) whose only job is to absorb requests so that, no matter how many viewers there are, the origin itself sees only a small, steady trickle of traffic. The shield is the seam between the part of the system that produces video and the part that distributes it. Everything past it is the CDN.
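The shield's effect on origin load can be shown with a toy two-tier cache: every edge forwards its misses upstream, either straight to the origin or to a single shield that forwards only its own first miss. The classes, the 500-edge count, and the 1,000 viewers per edge are hypothetical, and the sketch is sequential, so it does not model the request collapsing real shields perform when many misses for the same segment arrive at the same instant.

```python
class Origin:
    """Stand-in for the origin: it only counts how often it is asked for a segment."""

    def __init__(self):
        self.fetches = 0

    def get(self, key: str) -> bytes:
        self.fetches += 1
        return f"<bytes of {key}>".encode()


class CacheTier:
    """An edge server or shield: serves from its store, forwards misses upstream."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def get(self, key: str) -> bytes:
        if key not in self.store:  # miss: ask the tier above exactly once
            self.store[key] = self.upstream.get(key)
        return self.store[key]


def origin_fetches(num_edges: int, viewers_per_edge: int, use_shield: bool) -> int:
    """How many times the origin is hit when every viewer requests the same segment."""
    origin = Origin()
    parent = CacheTier(origin) if use_shield else origin
    edges = [CacheTier(parent) for _ in range(num_edges)]
    for edge in edges:
        for _ in range(viewers_per_edge):
            edge.get("720p/segment_2681.m4s")
    return origin.fetches


print(origin_fetches(500, 1_000, use_shield=False))  # 500: one miss per edge
print(origin_fetches(500, 1_000, use_shield=True))   # 1: the shield absorbs the rest
```

Without the shield, origin load grows with the size of the edge fleet; with it, the origin sees one request per segment per rung no matter how many edges or viewers sit below.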
6 The CDN Does the Heavy Lifting
The content delivery network is the part of this whole story that actually makes a hundred million viewers possible, and it works on one principle: cache the segment once, serve it a million times. A CDN is thousands of edge servers in hundreds of cities. When the first viewer in Mumbai requests segment 2681, the nearby edge server fetches it from the origin (through the shield) and keeps a copy. The next hundred thousand viewers in and around Mumbai get that same copy straight from the edge, a few milliseconds away, and the origin never hears about them.
Because every viewer in a region wants the exact same segment at nearly the same instant, the cache hit rate for live sports is extraordinary — well above 99% at the edge. That's the whole magic trick. The origin produces each segment once; the edge fleet multiplies it out to the world. Akamai, Amazon CloudFront, Fastly, Cloudflare, and Google's network are the names doing this at the scale a World Cup needs.
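The "well above 99%" figure follows from simple counting. Assume, for illustration, that each edge server misses exactly once per segment per rung and serves every later request from its cache; misses then scale with the size of the edge fleet while requests scale with the audience. The 10,000-edge fleet and five rungs below are hypothetical, and real hit ratios come out lower because of evictions, retries, and uneven viewer spread.

```python
def edge_hit_ratio(viewers: int, edge_servers: int, rungs: int = 1) -> float:
    """Best-case edge hit ratio for one live segment.

    Every viewer requests the segment once; each edge server misses once per rung
    (the first request goes upstream) and serves every later request from cache.
    """
    requests = viewers
    misses = min(edge_servers * rungs, viewers)
    return 1 - misses / requests


print(f"{edge_hit_ratio(100_000_000, 10_000, rungs=5):.4%}")
```

With 100 million viewers spread over 10,000 edges and five rungs, at most 50,000 of 100 million requests miss, a best-case hit ratio of about 99.95%.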
Serious events don't even trust a single CDN. They run multi-CDN: the same content is published to several providers at once, and a traffic-steering layer shifts viewers between them based on real-time health and cost. If Akamai degrades in one region, traffic slides to Fastly or CloudFront mid-match without anyone watching noticing a thing. For the biggest fixtures, multi-CDN isn't an optimisation — it's the difference between a wobble and a worldwide outage at the 90th minute.
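A steering decision of this kind can be sketched as "filter out unhealthy CDNs, then prefer the cheapest of the rest". The CDN names, health thresholds, and prices below are invented for illustration, and production steering usually shifts traffic weights gradually per region rather than flipping every viewer to a single provider.

```python
from dataclasses import dataclass


@dataclass
class CdnHealth:
    name: str
    error_rate: float      # fraction of failed segment requests in this region
    rebuffer_ratio: float  # stall time / (play + stall time) reported by players
    cost_per_gb: float     # hypothetical contract price


def pick_cdn(candidates, max_error: float = 0.01, max_rebuffer: float = 0.02) -> str:
    """Cheapest CDN that passes both health checks; the least-bad one if none does."""
    healthy = [c for c in candidates
               if c.error_rate <= max_error and c.rebuffer_ratio <= max_rebuffer]
    if not healthy:
        # Every provider is degraded: prefer whichever is failing least.
        return min(candidates, key=lambda c: (c.error_rate, c.rebuffer_ratio)).name
    return min(healthy, key=lambda c: c.cost_per_gb).name


region = [
    CdnHealth("cdn-a", error_rate=0.050, rebuffer_ratio=0.040, cost_per_gb=0.010),
    CdnHealth("cdn-b", error_rate=0.002, rebuffer_ratio=0.005, cost_per_gb=0.020),
    CdnHealth("cdn-c", error_rate=0.001, rebuffer_ratio=0.004, cost_per_gb=0.030),
]
print(pick_cdn(region))  # cdn-a is cheapest but unhealthy, so cdn-b wins
```

Health gates come before cost on purpose: during a live match a cheap CDN that is dropping segments costs far more in lost viewers than it saves per gigabyte.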
Figure: Why the CDN prevents the origin from collapsing
7 How Your Player Picks a Quality
All that effort producing a bitrate ladder pays off in the player, and it pays off automatically. As the player downloads each segment, it measures how long the download took relative to the segment's duration — effectively measuring your real throughput. If 6 Mbps segments keep arriving comfortably ahead of when they're needed, the player tries the next rung up. If a segment arrives late and the buffer starts draining, it drops to a lower rung at the next boundary.
That is why a stream visibly softens from sharp 1080p to fuzzier 720p when someone in the house starts a big download, and then quietly sharpens again a minute later, instead of freezing on a spinner. The buffer is the shock absorber: a player typically keeps a few segments queued ahead so a momentary dip never reaches your eyes. Adaptive bitrate, decided segment by segment, is what keeps a live match watchable across wildly different networks at the same instant.
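The per-segment decision can be sketched as a throughput rule with a buffer guard: estimate bandwidth from the last download, pick the highest rung that fits under a safety margin, and never step up while the buffer is low. The 0.8 safety factor and the 6-second low-buffer threshold are assumptions; production players smooth the estimate over several segments and blend in buffer-based logic.

```python
# Ladder bitrates in kbps, lowest to highest (from the table; 4K at the top of its range).
LADDER_KBPS = [400, 1_200, 3_000, 6_000, 25_000]


def measured_throughput_kbps(segment_bytes: int, download_s: float) -> float:
    """Throughput implied by how long the last segment took to download."""
    return segment_bytes * 8 / 1_000 / download_s


def choose_rung(throughput_kbps: float, buffer_s: float, current_kbps: int,
                safety: float = 0.8, low_buffer_s: float = 6.0) -> int:
    """Bitrate (kbps) to request for the next segment."""
    fits = [b for b in LADDER_KBPS if b <= safety * throughput_kbps]
    target = fits[-1] if fits else LADDER_KBPS[0]
    if buffer_s < low_buffer_s:
        # Buffer draining: never step up, and drop at least one rung below current.
        lower = [b for b in LADDER_KBPS if b < current_kbps]
        target = min(target, lower[-1] if lower else LADDER_KBPS[0])
    return target


# A 4 s, 6 Mbps segment is ~3 MB; if it downloads in 2 s we measured ~12 Mbps.
bw = measured_throughput_kbps(3_000_000, 2.0)
print(choose_rung(bw, buffer_s=12.0, current_kbps=6_000))  # 6000: 25 Mbps does not fit
print(choose_rung(bw, buffer_s=4.0, current_kbps=6_000))   # 3000: buffer is low
```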
8 Steering Traffic Across the Planet
There's a brutal traffic pattern unique to live sports: almost everyone joins in the same five minutes before kickoff. A video-on-demand service sees viewers trickle in all day; a cup final sees tens of millions of viewers hit the system in one synchronised spike. The infrastructure has to be steered, not just provisioned.
Three mechanisms do the steering. DNS-based and anycast routing sends each viewer to a nearby, healthy point of presence rather than a fixed address. Global load balancers spread requests across regions so no single cluster takes the whole spike. Geographic routing keeps European viewers on European edges and Asian viewers on Asian edges, both for latency and for the broadcast-rights boundaries we'll get to. Underneath, the edge fleet itself auto-scales to meet the surge. The goal is that a viewer in São Paulo and a viewer in Seoul both get served locally, and neither ever knows the other exists.
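Geographic steering can be approximated as "send the viewer to the nearest healthy point of presence". This sketch uses great-circle distance as a stand-in for network distance; real systems steer on measured latency, anycast routing, and resolver location rather than coordinates. The PoP list and its coordinates (rounded city centres) are hypothetical.

```python
import math

# Hypothetical points of presence: (name, latitude, longitude).
POPS = [
    ("sao-paulo", -23.55, -46.63),
    ("santiago", -33.45, -70.67),
    ("frankfurt", 50.11, 8.68),
    ("seoul", 37.57, 126.98),
    ("tokyo", 35.68, 139.69),
]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_healthy_pop(lat: float, lon: float, unhealthy=frozenset()) -> str:
    """Closest PoP by distance, skipping any marked unhealthy."""
    candidates = [p for p in POPS if p[0] not in unhealthy]
    if not candidates:
        raise RuntimeError("no healthy PoP available")
    return min(candidates, key=lambda p: haversine_km(lat, lon, p[1], p[2]))[0]


print(nearest_healthy_pop(-23.5, -46.6))                      # sao-paulo
print(nearest_healthy_pop(37.5, 127.0))                       # seoul
print(nearest_healthy_pop(37.5, 127.0, unhealthy={"seoul"}))  # tokyo
```

The fallback case is the interesting one: when the Seoul PoP is marked unhealthy, the same viewer is quietly served from Tokyo rather than from the other side of the world.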
9 The Latency Problem — and Where WebRTC Fits
The segment model has one ugly side effect: latency. Because the player waits for whole segments and keeps a buffer, traditional HLS and DASH typically run 20 to 45 seconds behind real life. For a film that doesn't matter. For football it's a disaster — your neighbour's TV erupts at a goal half a minute before yours does, and live betting is impossible.
The mainstream fix is to keep the CDN model but make the segments thinner. Low-Latency HLS and chunked CMAF break each segment into smaller partial chunks the player can start fetching before the full segment even finishes encoding, pulling latency down to roughly 2 to 5 seconds while still riding the same cacheable HTTP infrastructure. This is where most premium live sports sits today.
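The gap between the two models falls out of a rough latency budget. A segment can't be published until it has been fully encoded, and RFC 8216 advises clients not to start playback closer than three target durations to the end of a live playlist; Low-Latency HLS applies the same kind of hold-back to much shorter partial segments. The sketch applies that rule of thumb to both models, and the fixed pipeline overheads (3 s and 1 s) are assumptions standing in for capture, encoding, packaging, and CDN transit.

```python
def segmented_latency_s(segment_s: float, overhead_s: float = 3.0, hold_back: int = 3) -> float:
    """Rough glass-to-glass latency for classic segmented HLS/DASH.

    overhead_s: assumed capture + encode + package + CDN time.
    segment_s:  a segment is only listed once it is complete.
    hold_back:  the player starts this many segment durations behind the live edge.
    """
    return overhead_s + segment_s + hold_back * segment_s


def low_latency_s(part_s: float, overhead_s: float = 1.0, hold_back: int = 3) -> float:
    """Same budget for LL-HLS / chunked CMAF, where the unit is a partial segment."""
    return overhead_s + part_s + hold_back * part_s


for seg in (4.0, 6.0, 10.0):
    print(f"classic, {seg:.0f} s segments: ~{segmented_latency_s(seg):.0f} s behind live")
for part in (0.5, 1.0):
    print(f"low-latency, {part} s parts: ~{low_latency_s(part):.1f} s behind live")
```

With 4- to 10-second segments the budget lands between about 19 and 43 seconds, and with half-second to one-second parts between 3 and 5 seconds: the shrinking unit of delivery, not a faster network, is what closes the gap.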
For genuinely sub-second needs — live betting, watch-along rooms, interactive shows — platforms reach for WebRTC, which is built for real-time media. But WebRTC is the wrong tool for the main broadcast, and the reason is structural. Every WebRTC viewer is an individually negotiated, stateful connection that can't be cached the way a static HTTP segment can:
```
HTTP segment:  encode once -> cache once -> serve 100,000,000× (cheap)
WebRTC stream: negotiate + push a live connection per viewer (expensive)
```

So the pattern at scale is hybrid: the millions watching the broadcast get low-latency HLS or DASH through the CDN, and only the slice of viewers on a truly interactive surface get WebRTC. If you want the full mechanics of how WebRTC negotiates those connections, we wrote a separate beginner deep-dive on WebRTC architecture: SDP, ICE, STUN/TURN, and signaling.
| Approach | Typical latency | Scales to millions? | Where it fits |
|---|---|---|---|
| Standard HLS / DASH | 20–45 s | Yes (CDN) | VOD, non-urgent live |
| Low-Latency HLS / CMAF | 2–5 s | Yes (CDN) | Modern live sports |
| WebRTC | < 0.5 s | No (expensive past ~thousands) | Betting, watch-along, interaction |
10 Keeping It Locked: DRM, Tokens, and Anti-Piracy
Premium sports rights cost billions, so the same infrastructure that delivers the stream also has to defend it. Four layers stack on top of each other, and each one assumes the others can be beaten.
DRM encryption
The segments are encrypted, and only a licensed player holding a valid key can decode them. The three systems that cover almost every device are Widevine (Android, Chrome), PlayReady (Windows, Xbox, many smart TVs), and FairPlay (Apple). The browser side of this is the Encrypted Media Extensions API, which hands the encrypted stream to the device's secure decode path.
Short-lived signed URLs
Manifests and segments aren't served from guessable, permanent links. Each request carries a token that expires in minutes, so a URL scraped and pasted into a piracy site stops working almost immediately:
```
https://edge.cdn.example/live/match/720p/segment_2681.m4s?token=eyJhbGciOiJIUzI1NiJ9...&exp=1718900000
```

The token is signed and bound to the viewer's account, and exp is a Unix timestamp only a few minutes in the future.

Geo-restriction
Broadcast rights are sold by territory, so a match streamable in one country may be blacked out in another. The same geographic routing that improves latency also enforces those contractual boundaries at the edge.
Forensic watermarking
An invisible, per-session identifier is embedded into the video itself. If a stream is illegally rebroadcast, the rights holder can extract that watermark and trace the leak back to the exact account it came from — which is how takedowns during a live match actually happen.
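Of the four layers, signed URLs are the easiest to sketch. The version below signs the segment path, an account ID, and an expiry time with HMAC-SHA256; the edge recomputes the signature and rejects expired or tampered links. The secret, the `acct`/`exp`/`token` parameter names, and the hex token format are all hypothetical: each CDN defines its own token scheme (the example URL above uses a JWT-style token), and real deployments usually sign a path prefix so one token covers the whole stream.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"hypothetical-shared-secret"  # known only to the token service and the edge


def _signature(path: str, account_id: str, exp: int) -> str:
    msg = f"{path}|{account_id}|{exp}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def sign_url(base_url: str, account_id: str, ttl_s: int = 300,
             now: Optional[float] = None) -> str:
    """Append an account-bound token that expires ttl_s seconds from now."""
    exp = int((time.time() if now is None else now) + ttl_s)
    token = _signature(urlparse(base_url).path, account_id, exp)
    return f"{base_url}?{urlencode({'acct': account_id, 'exp': exp, 'token': token})}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """What the edge checks before serving a segment."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        account_id = query["acct"][0]
        exp = int(query["exp"][0])
        token = query["token"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > exp:
        return False  # expired: a link pasted onto a piracy site dies here
    expected = _signature(parsed.path, account_id, exp)
    return hmac.compare_digest(expected, token)  # constant-time comparison


url = sign_url("https://edge.cdn.example/live/match/720p/segment_2681.m4s",
               "acct-42", ttl_s=300, now=1_718_900_000)
print(verify_url(url, now=1_718_900_100))  # True: within five minutes
print(verify_url(url, now=1_718_900_400))  # False: expired
```

Because the account ID is inside the signature, changing it, the path, or the expiry invalidates the token, so a scraped link can neither be extended nor reassigned.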
Putting Real Numbers on It
It's worth doing the arithmetic that started this whole design, because the scale is genuinely hard to picture. Take 100 million concurrent viewers at an average sustained bitrate of 5 Mbps:
```
100,000,000 viewers × 5 Mbps = 500,000,000 Mbps
                             = 500,000 Gbps
                             = 500 Tbps delivered, at peak, all at once
```

That number is an aggregate spread across multiple CDNs and tens of thousands of edge servers; it never travels through any single pipe, and the real figure swings with how many viewers sit on 4K versus 480p. But it shows why every architectural decision in this post exists. Encode once instead of per viewer. Cache at the edge instead of serving from origin. Split across CDNs instead of trusting one. Each choice exists to make sure no individual component is ever asked to do something physically impossible.
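The caveat about the 4K-versus-480p split can be made concrete by weighting the ladder by audience share. The viewer mix below is invented purely for illustration, and 4K is taken at 20 Mbps, the middle of its 15–25 Mbps range.

```python
# Ladder bitrates in Mbps (4K at the middle of its 15-25 Mbps range).
RUNG_MBPS = {"2160p": 20.0, "1080p": 6.0, "720p": 3.0, "480p": 1.2, "240p": 0.4}


def aggregate_tbps(viewers: int, mix: dict) -> float:
    """Peak egress in Tbps for a given share of viewers on each rung."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("audience shares must add up to 1")
    avg_mbps = sum(share * RUNG_MBPS[rung] for rung, share in mix.items())
    return viewers * avg_mbps / 1_000_000  # Mbps -> Tbps


# Hypothetical audience mix for a 100-million-viewer match.
mix = {"2160p": 0.05, "1080p": 0.35, "720p": 0.30, "480p": 0.25, "240p": 0.05}
print(f"~{aggregate_tbps(100_000_000, mix):.0f} Tbps")
```

That mix averages about 4.3 Mbps per viewer, or roughly 432 Tbps; move a tenth of the audience from 720p to 4K and the total climbs by about 170 Tbps.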
And none of it runs blind. Operations teams watch thousands of live signals during a match — concurrent viewers, per-CDN error rates, edge cache-hit ratios, rebuffering ratio, average bitrate, startup time, and playback failures — on dashboards built from tools like Grafana, Prometheus, Datadog, and streaming-specific QoS platforms such as Conviva and Mux Data. When a region starts rebuffering, the steering layer shifts traffic to a healthier CDN before most viewers notice. At this scale, observability isn't a nice-to-have; it's the only thing standing between a wobble and a trending outage.
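One of those signals, rebuffering ratio, is simple to compute once player sessions report how long they played and how long they stalled. This sketch takes it as stall time divided by total watch time (play plus stall), aggregates per CDN, and flags any CDN above a threshold for the steering layer. The session data, CDN names, and the 1% threshold are hypothetical, and QoS vendors each define the metric slightly differently.

```python
from collections import defaultdict


def rebuffer_ratio_by_cdn(sessions):
    """sessions: iterable of (cdn, play_seconds, stall_seconds) tuples."""
    play = defaultdict(float)
    stall = defaultdict(float)
    for cdn, play_s, stall_s in sessions:
        play[cdn] += play_s
        stall[cdn] += stall_s
    return {
        cdn: stall[cdn] / (play[cdn] + stall[cdn])
        for cdn in play
        if play[cdn] + stall[cdn] > 0  # skip CDNs with no watch time yet
    }


def cdns_to_drain(ratios: dict, threshold: float = 0.01) -> list:
    """CDNs whose viewers are stalling too often, sorted for stable output."""
    return sorted(cdn for cdn, ratio in ratios.items() if ratio > threshold)


sessions = [
    ("cdn-a", 600.0, 3.0),
    ("cdn-a", 300.0, 0.0),
    ("cdn-b", 900.0, 30.0),
]
ratios = rebuffer_ratio_by_cdn(sessions)
print(ratios)
print(cdns_to_drain(ratios))  # ['cdn-b']
```

Aggregating stall and play time before dividing, rather than averaging per-session ratios, keeps a handful of very short sessions from dominating the regional figure.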
The Stack at a Glance
Pulling the whole pipeline together, here is the kind of tech stack a major live-sports platform is running on a big night:
- Encoding & processing — AWS Elemental, Harmonic, Ateme, ffmpeg-based pipelines; H.264 / H.265 / AV1.
- Packaging & protocols — HLS, MPEG-DASH, CMAF, Low-Latency HLS.
- Origin & storage — multi-region origins with an origin shield; S3 / Google Cloud Storage for the VOD and replay copies.
- Delivery — multi-CDN across Akamai, CloudFront, Fastly, Cloudflare, with real-time traffic steering.
- Security — Widevine / PlayReady / FairPlay DRM, signed URLs, geo-restriction, forensic watermarking.
- Observability — Grafana, Prometheus, Datadog, Conviva, Mux Data for QoS and per-CDN health.
Frequently Asked Questions
- Why can't a platform just send one video stream to every viewer?
- Because a single server has a finite uplink. One origin pushing 100 million copies of a 5 Mbps stream would need around 500 Tbps of egress — beyond any one machine or data centre. A CDN solves it by producing each segment once and letting thousands of edge servers cache and re-serve it to the viewers nearest them. The origin sees a trickle; the edge fleet handles the millions.
- What is adaptive bitrate streaming?
- The encoder produces the match at several quality levels at once — 4K, 1080p, 720p, 480p — each cut into 2–6 second segments. The player measures your real bandwidth and requests whichever quality fits, switching at segment boundaries. That is why a stream softens to 720p when your connection dips instead of freezing to buffer.
- How is live sports streaming kept low-latency?
- Standard HLS and DASH run 20–45 seconds behind because the player waits for whole segments. Low-Latency HLS and chunked CMAF break segments into partial chunks the player can fetch early, pulling latency to roughly 2–5 seconds while staying on the CDN. Sub-second use cases like betting use WebRTC, but only for the viewers who genuinely need it.
- Why isn't WebRTC used to stream to 100 million viewers?
- WebRTC is built for real-time, individually negotiated connections — great for a call or a small interactive room. But each viewer is a stateful connection that can't be cached like a static HTTP segment, so fanning it out to millions is expensive. The main broadcast goes over HLS/DASH on a CDN, which is cached once and served a million times.
- How do platforms stop piracy of live sports?
- Layers stack together: DRM (Widevine, PlayReady, FairPlay) encrypts the media so only licensed players decode it; short-lived signed URLs make copied links expire within minutes; geo-restriction enforces broadcast-rights boundaries; and forensic watermarking embeds a per-session ID so an illegal re-stream can be traced to the account it leaked from.
About the author — Kishan Vaghani
Kishan is the founder of ShareCode and writes about the engineering and infrastructure decisions behind real-time and large-scale systems. ShareCode itself leans on Firebase and CRDT-based sync rather than a CDN video pipeline, but the same questions — caching, fan-out, latency budgets, where to put the bottleneck — show up the moment any system has to reach a lot of people at once.
Final Thoughts
The next time you press play on a match and Ronaldo's free kick arrives on your screen a few seconds after he strikes it, it's worth remembering how far that frame has travelled. Stadium cameras, a production truck, an encoder building a ladder of qualities, a packager slicing it into thousands of little files, a shielded origin, a multi-CDN edge network fanning it out across continents, a player quietly choosing the best quality your connection can hold, and a stack of DRM and tokens guarding it the whole way.
What looks like a simple video player is one of the most sophisticated distributed systems people use casually every day. And the core ideas behind it aren't exotic — encode once, cache at the edge, adapt to the network, never let one component do something impossible. Those same principles scale down to far smaller systems too. If you're curious about the real-time side of the problem — the sub-second, peer-to-peer world rather than the cache-and-fan-out one — the WebRTC architecture guide and our writeup of how real-time sync works under the hood are the natural next reads.
References & Sources
The primary sources, specifications, and documentation behind this article.
- RFC 8216: HTTP Live Streaming (HLS). IETF, 2017. rfc-editor.org. The protocol behind most live video on the web: master playlists, media playlists, and the segment model that makes adaptive streaming work over plain HTTP.
- Enabling Low-Latency HLS. Apple. developer.apple.com. The partial-segment and blocking-playlist additions that pull HLS latency down from ~30 seconds to the 2–5 second range live sports needs.
- ISO/IEC 23009-1: Dynamic Adaptive Streaming over HTTP (MPEG-DASH). ISO/IEC, 2022. iso.org. The vendor-neutral counterpart to HLS: the same chunked, adaptive idea, expressed through an XML manifest (the MPD) instead of an .m3u8.
- ISO/IEC 23000-19: Common Media Application Format (CMAF). ISO/IEC, 2020. iso.org. The container that lets one set of segments serve both HLS and DASH, and the basis for chunked low-latency delivery.
- Media Source Extensions (MSE) and Encrypted Media Extensions (EME). W3C. w3.org. The browser APIs every web player is built on: MSE feeds segments into the <video> element, EME hands keys to the DRM module.
- Guidelines for Implementation (DASH-IF IOP). DASH Industry Forum. dashif.org. Practical interoperability guidance used across the streaming industry: bitrate ladders, low-latency profiles, multi-DRM packaging.
- Widevine DRM: Architecture Overview. Google. developers.google.com. How license requests, content keys, and security levels (L1/L2/L3) protect premium streams on the most common Android and Chrome path.
About the writers
- Kishan: Founder of ShareCode. Writes the engineering deep-dives on this site (WebRTC, Firebase Auth, real-time sync, and the production patterns behind the editor itself).
- Kajal: Developer educator at ShareCode. Writes the tutorial track: Python, JavaScript debugging, coding-interview prep, and the everyday code-quality habits that hold up in real codebases.
Try it now
Sketch a streaming pipeline in a shared code space
Diagramming a system like this lands faster when two people draw it together. Paste the HLS manifest from §4 into a ShareCode editor, share the link, and walk a teammate through the segment model live — the quickest way to actually internalise how adaptive streaming fits together.