Most WebRTC deployments start the same way. You spin up a Selective Forwarding Unit, put it behind a load balancer, deploy a few regional instances, and call it distributed. It works fine until it does not. Then it fails in three specific ways that a bigger server will never fix.

This post is the engineering breakdown of how we moved from that architecture to a globally distributed mesh: sessions span multiple SFU nodes across regions, state synchronizes through a pub-sub layer instead of a database, media relays over a custom protocol instead of WebRTC, and node failures resolve through live ICE restarts in under one second rather than full reconnections that take up to five seconds.

Why Single-Home SFU Architecture Has a Hard Ceiling

The standard deployment pattern for WebRTC infrastructure is what the industry calls single-home. Multiple SFU instances run in data centers around the world. Connections load-balance across them. But each session is pinned to exactly one server. Every participant in that session must connect to that specific instance regardless of where they are in the world.

Three problems emerge from this constraint, and they get worse as session scale increases.

The bandwidth ceiling is a hard physics problem. An SFU in single-home mode must forward every track to every subscriber from a single network interface. Model a 50-participant session where each participant receives video from all 49 others at 360p and 500 kbps:

50 participants × (500 kbps × 49 streams) = 1,225,000 kbps ≈ 1.23 Gbps outbound

That is before CPU overhead for SRTP encryption on every outbound packet. A typical 10Gbps NIC sustains roughly 8Gbps of usable throughput under real conditions. A session with 100 participants at the same quality settings exceeds 4.9 Gbps outbound from a single node. You cannot network-card your way out of this.
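
As a sanity check, here is the same arithmetic in a few lines of Go; the 500 kbps per stream and the everyone-subscribes-to-everyone model are the assumptions stated above:

package main

import "fmt"

// outboundKbps models a single-home SFU where every participant subscribes
// to every other participant's stream at the same bitrate.
func outboundKbps(participants int, kbpsPerStream float64) float64 {
    return float64(participants) * float64(participants-1) * kbpsPerStream
}

func main() {
    for _, n := range []int{50, 100} {
        gbps := outboundKbps(n, 500) / 1e6 // 1 Gbps = 1,000,000 kbps
        fmt.Printf("%d participants: %.2f Gbps outbound from one node\n", n, gbps)
    }
}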

NACK round-trip time degrades quality predictably with geographic distance. WebRTC uses UDP rather than TCP specifically to avoid head-of-line blocking. The tradeoff is that lost packets require explicit repair through the NACK mechanism: the receiver sends a Negative Acknowledgement to the sender, the sender retransmits from its buffer, and playback resumes. The entire sequence must complete within the receiver's playout buffer window, typically 100-150ms.

When a participant in Karachi is connected to an SFU in Virginia, the NACK round-trip alone is 250-300ms. The stream degrades before the repair completes. The user sees freezes, artifacts, and audio gaps that no amount of application-layer buffering fully compensates for. Geographic distance is not a configuration problem. It is a routing problem that requires topology changes to fix.
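
A rough model of that budget makes the failure obvious. The 20 ms loss-detection figure below is an illustrative assumption; the RTT and playout buffer numbers are the ones from the text:

// repairCompletesInTime models the NACK budget: the receiver needs time to
// detect the gap, then one full round trip — half for the NACK to reach the
// sender, half for the retransmission to come back — all of it inside the
// playout buffer window.
func repairCompletesInTime(detectionMs, rttMs, playoutBufferMs float64) bool {
    return detectionMs+rttMs <= playoutBufferMs
}

// repairCompletesInTime(20, 30, 150)  -> true:  regional node, the repair is invisible
// repairCompletesInTime(20, 280, 150) -> false: Karachi to Virginia, the frame is due
// before the retransmission arrives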

A single node failure is a total session failure. SFU sessions are stateful. Participant lists, track subscriptions, and media forwarding state all live in process memory. When the process dies, the state is gone. Every participant currently connected to that node loses their session entirely and must reconnect from scratch, a process that involves a full ICE negotiation and SDP exchange and realistically takes 3-5 seconds per client.

Redesigning the Session as a Logical Construct

The architectural shift that makes distributed WebRTC work is recognizing what a session is. In single-home deployments, a session is a physical construct: a Go struct, a heap allocation, a data structure that lives on one machine. In a multi-home architecture, a session becomes a logical construct that multiple nodes co-host simultaneously.

This requires decoupling two things that single-home architectures treat as inseparable: participant metadata and media transport.

Participant metadata covers identity, connection state, and the track manifest (what tracks a participant has published). This data is small, roughly 100 bytes per participant, and must be visible to every node co-hosting the session so any node can correctly forward presence updates to its local participants.
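
The exact fields are an implementation detail, but a sketch of that record looks roughly like this (the ParticipantInfo name is reused by the session code below; the fields shown here are illustrative):

// ParticipantInfo is the small, globally visible record for one participant:
// identity, connection state, and the track manifest. The field set here is
// illustrative; the real struct stays on the order of ~100 bytes.
type ParticipantInfo struct {
    ID     string      // stable participant identity
    State  string      // e.g. "joining", "active", "disconnected"
    Tracks []TrackInfo // track manifest: what this participant publishes
}

type TrackInfo struct {
    TrackID string
    Kind    string // "audio" or "video"
    Muted   bool
}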

Media transport is local to the node where the participant is actually connected. Each node handles SRTP encryption, RTP forwarding, and WebRTC PeerConnection management only for the participants physically connected to it.

The session structure on each node maintains two participant lists:

type Session struct {
    localParticipants  []LocalParticipant  // connected to this node
    remoteParticipants []Participant        // connected to other nodes
}

LocalParticipant embeds the Participant interface and adds transport-specific methods:

type Participant interface {
    Metadata() *ParticipantInfo
}

type LocalParticipant interface {
    Participant
    GetDownTrack(trackID string) DownTrack
    SendUpdate(info *ParticipantInfo)
}

Each node forwards media and metadata updates only for its local participants. Remote participant state arrives via the state synchronization layer. A node has read visibility into the full session but write authority only over its own participants.

State Synchronization: Why We Chose a Message Bus Over a Database

The natural engineering instinct for synchronizing shared state across distributed nodes is a database. We evaluated this path thoroughly before rejecting it.

Postgres and traditional RDBMS fail the availability requirement. Even with read replicas and standby instances, a primary failure blocks writes. Cross-datacenter synchronization of traditional databases introduces replication lag that exceeds what real-time presence updates can tolerate.

Distributed databases like Google Spanner and CockroachDB solve the cross-region problem but make the wrong consistency tradeoff. Both are optimized for CP (consistency and partition tolerance) under the CAP theorem. During a network partition between regions, Spanner blocks writes rather than serving potentially stale reads. In a real-time communication system where availability is non-negotiable, this behavior is unacceptable. A participant cannot wait for a distributed transaction to commit before their join event propagates to other nodes.

What we actually need is AP with eventual consistency. The participant state in a real-time system is transient: it only exists and matters while the participant is connected. We do not need durability guarantees, ACID semantics, or write conflict resolution. We need fast, reliable pub-sub distribution of state changes across all co-hosting nodes.

Each node acts as the authoritative writer for its own local participants. When participant state changes (join, leave, publish track, mute), the owning node publishes to a session-scoped topic on the message bus. All co-hosting nodes subscribe to that topic and receive read-only updates.

For a join event, the flow looks like this:
Node A (Singapore) publishes: participant_joined{id: "user_123", tracks: [...]}
Node B (Frankfurt) receives update, appends to remoteParticipants
Node B's local participants receive metadata update for user_123

There are no write conflicts because no two nodes ever write the same participant's state. There is no distributed locking because only one node ever holds write authority for a given participant. State re-sync on a node restart is handled by the bus itself: the new node publishes a resync request, and all other nodes republish their local participant states in response.
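
A minimal sketch of that ownership rule, assuming a generic pub-sub client, a session ID on the Session struct, and encoding/json for the payload (the topic naming and helper functions here are illustrative, not the production API):

// Bus is a stand-in for whatever pub-sub transport carries session state;
// only Publish/Subscribe semantics matter here.
type Bus interface {
    Publish(topic string, msg []byte) error
    Subscribe(topic string, handler func(msg []byte)) error
}

// publishLocalChange is called only by the node that owns the participant,
// so no two nodes ever write the same participant's state.
func (s *Session) publishLocalChange(bus Bus, info *ParticipantInfo) error {
    payload, err := json.Marshal(info)
    if err != nil {
        return err
    }
    return bus.Publish("session."+s.ID+".participants", payload)
}

// subscribeRemoteChanges gives this node read-only visibility into
// participants owned by other nodes.
func (s *Session) subscribeRemoteChanges(bus Bus) error {
    return bus.Subscribe("session."+s.ID+".participants", func(msg []byte) {
        var info ParticipantInfo
        if err := json.Unmarshal(msg, &info); err != nil {
            return
        }
        s.upsertRemoteParticipant(&info) // append to or update remoteParticipants
    })
}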

Media Relay: Why We Did Not Use WebRTC Between Nodes

Once state synchronization was solved, we needed a protocol for relaying media tracks between nodes. The obvious choice was WebRTC itself since our SFUs already use it for client-facing transport. We evaluated it and rejected it for three specific reasons:

ICE adds unnecessary overhead for server-to-server paths. ICE was designed to traverse NAT and firewalls in unpredictable consumer network topologies. Server-to-server communication runs on known infrastructure with controlled network paths. Running ICE negotiation between SFU nodes adds 200-500ms of connection establishment time with no benefit.

SDP is operationally painful for programmatic control. Session Description Protocol works for human-negotiated peer connections but is unnecessarily complex for automated server-to-server track relay where the parameters are fully known in advance.

Simulcast forwarding requires separate coordination logic. WebRTC simulcast between client and server involves RID-based track identification and subscriber-driven layer selection. Extending this to relay forwarding between servers requires additional signaling that does not map cleanly to the existing WebRTC negotiation model.

Instead, we built a custom FlatBuffers-based protocol that carries RTP media packets between nodes. FlatBuffers provides zero-copy deserialization, which matters at the packet processing rates of a production SFU. The custom protocol adds metadata fields to RTP packets including track identifiers and source routing information. It also provides uniform packet loss handling across video, audio, and data channels, which WebRTC does not.
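
The wire format itself is FlatBuffers, but a Go view of the envelope gives the idea; the field names below are assumptions for illustration, not the actual schema:

// RelayPacket is an illustrative view of the inter-node envelope. The real
// wire format is a FlatBuffers table, so the receive path can read these
// fields without a deserialization copy.
type RelayPacket struct {
    SessionID  string // session-scoped routing key
    TrackID    string // which published track this payload belongs to
    OriginNode string // node holding the publisher's client connection
    Sequence   uint16 // relay-level sequence number for uniform loss handling
    RTPPayload []byte // the original RTP packet, forwarded unmodified
}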

The receive-side logic on each node determines which tracks it needs based on its local participants' subscription states, then performs service discovery to identify which nodes hold those tracks. Track origin nodes respond with capacity and routing information. Each receiving node makes independent routing decisions without coordinating through a central scheduler, which is the property that makes the system fault-tolerant rather than fragile.
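
Sketched in Go, with the discovery interface and helper names assumed for illustration:

// Discovery is a stand-in for the service-discovery layer that answers
// "which node currently holds this track?"
type Discovery interface {
    FindTrackOrigin(sessionID, trackID string) (nodeID string, err error)
}

// pullRemoteTracks walks the tracks that local participants subscribe to,
// asks discovery which node each one originates from, and opens a relay
// subscription straight to that node. No central scheduler is involved:
// every node computes its own relay routes.
// SubscribedTrackIDs and openRelaySubscription are illustrative helpers.
func (s *Session) pullRemoteTracks(discovery Discovery) {
    needed := map[string]bool{}
    for _, p := range s.localParticipants {
        for _, trackID := range p.SubscribedTrackIDs() {
            needed[trackID] = true
        }
    }
    for trackID := range needed {
        origin, err := discovery.FindTrackOrigin(s.ID, trackID)
        if err != nil {
            continue // not announced yet; retried on the next state update
        }
        s.openRelaySubscription(origin, trackID)
    }
}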

Geographic Routing and Points of Presence

Reaching the target of sub-100ms network latency for every participant in the world requires a globally distributed PoP topology. Each PoP is self-contained: it does not depend on shared infrastructure outside its own datacenter for normal session operation. The message bus and media relay state for a PoP are isolated within it. The only cross-PoP communication is media relay for sessions that span regions.

For participant-to-node routing, we use latency-aware ICE candidate selection. During the ICE negotiation phase, clients measure round-trip time to multiple candidate nodes and the signaling layer selects the lowest-latency option rather than the geographically nearest one. Geographic proximity correlates with low latency in most cases but not all. BGP routing anomalies can make a physically nearby node slower than a more distant one. Measuring actual RTT during ICE produces consistently better results than IP geolocation.
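
A simplified version of that selection step, assuming the client has already probed each candidate node and reported the measured RTTs (the types and function here are illustrative):

// NodeCandidate is one SFU node offered during ICE setup, annotated with the
// round-trip time the client measured against it.
type NodeCandidate struct {
    NodeID    string
    Region    string
    RTTMillis int
}

// pickNode returns the lowest-RTT candidate rather than the geographically
// nearest one: a nearby node behind a bad BGP path loses to a farther node
// with a clean route. The bool is false when no candidate was reachable.
func pickNode(candidates []NodeCandidate) (NodeCandidate, bool) {
    if len(candidates) == 0 {
        return NodeCandidate{}, false
    }
    best := candidates[0]
    for _, c := range candidates[1:] {
        if c.RTTMillis < best.RTTMillis {
            best = c
        }
    }
    return best, true
}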

For cross-region mesh links, the overlay network encrypts and routes all traffic regardless of the underlying cloud provider's physical topology. The system runs across multiple cloud providers simultaneously, and the overlay layer makes every node addressable from every other node without cloud-specific peering configurations.

Live Migration: Reducing Node Failure Impact from 5 Seconds to Under 1 Second

Standard WebRTC reconnection after a node failure requires the client to tear down its PeerConnection, re-establish the signaling connection, perform a full ICE negotiation with the new node, and rebuild all track subscriptions from scratch. On mobile connections, this process takes 3-5 seconds, and the disruption is fully visible in the application UI for that entire window.

RTC LEAGUE eliminated this through live migration using WebRTC's ICE restart mechanism. An ICE restart allows an existing PeerConnection to migrate to a new network path without teardown. The client keeps its PeerConnection object, its track subscription state, and its application-level UI state intact.

The migration process works in three steps: the new target node receives the migrating participant's expected state from the session metadata layer, reconstructs its local session state to exactly match what the client believes it is connected to (including all active track subscriptions), and then completes an ICE restart that points the client at the new node.
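
A sketch of the target node's side of that process; every helper named here (metadata store, adoptParticipant, signaling client) is illustrative rather than the actual API:

// migrateParticipant rebuilds a participant's state on this node and then
// redirects their existing PeerConnection here via an ICE restart. The
// client's PeerConnection, subscriptions, and UI state are never torn down.
func (s *Session) migrateParticipant(id string) error {
    // 1. Pull the participant's expected state (identity, track manifest,
    //    active subscriptions) from the session metadata layer.
    state, err := s.metadataStore.ExpectedState(s.ID, id)
    if err != nil {
        return err
    }

    // 2. Rebuild local forwarding state to match what the client believes it
    //    is connected to: one DownTrack per active subscription.
    local := s.adoptParticipant(state.Info)
    for _, trackID := range state.SubscribedTrackIDs {
        if err := local.AddDownTrack(trackID); err != nil {
            return err
        }
    }

    // 3. Complete the ICE restart so the existing PeerConnection re-routes
    //    its media path to this node without teardown.
    return s.signaling.RequestICERestart(id, s.localICECredentials())
}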

From the client's perspective, the PeerConnection went through a brief ICE restart, which is indistinguishable from a network path change like switching from WiFi to cellular. The disruption window in production is under one second.

The same mechanism enables proactive migration. When monitoring detects a node approaching capacity or showing early hardware degradation signals, participants migrate to healthy nodes before any failure occurs, resulting in zero user-visible disruption.

What the Full Architecture Looks Like

Layer                  | Mechanism                    | Key Property
---------------------- | ---------------------------- | ------------------------------------------------------
Session state sync     | Pub-sub message bus          | AP consistency, no cross-DC dependency
Media relay            | Custom FlatBuffers + RTP     | Zero-copy, uniform loss handling
Participant routing    | Latency-aware ICE selection  | RTT-optimized, not geo-naive
Node failure recovery  | ICE restart live migration   | < 1s disruption, no client teardown
Cross-cloud routing    | Overlay network              | Cloud-agnostic addressability
Simulcast optimization | Per-node layer selection     | Inter-node BW proportional to pairs, not participants

Each datacenter operates independently. Sessions span datacenters through point-to-point mesh links. Any single node failure, datacenter failure, or cloud provider failure leaves the session alive on remaining nodes. Participants on failed nodes reconnect through live migration rather than full session re-establishment.

This is the architecture behind managed WebRTC infrastructure that targets 99.99% availability at global scale.
