CPaaS Had a Good Run. Now It's Showing Its Age.
If you've been in the communications industry for more than five years, you remember what it was like before CPaaS.
Building voice or SMS into a product meant negotiating directly with carriers, managing SIP infrastructure, dealing with telecom licensing, and spending months on plumbing before writing a line of actual product code. Twilio changed that. Pay per message, call an API, ship in days. For what it was solving, it was a genuinely good solution.
The CPaaS vendors built real businesses on the back of that value. Vonage, Sinch, Bandwidth, MessageBird: the model spread because the underlying problem was real and the solution worked. Transactional SMS, OTP verification, basic outbound voice. Clean, simple, priced by usage.
The issue isn't that CPaaS got worse. The issue is that the world moved and CPaaS didn't move with it. The infrastructure underneath most CPaaS products was designed for a specific kind of communication: transactional, asynchronous, channel-specific. What businesses are building today looks nothing like that, and forcing modern communication requirements through a 2013-era architecture creates the kind of technical debt that eventually demands a full rebuild.
That rebuild is happening. The category it's happening into is VPaaS.
Where CPaaS Actually Breaks
To understand why VPaaS exists, you need to understand the specific ways CPaaS falls short, not in theory, but in production.
The first place it breaks is media access. CPaaS gives you a channel. You can initiate a call, receive a call, send a message. What you cannot easily do is get direct, real-time access to the audio stream while the call is happening. For basic telephony, you never needed that. For AI voice, where an agent needs to process what the caller is saying and respond within milliseconds, not having that access is a fundamental architectural problem. You end up routing audio through external APIs, adding latency at every hop, and producing interactions that feel slow and mechanical no matter how good the underlying model is.
The second place it breaks is browser-native delivery. WebRTC lets audio and video travel directly in a browser without an app. CPaaS vendors have added WebRTC support, but there's a meaningful difference between WebRTC bolted onto a carrier-based stack and infrastructure built on WebRTC from the ground up. The difference shows up in latency, session stability, and call quality under load. Customers notice, maybe not consciously, but a slightly broken call experience erodes trust faster than almost any other product problem.
The third is composability. A telehealth product needs video, waiting rooms, provider routing, and session recording working together inside a single session. A contact center needs voice, real-time transcription, and CRM sync happening simultaneously. CPaaS gives you individual channels. Assembling those channels into a coherent product requires workarounds that compound over time.
And then there's pricing. Per-minute billing made sense when calls were occasional and transactional. When you're running thousands of AI voice interactions a day, carrier-style metering on every second of audio stops being a minor cost line and starts being a structural problem.
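To make the metering point concrete, here is a back-of-the-envelope sketch in Python. Every rate and volume in it is a hypothetical placeholder, not a quote from any vendor, and the burst factor is a crude assumption. It only illustrates the shape of the two models: metered cost grows with every minute of audio, while a concurrency-based plan grows with peak simultaneous sessions.

```python
import math

# All figures below are hypothetical placeholders for illustration only.
PER_MINUTE_RATE = 0.014       # assumed USD per call-minute under metered billing
PER_CONCURRENT_SLOT = 25.0    # assumed USD per concurrent connection per month


def metered_monthly_cost(calls_per_day: int, avg_minutes: float, days: int = 30) -> float:
    """Monthly cost when every minute of audio is billed."""
    return calls_per_day * avg_minutes * days * PER_MINUTE_RATE


def peak_concurrency(calls_per_day: int, avg_minutes: float, busy_hours: float = 10.0) -> int:
    """Crude estimate: spread call-minutes evenly over the busy hours, then double for bursts."""
    average_simultaneous = calls_per_day * avg_minutes / (busy_hours * 60)
    return math.ceil(average_simultaneous * 2)


def concurrent_monthly_cost(peak_concurrent_calls: int) -> float:
    """Monthly cost when billing follows peak simultaneous sessions instead of minutes."""
    return peak_concurrent_calls * PER_CONCURRENT_SLOT


if __name__ == "__main__":
    calls, minutes = 5000, 4.0
    peak = peak_concurrency(calls, minutes)
    print("metered:", metered_monthly_cost(calls, minutes))
    print("peak concurrent sessions:", peak)
    print("concurrency-based:", concurrent_monthly_cost(peak))
```

Which model comes out cheaper depends entirely on the real rates and the real traffic shape. The useful part is the structure: doubling call length doubles the metered bill, while the concurrency bill only moves if the peak moves.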
What VPaaS Actually Is
VPaaS stands for Video Platform as a Service. The name undersells it; this isn't just about adding video to your product.
VPaaS is infrastructure built natively on WebRTC and media servers rather than on carrier networks. Where CPaaS abstracts the carrier layer, VPaaS gives you control over the media itself: what happens to it, where it goes, what processes touch it, and at what latency.
A VPaaS platform handles multi-party audio and video sessions, recording, real-time transcription hooks, AI agent integration inside live sessions, screensharing, and browser-native delivery. It also exposes enough of the underlying media pipeline that developers can shape it around their specific product requirements rather than working within what the platform decides to expose.
LiveKit is currently the most capable open-source VPaaS infrastructure available. It's what a meaningful portion of the industry is building on. RTC LEAGUE runs managed LiveKit deployments for teams that need production-grade WebRTC infrastructure without the operational burden of running media servers in-house. The engineering team gets the platform capability; we handle server configuration, scaling, monitoring, and uptime.
The Three Shifts Making This Inevitable
The move from CPaaS to VPaaS isn't a marketing cycle. It's being driven by concrete changes in what communication products need to do.
AI moved inside the session. A few years ago, AI was adjacent to communication. A call would end, a webhook would fire, and some downstream process would handle the recording or transcript. The AI lived outside the conversation. Now businesses want AI operating inside the session while it's live: a voice agent that listens and responds, a transcription layer extracting structured data as the call happens, a tool that watches a conversation and surfaces relevant information mid-call. All of that requires direct media stream access from the first second of the session. CPaaS doesn't provide it cleanly. VPaaS does.
Browser-native became the baseline. Customers stopped accepting app installs as a precondition for communication. Click a link, join the session: that's the expectation now across telehealth, customer support, sales calls, and everything in between. WebRTC enables this, but production-grade WebRTC with consistent quality under load requires infrastructure built around it from day one. CPaaS vendors adding WebRTC on top of existing carrier infrastructure is not the same thing.
Products need communication to be composable. The products being built now don't just need a voice channel or a video channel. They need multiple communication capabilities working together inside a coherent product experience. That requires primitives you can compose (media stream access, routing logic, recording hooks, AI integration points), not just channels you can access through an API.
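As a rough illustration of what composable primitives could look like in code, the sketch below models a live session that fans each audio frame out to whatever processors are attached. The names (`Session`, `attach`, `push_frame`) are hypothetical and do not correspond to LiveKit's or any vendor's API; the recorder and transcriber are stand-ins.

```python
from typing import Callable, Dict, List

Frame = bytes  # stand-in for a chunk of raw audio
Processor = Callable[[Frame], None]


class Session:
    """Hypothetical live session that exposes its media stream to attached processors."""

    def __init__(self) -> None:
        self._processors: List[Processor] = []

    def attach(self, processor: Processor) -> None:
        """Register a processor; it will see every frame pushed after this point."""
        self._processors.append(processor)

    def push_frame(self, frame: Frame) -> None:
        # Each attached processor receives the frame while the session is live,
        # rather than after the call has ended.
        for processor in self._processors:
            processor(frame)


recorded: List[Frame] = []
stats: Dict[str, int] = {"bytes_seen": 0}


def recorder(frame: Frame) -> None:
    """Recording hook: keep every frame."""
    recorded.append(frame)


def transcriber(frame: Frame) -> None:
    """Placeholder: a real hook would stream the frame to a speech-to-text engine."""
    stats["bytes_seen"] += len(frame)


session = Session()
session.attach(recorder)
session.attach(transcriber)
session.push_frame(b"\x00\x01")
session.push_frame(b"\x02\x03\x04")
```

The design point is that recording, transcription, and an AI agent are all just consumers of the same in-session stream, so adding a capability means attaching another processor rather than stitching together a separate channel.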
The Latency Problem That Kills AI Voice Products
There's one metric that determines whether an AI voice product actually works in production: round-trip latency.
For human-to-human calls, people adjust naturally. A 200ms delay on an international call is tolerable; everyone has experienced it. For AI voice, the tolerance is much tighter. When a customer asks a question and there's 500ms of silence before the AI starts responding, the interaction feels broken. The customer starts to distrust the system. Re-engagement from that point is hard.
Getting below 200ms round-trip in an AI voice session isn't a stretch goal; it's the line between a product that works and one that doesn't. And hitting it requires keeping the media path short: audio from the browser to the media server, AI processing running close to the media, and the response delivered without routing through unnecessary carrier infrastructure.
The moment you route audio through a PSTN carrier, transcribe it via an external API, send it to a remote model, convert the response through TTS, and terminate it back through carrier infrastructure, you've added latency at five separate points. No LLM improvement recovers those milliseconds.
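The arithmetic behind that claim can be sketched as a simple latency budget. The per-hop numbers below are made up for illustration and are not measurements of any provider; only the 200ms target and the five-hop path come from the text above.

```python
from dataclasses import dataclass
from typing import List

BUDGET_MS = 200  # round-trip target named in the text


@dataclass(frozen=True)
class Hop:
    name: str
    latency_ms: float  # hypothetical figure, not a measurement


def total_latency(path: List[Hop]) -> float:
    """Sum of per-hop latencies along the media path."""
    return sum(hop.latency_ms for hop in path)


def over_budget(path: List[Hop], budget_ms: float = BUDGET_MS) -> float:
    """Milliseconds by which the path exceeds the budget (0 if within it)."""
    return max(0.0, total_latency(path) - budget_ms)


# The five-hop path described above, with invented per-hop numbers.
carrier_path = [
    Hop("PSTN carrier ingress", 60),
    Hop("external transcription API", 120),
    Hop("remote model", 150),
    Hop("TTS conversion", 80),
    Hop("carrier egress", 60),
]

# A shorter path: browser to media server, with processing co-located with the media.
short_path = [
    Hop("browser to media server", 30),
    Hop("co-located STT + model + TTS", 120),
    Hop("media server to browser", 30),
]

# Even an instantaneous model leaves the carrier path over budget.
carrier_path_without_model = [hop for hop in carrier_path if hop.name != "remote model"]

if __name__ == "__main__":
    for label, path in [("carrier", carrier_path), ("short", short_path),
                        ("carrier, zero-latency model", carrier_path_without_model)]:
        print(label, total_latency(path), "ms, over budget by", over_budget(path), "ms")
```

With these placeholder values, removing the model hop entirely still leaves the carrier path above the target, which is the sense in which a faster model cannot recover latency added by the transport.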
This is the specific constraint RTC LEAGUE's infrastructure is built around. The WebRTC stack, the media server configuration, the SIP trunking where legacy telephony needs to bridge in: all of it is engineered with latency as a hard requirement. TelEcho, RTC LEAGUE's AI voice platform, runs on this stack. The latency it operates at is a function of the infrastructure decisions made underneath it, not a feature that was added on top.
CPaaS vs. VPaaS Side by Side
| | CPaaS | VPaaS |
| Infrastructure base | Carrier networks | WebRTC + media servers |
| Typical latency | 150–400ms | Sub-100ms achievable |
| Media stream access | Limited | Full, real-time |
| AI in-session | Difficult, external | Native |
| Browser-native | Partial | Full |
| Pricing | Per minute / message | Concurrent connections |
| Built for | SMS, OTP, basic PSTN | AI voice, video, real-time products |
What to Do If You're Making This Decision Now
CPaaS is still the right choice for specific, narrow use cases. SMS notifications, OTP flows, occasional outbound calls: the major providers do this well and price it fairly. If that's all you need, don't overcomplicate it.
But if your product involves AI agents operating inside live voice sessions, browser-native video, real-time media processing, or multi-party audio, and you're building that on CPaaS, you are accumulating architectural debt that will eventually force a rebuild. The only question is whether it happens before or after you've already scaled something that can't support what comes next.
VPaaS infrastructure, and specifically managed LiveKit, gives you the media layer those use cases need. RTC LEAGUE works with teams at this decision point, before the architecture gets locked in, and runs the infrastructure once it's deployed. If this is the conversation your team is having, talk to the RTC LEAGUE team before the decisions get made.
Where This Goes
CPaaS solved the right problem for its time: how do you add communication to software without becoming a telecom company? That problem is solved.
The problem businesses are solving now is different: how do you build software where communication is real-time, AI-native, and programmable from the ground up? CPaaS wasn't built for that question. VPaaS was.
The large CPaaS vendors are not ignoring this. Twilio, Vonage, and others are moving toward more programmable infrastructure and expanding their WebRTC capabilities. But they're doing it by extending platforms built on a different foundation, and the architectural constraints of that foundation don't disappear because a new feature layer was added on top.
Teams starting on VPaaS-native infrastructure today don't carry that constraint. RTC LEAGUE's stack (managed WebRTC, enterprise SIP trunking, AI voice through TelEcho) is built for where communication is going. If that's the infrastructure conversation you're having right now, let's talk.






