On March 16, 2026, Jensen Huang walked onto the stage at SAP Center in San Jose in front of 30,000 people and said something that landed less like an announcement and more like a mandate.
"Every company in the world today needs to have an OpenClaw strategy, an agentic system strategy. This is the new computer."
He did not hedge. He did not say "eventually" or "companies should consider." He called it as significant as HTML and Linux, and he built an entire section of NVIDIA's flagship keynote around it.
The audience absorbed the headline. What most coverage missed is the follow-on question that actually matters for anyone building real operations: if AI agents are going to handle business workflows at scale, what infrastructure do they actually run on?
That is not a philosophical question. It is an engineering question. And the answer determines whether your agentic strategy is something that works in production or something that works in a boardroom presentation.
At RTC League, we build and operate the real-time communication infrastructure that AI voice agents run on. Our stack powers TelEcho, our AI voice agent platform, and serves clients across Pakistan, the UAE, and South and Southeast Asia. This piece reflects what we have seen in production deployments, not in demos.
What Jensen Actually Said at GTC 2026 and Why the Infrastructure Question Follows
NVIDIA held its flagship GTC Conference on March 16, 2026, in San Jose, California. The centerpiece software announcement was NemoClaw, NVIDIA's enterprise-grade agentic AI platform built on OpenClaw, the open-source AI agent framework that became the fastest-growing open source project in the history of computing within weeks of its launch.
Jensen Huang suggested this shift will transform software itself, predicting a move from SaaS to what he called "Agentic AI as a Service." For companies with voice-based customer operations, this shift is not abstract. A persistent AI agent connects to your tools, data, and communication channels, then acts on your behalf. It has I/O. It has memory. It has scheduling. It has tool access.
The distinction between a chatbot and an actual agent matters enormously for infrastructure decisions. A chatbot generates a response. An agent receives a goal, executes a sequence of actions, and closes the loop without a human in the middle, whether that loop involves scheduling an appointment, resolving a billing dispute, or routing a patient inquiry.
For any agent that interacts through voice, the infrastructure question is immediate. Every word a user speaks has to travel, get processed, trigger a response, and return as audio, all within a timeframe that makes the conversation feel natural. That requires a communication stack built specifically for this purpose.
The Market Context: Why This Is Happening Now, Not Later
The numbers behind the agentic shift are not projections. They are the current reality.
| Metric | Stat | Source |
| --- | --- | --- |
| Global Voice AI Agents market value (2026) | $2.4B, growing to $47.5B by 2034 | Market.us |
| CAGR of Voice AI Agents market | 34.8% | Market.us |
| Businesses planning AI voice for customer service by 2026 | 80% | Nextiva |
| Production voice agent deployments growth (YoY) | 340% | AI Voice Research |
| Fortune 500 companies running production voice AI | 67% | AI Voice Research |
| Gartner forecast for contact center labor cost savings from conversational AI in 2026 | $80 billion | Gartner |
| Cost per AI voice agent call vs. human agent call | $0.40 vs. $7-12 | Teneo.ai |
| 3-year ROI for companies using voice AI | 331% to 391% | Forrester/PolyAI |
| Agentic AI adoption in enterprises already piloting or scaling | 62% | AssemblyAI 2026 |
Despite 79% of organisations reporting some level of AI agent adoption, 50% of agentic AI projects remain stuck in pilot stages. The gap between pilot and production is almost always an infrastructure problem, not a model problem.
Demand for inference infrastructure is expected to exceed $1 trillion by 2027, driven by the need to serve millions of users simultaneously. The compute layer is expanding. The question is whether the communication layer underneath AI agents can keep up.
WebRTC in the Age of AI Agents: Why Latency Is the Product
WebRTC is the open standard for real-time audio and video communication in browsers and applications. It has been in production since 2011 and underpins Google Meet, Zoom's browser client, telehealth platforms, and browser-based enterprise calling tools globally.
The global WebRTC market was valued at approximately $4.23 billion in 2022 and is projected to expand at a compound annual growth rate of around 30% through 2030.
For most of that history, WebRTC was a human communication protocol. Two endpoints, a person at each one. The shift happening now is that one of those endpoints is an AI agent. This changes the latency requirements completely.
The Latency Reality of Production AI Voice
Research shows human conversation operates on a 200-300ms response window, hardwired across all cultures. Exceeding this threshold triggers neurological stress responses that break conversational flow.
Users never complain about "latency." They report agents that "feel slow," "keep getting interrupted," or "don't understand when I'm done talking."
Here is what the production latency pipeline actually looks like for a cascaded AI voice agent:
| Pipeline Stage | Target Latency | Notes |
| --- | --- | --- |
| Audio transmission to media server | 50-100ms | Depends on geographic proximity |
| Speech-to-Text (STT) transcription | 100-200ms | Best-in-class: NVIDIA Parakeet at 72ms TTFT |
| LLM inference (first token) | 200-400ms | Varies by model size and load |
| Text-to-Speech synthesis | 75-200ms | ElevenLabs Turbo: 138ms TTFB |
| Audio return to caller | 50-100ms | Depends on routing hops |
| Total end-to-end target | Under 800ms | Sub-500ms for premium deployments |
The ideal turn-taking delay is about 200ms according to human conversational benchmarks. Infrastructure co-locating GPUs and telephony networks in global Points of Presence reduces round-trip time between speech and inference to sub-200ms, delivering faster responses and more natural conversations.
Component latencies are cumulative and sequential: if STT takes 200ms, LLM inference takes 400ms, and TTS takes 200ms, the pipeline alone is already 800ms. Network overhead, queuing delays, and turn detection can add another 200-400ms on top.
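The arithmetic above can be sketched as a simple budget check. The stage figures below are midpoints taken from the table; the overhead value and the check itself are illustrative assumptions, not measurements from any particular deployment:

```python
# Hypothetical latency-budget check for a cascaded voice pipeline.
# Stage names and figures mirror the pipeline table; values are midpoints.

PIPELINE_MS = {
    "audio_to_media_server": 75,   # 50-100ms range
    "stt_transcription": 150,      # 100-200ms range
    "llm_first_token": 300,        # 200-400ms range
    "tts_synthesis": 140,          # 75-200ms range
    "audio_return": 75,            # 50-100ms range
}

OVERHEAD_MS = 200  # queuing, jitter buffers, turn detection (200-400ms range)
TARGET_MS = 800    # end-to-end target from the table

def total_latency(stages: dict, overhead: int) -> int:
    """Stages are sequential, so the budget is a plain sum."""
    return sum(stages.values()) + overhead

total = total_latency(PIPELINE_MS, OVERHEAD_MS)
print(f"end-to-end: {total}ms (target {TARGET_MS}ms)")
print("within budget" if total <= TARGET_MS else "over budget: shave stages or co-locate")
```

Note that even mid-range stage values plus modest overhead land above the 800ms target, which is exactly why the co-location argument in the next paragraph matters.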
This is why infrastructure architecture decisions matter at the foundation level. Every additional hop between the media server and the AI processing layer compounds. At RTC League, the WebRTC stack we operate runs AI processing co-located with the media server rather than routing audio through external API endpoints. That architectural decision is what separates TelEcho's response latency from deployments built on general-purpose cloud infrastructure.
LiveKit is essentially the operating system of agentic computers in the communications context. It is the leading open-source WebRTC media server for AI agent workloads, providing session management, real-time audio processing hooks, and AI SDK integration that production deployments require. RTC League operates managed LiveKit infrastructure for organisations that need production-grade AI voice capability without the operational overhead of running and scaling media servers in-house.
SIP Trunking: Where Most Agentic Voice Deployments Actually Break
Most businesses do not route customer calls through browser-based WebRTC sessions. Customers call a phone number. That call travels over the PSTN via SIP. For an AI voice agent to answer calls on standard phone numbers, the infrastructure needs a SIP trunk that bridges traditional telephony to the WebRTC and AI processing stack cleanly.
This is one of the most common points of failure in AI voice deployments, and it receives the least attention in most architecture discussions.
| SIP-to-WebRTC Failure Point | What Happens | Production Impact |
| --- | --- | --- |
| Codec mismatch at transcoding | G.711/G.729 negotiation failure | Audio artefacts, partial dropout |
| External SIP provider + separate WebRTC stack | Additional network crossing between providers | 80-150ms added latency per crossing |
| Jitter accumulation at the bridge | Buffer mismanagement between PSTN and WebRTC | Choppy audio, misfires in Voice Activity Detection |
| Unoptimised SIP trunk configuration | Default settings not tuned for AI workloads | Elevated latency on every call |
Enterprise SIP trunking for AI agent deployments needs to handle codec negotiation cleanly, maintain low jitter at the transcoding layer, and pass audio to the WebRTC media infrastructure without unnecessary routing hops.
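To make the codec-mismatch failure mode concrete: WebRTC endpoints are required to support Opus and G.711 (PCMU/PCMA), so a PSTN leg that shares G.711 can pass audio through, while a leg offering only G.729 forces a transcoding hop. The sketch below is a toy negotiation check under those assumptions, not real SDP handling:

```python
# Minimal sketch of codec planning at a SIP-to-WebRTC bridge.
# Codec lists are illustrative; real offers come from SDP negotiation.

WEBRTC_AUDIO_CODECS = {"opus", "pcmu", "pcma"}  # mandatory-to-implement in WebRTC

def bridge_plan(pstn_offer: list[str]) -> str:
    """Pick a passthrough codec if one is shared; otherwise flag transcoding."""
    shared = [c for c in pstn_offer if c.lower() in WEBRTC_AUDIO_CODECS]
    if shared:
        return f"passthrough via {shared[0]}"  # no transcoding hop needed
    return "transcode required"                # adds latency and jitter risk

print(bridge_plan(["pcmu", "g729"]))  # G.711 mu-law is shared -> passthrough
print(bridge_plan(["g729"]))          # G.729 only -> transcode
```

The point of the sketch: every call that falls into the `transcode required` branch pays the latency and jitter penalties described in the table, which is why trunk configuration deserves tuning before go-live.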
RTC League provides enterprise SIP trunking integrated directly with the managed LiveKit infrastructure we operate. The SIP trunk and the media server are part of the same stack, not separate vendor relationships with a network crossing between them. For businesses running legacy PBX systems, this is also the migration path that allows AI agents to be layered into existing telephony infrastructure without requiring a full system replacement.
HIPAA-Compliant AI Voice Agents: What Healthcare Organisations Need to Get Right Before Go-Live
Healthcare is one of the highest-ROI verticals for AI voice agents. The volume of routine administrative interactions is enormous, and a large proportion is predictable enough for an agent to handle: appointment scheduling, prescription refill routing, insurance verification, post-discharge follow-up, lab result notifications.
The compliance constraint is HIPAA. Here is the current state of the risk environment:
| Healthcare AI + Data Breach Stat | Figure | Source |
| --- | --- | --- |
| Average healthcare data breach cost (2025) | $7.42 million | IBM / HIPAA Journal |
| U.S.-specific average healthcare breach cost (2025) | $10.22 million | IBM |
| Healthcare breaches involving a business associate or vendor (doubled in one year) | 30% of all incidents | Verizon 2025 DBIR |
| PHI records stolen from third-party vendors vs. directly from hospitals | Over 80% | HIPAA Journal |
| Healthcare organisations hit by a cyberattack in the past 12 months | 93% | Ponemon 2025 |
| Shadow AI adding to breach costs | $670,000 average increase | IBM 2025 |
| HIPAA penalty range per violation per year | $100 to $50,000 per category | HHS |
| OCR breach notification deadline | 60 days from discovery | HIPAA |
The most important trend in healthcare breaches is not the total number. It is where breaches originate. Breaches involving a business associate or vendor doubled in one year, from 15% to 30% of all incidents. Over 80% of stolen PHI records were stolen from third-party vendors and software services, not directly from hospitals.
This has a direct implication for AI voice infrastructure. Every component in the stack that touches patient audio is a potential third-party breach point.
Here is what HIPAA compliance actually requires for an AI voice agent deployment, translated to infrastructure decisions:
End-to-end encryption throughout the audio pipeline. WebRTC handles transport-layer encryption via DTLS and SRTP natively. The gap is the processing layer. If a media server decrypts audio, ships it to an external AI API, and returns synthesised speech unencrypted or through an uncovered vendor, there is a PHI exposure gap regardless of what happens at the transport layer.
BAAs with every vendor in the pipeline. Every third-party API integrated into a healthcare AI application, including LLM evaluation providers, speech services, and infrastructure operators, must have a signed Business Associate Agreement. AWS, Azure, and Google Cloud offer HIPAA BAAs. General-purpose consumer AI APIs typically do not. Never send PHI to an API without a signed BAA.
Access controls and audit logging. New 2025 HIPAA Security Rule updates require mandatory encryption for all ePHI in storage and transit, continuous monitoring through automated systems for real-time risk assessments and audit logs, and increased penalties adjusted for inflation exceeding $100,000 per violation annually. 67% of healthcare organisations admit they are not ready for these stricter standards.
Breach notification architecture in place before go-live. HIPAA requires notification of affected individuals and HHS without unreasonable delay and no later than 60 days after the breach is discovered. The monitoring and incident detection layer needs to be operational at launch.
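The checklist above lends itself to a pre-flight gate: refuse to route patient audio through any hop in the pipeline that is not BAA-covered and encrypted. A minimal sketch follows, with hypothetical vendor names and a deliberately simplified vendor model; a real compliance check would cover far more than these two flags:

```python
# Hypothetical pre-flight check: refuse to route PHI through any pipeline
# stage whose vendor lacks a signed BAA or encrypted transport.
from dataclasses import dataclass

@dataclass
class Vendor:
    name: str
    baa_signed: bool
    encrypted_transport: bool  # TLS/SRTP on every leg of this hop

def phi_safe(pipeline: list[Vendor]) -> tuple[bool, list[str]]:
    """Return (ok, violations). Every hop must pass both checks."""
    violations = [
        v.name for v in pipeline
        if not (v.baa_signed and v.encrypted_transport)
    ]
    return (not violations, violations)

# Illustrative pipeline; the vendor names are placeholders.
pipeline = [
    Vendor("media-server", baa_signed=True, encrypted_transport=True),
    Vendor("stt-api", baa_signed=True, encrypted_transport=True),
    Vendor("llm-api", baa_signed=False, encrypted_transport=True),
]
ok, violations = phi_safe(pipeline)
print("route PHI" if ok else f"block: no BAA or encryption for {violations}")
```

The design choice worth noting is that the check runs before any audio moves: a single uncovered hop blocks the whole route, mirroring the "every component is a potential breach point" framing above.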
RTC League's infrastructure supports HIPAA-compliant deployments. The WebRTC and media server stack is configurable for encrypted media handling throughout the pipeline, and we work with healthcare clients specifically on the architecture decisions that determine compliance before deployment, not as a retrofit after go-live.
The Complete Infrastructure Stack for a Production Agentic Strategy
Here is what the infrastructure layer of a real agentic voice strategy looks like:
| Infrastructure Layer | What It Does | What Breaks Without It |
| --- | --- | --- |
| WebRTC media server (managed LiveKit) | Carries audio between users and AI agents; provides session management and AI SDK integration | High latency, session instability, no AI processing hooks |
| Enterprise SIP trunking | Connects PSTN phone numbers to the AI stack cleanly | Codec failures, added latency at the bridge, audio artefacts |
| Co-located AI processing | AI inference runs close to the media server, not via remote API calls | Compounding latency from remote API round trips |
| Voice Activity Detection + streaming audio | Processes audio in real time rather than waiting for complete utterances | Slower turn detection, higher perceived latency |
| Session management and observability | Monitors session health, latency metrics, audio quality | No visibility into production failures until customers complain |
| Compliance architecture | Encrypted media handling, BAA-covered vendor chain, audit logs | HIPAA exposure, regulatory liability, unsellable to regulated industries |
The average implementation cost of agentic AI runs to $890,000, producing 171% average ROI in organisations that have done the foundational work, and very different outcomes in organisations that have not. The infrastructure layer is the foundational work. The model layer gets almost all the attention. The communication stack is where production deployments succeed or fail.
RTC League is built around this full stack: managed WebRTC infrastructure, enterprise SIP trunking, AI voice through TelEcho, and the operational capability to run this in production at the reliability and compliance levels that customer-facing deployments require.
The question implicit in Jensen Huang's GTC 2026 keynote is not a new one. It is the same question that has decided every major technology transition of the last thirty years. The companies that win the agentic transition will not be the ones with the best model. They will be the ones that get the infrastructure right before they scale agents on top of it.