News
NVIDIA, AWS and Google Cloud Spotlight AI Infrastructure Push at GTC 2026
NVIDIA and its cloud partners used NVIDIA GTC 2026 to outline a broader push into cloud infrastructure for AI training, inference, analytics, and managed services.
Announcements from cloud partners including AWS and Google Cloud showed how new NVIDIA GPU, interconnect, and inference integrations are being packaged into large-scale cloud offerings. NVIDIA's own cloud-related announcements, meanwhile, added context for a wider effort to build what the company increasingly describes as AI factories and full-stack AI clouds.
AWS Expands NVIDIA-Based Infrastructure
AWS used NVIDIA GTC 2026 to announce a new set of NVIDIA-related cloud initiatives centered on scale, instance options, inference networking, analytics, and managed AI services. In its official GTC 2026 post, AWS said it plans to deploy more than 1 million NVIDIA GPUs across AWS Regions starting in 2026, including the Blackwell and Rubin architectures. AWS also announced support for Amazon Elastic Compute Cloud (Amazon EC2) instances using NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs, described in the post as a first among major cloud providers.
AWS also highlighted infrastructure changes aimed at inference performance. The company said it is adding support for the NVIDIA Inference Xfer Library (NIXL) with AWS Elastic Fabric Adapter to accelerate disaggregated large language model inference across NVIDIA GPUs and AWS Trainium systems. The integration is intended to improve inter-token latency and key-value (KV) cache movement for large inference clusters.
Beyond model serving, AWS pointed to data processing workloads. The company said AWS and NVIDIA are delivering 3x faster Apache Spark performance using Amazon EMR on Amazon EKS with Amazon EC2 G7e instances powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. AWS also said support for NVIDIA Nemotron models is expanding in Amazon Bedrock, including planned reinforcement fine-tuning support and the upcoming availability of Nemotron 3 Super through the managed service.
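The AWS post does not spell out how the Spark speedup is configured, but GPU acceleration for Spark SQL is commonly enabled through NVIDIA's open-source RAPIDS Accelerator plugin. The following spark-submit invocation is a minimal illustrative sketch, not AWS's published setup; the cluster endpoint, jar version, resource amounts, and job script are placeholders.

```shell
# Sketch: enabling the NVIDIA RAPIDS Accelerator for Apache Spark on a
# Kubernetes-backed cluster. Endpoint, jar version, and resource values
# are illustrative placeholders, not AWS's published configuration.
spark-submit \
  --master k8s://https://EXAMPLE-EKS-ENDPOINT:443 \
  --jars rapids-4-spark_2.12-24.08.1.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.25 \
  --conf spark.executor.resource.gpu.discoveryScript=/opt/getGpusResources.sh \
  my_etl_job.py
```

With the plugin active, supported Spark SQL and DataFrame operations are transparently offloaded to the GPU, while unsupported operations fall back to the CPU.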
Google Cloud Highlights Flexible GPU Consumption and Stack Integration
Google Cloud's GTC 2026 announcement focused on how NVIDIA technology is being integrated across virtual machines, Kubernetes-based inference, training infrastructure, and future rack-scale systems. Google Cloud said its G4 virtual machines, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, are seeing strong momentum, and it used GTC to preview fractional G4 VMs using NVIDIA virtual GPU technology.
The fractional G4 VM announcement was one of the more cloud-specific items from the event because it addressed infrastructure consumption rather than raw hardware availability. Google Cloud said the new configurations will let customers use smaller GPU slices, including 1/2, 1/4, and 1/8 GPU options, to better match infrastructure to workloads such as inference, rendering, remote desktops, and streaming.
Google Cloud also detailed software integration work with NVIDIA. The company said NVIDIA Dynamo is being integrated with Google Kubernetes Engine Inference Gateway to provide what it described as a modular, open-source control plane across the application layer and hardware. Google Cloud plans to be among the first cloud providers to offer NVIDIA Vera Rubin NVL72 rack-scale systems in the second half of 2026 as part of its AI Hypercomputer architecture.
NVIDIA's Wider Cloud Strategy
Taken together, the AWS and Google Cloud announcements suggest that NVIDIA's cloud narrative at GTC 2026 went beyond a simple increase in GPU supply. The event messaging from partners emphasized a broader stack that includes networking, orchestration, inference software, analytics acceleration, and more flexible infrastructure packaging.
That positioning lines up with NVIDIA's own recent cloud messaging. While it predates GTC 2026, NVIDIA's DGX Cloud Lepton announcement from May 2025 laid out a marketplace-style approach for connecting developers with GPU capacity from a network of cloud providers. NVIDIA said the platform would connect developers with tens of thousands of GPUs from partners including CoreWeave, Crusoe, Lambda, Nebius, Nscale, SoftBank Corp., and others, with support for region-specific capacity, multi-cloud and hybrid deployment, and sovereignty-related requirements.
That earlier announcement provides context for the 2026 GTC cloud news because it showed NVIDIA pushing toward a cloud ecosystem model in which capacity, software, and deployment portability are treated as a combined offering rather than as standalone hardware sales.
Nebius Partnership
Just ahead of GTC 2026, NVIDIA added another cloud-focused announcement with its Nebius partnership release. In that announcement, NVIDIA and Nebius said they would work together to develop and deploy what they called the next generation of hyperscale cloud for the AI market. NVIDIA also said it would invest $2 billion in Nebius and support the company's early adoption of NVIDIA accelerated computing platforms.
The Nebius announcement broadened the cloud story beyond hyperscalers. NVIDIA said the partnership spans AI factory design, inference, AI infrastructure deployment, and fleet management, and that it is intended to help Nebius deploy more than 5 gigawatts of NVIDIA systems by the end of 2030. The release aligned with themes also visible during the event: larger-scale cloud buildouts, closer NVIDIA-provider engineering ties, and cloud platforms designed specifically for AI workloads rather than adapted from general-purpose infrastructure.
For cloud computing observers, the main takeaway from GTC 2026 was that NVIDIA and its partners are increasingly presenting AI infrastructure as a cloud stack problem, not just a silicon problem. AWS focused on scale, new EC2 options, inference interconnects, and analytics acceleration. Google Cloud focused on fractional GPU access, Kubernetes-based inference integration, and future rack-scale NVIDIA systems. NVIDIA's own announcements around DGX Cloud Lepton and Nebius added context to a wider strategy built around AI cloud platforms, AI factories, and globally distributed GPU capacity.
The result was a set of GTC-related announcements that put cloud architecture, infrastructure packaging, and deployment models at the center of NVIDIA's latest enterprise push.
About the Author
David Ramel is an editor and writer at Converge 360.