KubeCon 2025 Day 0 Recap
During the second week of November, I attended KubeCon and CloudNativeCon 2025, held in Atlanta at the Georgia World Congress Center. In this and in subsequent posts, I will detail my experiences at the event.
[Click on image for larger view.] (source: Virtualization Review).
On Monday, the day before KubeCon started, the Cloud Native Computing Foundation (CNCF) hosted a day of co-located events. I like this format, as it allows conference attendees to delve deeper into a specific technology or topic. These mini "Cons" range from BackstageCon to Open Source SecurityCon. The two that I was fortunate enough to attend were the Red Hat OpenShift Commons Gathering and Google Container Day.
An unseasonably chilly fifteen-minute walk brought me to the Red Hat OpenShift Commons Gathering.
By way of reference, the stated goal of these “gatherings” is to bring together Red Hat's users, partners, customers, contributors, and project leads to share information and insights.
The event kicked off with a keynote by Amy Marrich (Red Hat), followed by a presentation by Stu Miniman (Red Hat) and Michael Foster (Red Hat). Their presentation explored the four critical pillars that enable the transformation we are seeing in IT: modernizing legacy systems, harnessing new technology, empowering developers, and navigating global regulations. The rest of the event built on these themes.
Stu and Michael started their presentation by explaining that Red Hat acquired Neural Magic to help its customers significantly increase the performance and utilization of their existing hardware accelerators (such as GPUs) and to deepen its integration with tools like vLLM, an open-source inference engine that optimizes AI model serving performance and, in turn, helps manage costs.
Building on this AI theme, they then discussed how Red Hat is integrating AI assistance directly into its products through its "Lightspeed" initiative. OpenShift Lightspeed, for example, is an AI assistant integrated into the OpenShift console; users can ask it questions in natural language to troubleshoot issues and use it to analyze cluster logs. They also discussed how they are integrating an AI assistant into the Red Hat Developer Hub.
As someone who has spent far too many hours combing through documentation and knowledge bases, I can appreciate how AI assistants can assist with troubleshooting, answer questions, and even automatically generate scripts.
The rest of the day at the gathering was spent on presentations by Red Hat customers describing how they utilize Red Hat products, as well as a couple of hands-on sessions later in the afternoon.
I left the event early, as I had the opportunity to attend Google Container Day, which was held on the 31st floor of Google's Atlanta office, a 15-minute ride from the Red Hat event. The view of Atlanta from up there was incredible.
The purpose of this event was to share the latest information on containers in Google Cloud.
Google began the day by highlighting that over the past decade, its work on containers has produced a mature and feature-rich platform. Below is a summary of some of Google's key accomplishments over that time.
| Year | Key Feature/Milestone |
|------|-----------------------|
| 2015 | Initial release as Google Container Engine. |
| 2016 | Automatic node upgrades introduced. |
| 2017 | Container-Optimized OS becomes standard. |
| 2018 | Support for GPUs and TPUs for AI workloads; Node Auto-Provisioning launched. |
| 2019 | Release channels (Rapid, Stable) introduced for managing upgrade velocity. |
| 2020 | GKE scaled to 15,000 nodes per cluster, a record held for four years. |
| 2021 | GKE Autopilot launched, abstracting away node management. |
| 2022 | Confidential nodes and Blue-Green upgrades added. |
| 2023 | Dynamic Workload Scheduling and Custom Compute Classes (similar to Karpenter) made available. |
| 2024 | GKE supports clusters of up to 130,000 nodes. |
During the presentations, Google employees and customers discussed how the company is working on the next wave of innovation, which aims to make Kubernetes more dynamic and easier to use.
They discussed the ways they are making compute provisioning faster, highlighting some of the new features they are working on, including Autopilot compute startup, which is now seven times faster and capable of spinning up 1-10 replicas in seconds. They also mentioned that, when using accelerators, GPU node startup is now up to 2x faster, reducing costly wait times.
Other features they are working on, or have in place, include In-Place Pod Resizing. This feature leverages a new Kubernetes capability (GA in 1.34) to resize a pod's CPU and memory requests/limits without requiring a disruptive pod restart. This is critical for right-sizing workloads, such as Java applications, that have high startup requirements but lower steady-state needs.
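To make this concrete, here is a minimal sketch of what in-place resizing looks like from the user's side. The pod and container names are hypothetical; the `resizePolicy` field is the standard Kubernetes mechanism that declares a resource can be changed without a container restart.

```yaml
# Hypothetical example: a pod that permits in-place CPU resizing
# (requires the in-place pod resize feature, GA in Kubernetes 1.34).
apiVersion: v1
kind: Pod
metadata:
  name: java-app            # hypothetical name
spec:
  containers:
  - name: app
    image: eclipse-temurin:21
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired   # CPU changes do not restart the container
    resources:
      requests:
        cpu: "2"                   # generous request for JVM startup
      limits:
        cpu: "2"
```

Once the application reaches steady state, an operator could shrink the request through the pod's `resize` subresource (on recent kubectl versions, something like `kubectl patch pod java-app --subresource resize --patch '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"500m"},"limits":{"cpu":"500m"}}}]}}'`) without the disruptive restart the article describes.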
They also covered the Pod Buffers API, a native API for managing standby nodes to enable faster autoscaling. Users can configure a buffer of "hot" (running) or "suspended" nodes, providing a trade-off between cost and performance without needing to use low-priority "placeholder" pods.
They also walked through a number of operational enhancements:

- An enhanced management portal, which now has a new "single pane of glass" GUI for managing upgrades across an entire fleet, providing data on timing, versions, and scheduling.
- Rollout sequencing, which allows for sophisticated, tag-based, multi-stage upgrade strategies.
- Control Plane Rollback, a new feature that enables operators to "undo" a control plane upgrade if it fails, a significant improvement in operational safety.
- Dynamic IPAM, an automated IP address management feature that expands and shrinks pod and service CIDR ranges on demand, eliminating the need for manual network planning.
Most importantly, they stressed that GKE continues to contribute key functionalities back to the open-source community.
They then discussed key upstream projects, including Kueue, a job scheduler with the primitives necessary for batch, AI, and HPC workloads; Dynamic Resource Allocation (DRA), a new API for requesting and sharing resources such as GPUs and other accelerators; and gVisor, an open-source sandboxing technology used in GKE for pod snapshotting (enabling near-instant startup for initialized inference workloads) and for creating secure Agent Sandboxes for running AI agents.
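As a flavor of how Kueue is used in practice, here is a minimal sketch of a batch Job submitted through it. The queue and job names are hypothetical, and a matching ClusterQueue/LocalQueue setup is assumed to already exist; the `kueue.x-k8s.io/queue-name` label is how Kueue picks up the Job, which it admits by unsuspending it once quota is available.

```yaml
# Hypothetical example: a Job queued through Kueue
# (assumes a LocalQueue named "team-a-queue" exists in this namespace).
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-training-job        # hypothetical name
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue
spec:
  suspend: true                    # Kueue unsuspends the Job when quota frees up
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo training step"]
        resources:
          requests:
            cpu: "1"               # counted against the queue's quota
      restartPolicy: Never
```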
The event concluded with a reception at the House of Kube, where participants mingled with the Googlers behind GKE and Cloud Run, as well as the folks at Tailscale and PlatCo.
I really appreciate that the CNCF hosts these events, as spending a day focused on a specific technology allowed me to delve deeper into it. I'm not saying that the format of the rest of the conference, where you spend less than an hour on a topic, isn't extremely beneficial; it's just a nice change to be able to focus on one or two subjects for a day.
With the first day under my belt, I was ready and excited for the official start of KubeCon!
About the Author
Tom Fenton has a wealth of hands-on IT experience gained over the past 30 years in a variety of technologies, with the past 20 years focusing on virtualization and storage. He previously worked as a Technical Marketing Manager for ControlUp. He also previously worked at VMware in Staff and Senior level positions. He has also worked as a Senior Validation Engineer with The Taneja Group, where he headed the Validation Service Lab and was instrumental in starting up its vSphere Virtual Volumes practice. He's on X @vDoppler.