Google I/O 2024: New Gemini/Gemma Models for Google Cloud's Vertex AI -- Virtualization Review

Google I/O 2024: New Gemini/Gemma Models for Google Cloud's Vertex AI

By David Ramel
05/14/2024

As expected, the kickoff of Google I/O 2024 was heavy on the AI, with new updates for the company's cloud-based Vertex AI service leading a slew of day 1 announcements centered around generative AI.

"Google is fully in our Gemini era," said CEO Sundar Pichai in the opening keynote.

He noted how the company combined its Google Brain and DeepMind teams to advance AI.

"Using the computational resources of Google, they're focused on building more capable systems safely and responsibly," he said. "This includes our next-generation foundation model, Gemini, which is still in training. Gemini was created from the ground up to be a multimodal, highly efficient tool with API integrations and built to enable future innovations like memory and planning. While still early, we're already seeing impressive multimodal capabilities not seen in prior models. Once fine-tuned and rigorously tested for safety, Gemini will be available at various sizes and capabilities, just like PaLM 2."

The Cloud
Vertex AI is a fully managed, unified development platform for leveraging models at scale, providing a selection of more than 150 first-party, open, and third-party foundation models. It helps developers build AI agents and can be used to customize models with enterprise-ready tuning, grounding, monitoring and deployment capabilities.

New models available now include Gemini 1.5 Flash and PaliGemma. Gemini 1.5 Flash is a lighter-weight alternative to the Gemini 1.5 Pro model, designed for high-volume tasks like chat applications. PaliGemma is the first vision-language model in the Gemma family of open models, optimized for tasks such as image captioning and visual question-answering. It's available in Vertex AI Model Garden.

Available later will be Imagen 3, a text-to-image generation model that can generate detailed, photorealistic images, along with Gemma 2, a new open model that is built for a broad range of AI developer use cases, based on Gemini tech.

Finally, Gemini 1.5 Pro will be available to those accepted from a waitlist, boasting an expanded 2 million context window.

The company also unveiled three new capabilities for Vertex AI, joining recently announced prompt management and model evaluation tools:

Context caching: Entering public preview next month, this helps users actively manage and reuse cached context data. "As processing costs increase by context length, it can be expensive to move long-context applications to production," Google said. "Vertex AI context caching helps customers significantly reduce costs by leveraging cached data."
Controlled generation: This will enter public preview sooner than context caching, coming later this month. It helps users define Gemini model outputs according to specific formats or schemas. "Most models cannot guarantee the format and syntax of their outputs, even with specified instructions," Google said. "Vertex AI controlled generation lets customers choose the desired output format via pre-built options like YAML and XML, or by defining custom formats. JSON, as a pre-built option, is live today."
Batch API, now available in public preview, is described as "a super-efficient way to send large numbers of non-latency sensitive text prompt requests, supporting use cases such as classification and sentiment analysis, data extraction, and description generation." Benefits are said to include speeding up developer workflows and reducing costs by enabling multiple prompts to be sent to models in one request.

Google also announced: