Unleashing Vertex AI Potential: Introducing the New Gemini and Gemma Models to Google Cloud Users

As you've probably read here before, we at Expected X love Google Cloud's Vertex AI. That isn't to say Google is infallible or that its cloud is the best platform for every use case (look no further than the PR disaster over the staged Gemini hands-on release video), but for implementing AI/ML solutions, it excels.

Vertex AI is a comprehensive platform for putting models to work at scale, offering over 150 first-party, open, and third-party foundation models. It supports customization with enterprise-ready tuning, grounding, monitoring, and deployment capabilities, as well as the creation of AI agents via Vertex AI Agent Builder (should LangChain worry?). Well-known companies such as ADT, IHG Hotels & Resorts, ING Bank, and Verizon use Vertex AI to accelerate how they build, deploy, and maintain AI apps and agents.

At Google I/O '24, Google announced several new models, including Gemini 1.5 Flash, a lightweight model equipped with a groundbreaking 1-million-token context window, which makes it well suited to chat applications. PaliGemma, available in the Vertex AI Model Garden, is the Gemma family's first vision-language model and excels at tasks like image captioning and visual question answering.
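
If you want to kick the tires on PaliGemma without deploying it from Model Garden, the weights are also published on Hugging Face. Here's a minimal captioning sketch using the transformers library; the model ID is real, but the image URL is a placeholder, and you'll need to accept the Gemma license on Hugging Face first:

  import requests
  from PIL import Image
  from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

  model_id = "google/paligemma-3b-mix-224"
  processor = AutoProcessor.from_pretrained(model_id)
  model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

  # "caption en" is one of PaliGemma's task prefixes (English captioning).
  image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)  # placeholder URL
  inputs = processor(text="caption en", images=image, return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=30)

  # Strip the prompt tokens from the front before decoding.
  caption = processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
  print(caption)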

Upcoming models include Imagen 3, an advanced text-to-image model with incredible detail and photorealism, and Gemma 2, the next generation of open models for a range of AI developer use cases. Google is clearly betting on multimodal LLM development and understands the industry's appetite for it. What we'd like to see is a model that combines PaliGemma's and Imagen 3's capabilities into a single visual model.

To help developers get more performance (and lower bills) out of these models, Vertex AI introduced new features such as the following, each sketched in code after the list:

  • Context caching: Large, frequently reused input (a long contract, a big codebase) can be cached once and referenced by subsequent requests, so the model doesn't reprocess the same tokens on every call. This can reduce compute costs and speed up inference time, especially when utilizing the full 1M-token Gemini input.

  • Controlled generation: This could be huge for developers using Gemini to produce JSON, YAML, XML, and other data serialization formats. Users can specify the output format explicitly rather than crossing their fingers that Gemini honors a request buried in the prompt.

  • Batch API: This feature should be super-helpful for non-time-sensitive tasks like document classification when building, for instance, a RAG system over proprietary organizational materials.
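
Here's roughly what context caching looks like with the Vertex AI Python SDK. This is a sketch against the preview API as we understand it; the project ID, document text, and TTL are placeholders:

  import datetime
  import vertexai
  from vertexai.preview import caching
  from vertexai.preview.generative_models import GenerativeModel, Part

  vertexai.init(project="your-project-id", location="us-central1")  # placeholder project

  # Cache the big, reusable context once. Note that cached content must meet
  # a minimum token count (32K at launch), so a short string won't qualify.
  cached = caching.CachedContent.create(
      model_name="gemini-1.5-pro-001",
      contents=[Part.from_text("<your very long document goes here>")],
      ttl=datetime.timedelta(hours=1),
  )

  # Then point a model at the cache; follow-up prompts don't re-send those tokens.
  model = GenerativeModel.from_cached_content(cached_content=cached)
  print(model.generate_content("Summarize section 3 of the document.").text)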
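
Controlled generation amounts to setting a response MIME type in the generation config (newer SDK versions also accept a response_schema to pin down an exact structure). A minimal sketch:

  import vertexai
  from vertexai.generative_models import GenerationConfig, GenerativeModel

  vertexai.init(project="your-project-id", location="us-central1")  # placeholder project

  model = GenerativeModel("gemini-1.5-flash-001")
  response = model.generate_content(
      "Extract the product name and price from: 'The Acme X200 costs $49.99.'",
      generation_config=GenerationConfig(
          response_mime_type="application/json",  # guarantee JSON instead of hoping the prompt sticks
      ),
  )
  print(response.text)  # e.g. {"product": "Acme X200", "price": 49.99}; exact keys will vary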
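
And the Batch API boils down to pointing a job at a JSONL file of requests in Cloud Storage. Again a sketch against the preview SDK as we understand it; the bucket paths are placeholders:

  import time
  import vertexai
  from vertexai.preview import batch_prediction

  vertexai.init(project="your-project-id", location="us-central1")  # placeholder project

  # Each line of requests.jsonl is one Gemini request body.
  job = batch_prediction.BatchPredictionJob.submit(
      source_model="gemini-1.5-flash-001",
      input_dataset="gs://your-bucket/requests.jsonl",
      output_uri_prefix="gs://your-bucket/batch-output/",
  )

  # Poll until the job finishes; results land under the output prefix.
  while not job.has_ended:
      time.sleep(30)
      job.refresh()
  print(job.state, job.output_location)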

Gemini 1.5 Flash

Let's be reasonable, though: most of the applications we see today do require us to take latency and response time into account. Snappy responses are partially what gives today's LLMs that "human-to-human" interaction quality that fools us into thinking they're "intelligent"!

Gemini 1.5 Flash is specifically designed for high-volume tasks where cost and latency are crucial. It shares Gemini 1.5 Pro's current 1-million-token context window but at lower cost and latency, making it well suited to tasks like chat applications, image analysis, and data extraction from long-form documents. Gemini 1.5 Pro will soon be generally available with an even larger 2-million-token context window for analyzing large code bases or extensive document libraries.
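
Getting a chat-style response out of Flash takes only a few lines with the Vertex AI SDK. A minimal sketch (the project ID is a placeholder):

  import vertexai
  from vertexai.generative_models import GenerativeModel

  vertexai.init(project="your-project-id", location="us-central1")  # placeholder project

  # start_chat() keeps conversation history across turns, which suits
  # exactly the low-latency chat use case Flash is pitched at.
  model = GenerativeModel("gemini-1.5-flash-001")
  chat = model.start_chat()
  print(chat.send_message("Give me a one-line status update on my order.").text)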

In addition, Vertex AI helps you ground model outputs: the "Grounding with Google Search" capability backs responses with live search results, and enterprise grounding lets you designate your own databases as "sources of truth." Google Search itself has included this function for a while now, returning generative results alongside regular results, but the same capability now extends to searching enterprise data. It looks like RAG systems are going to look a lot more like a regular Google Search going forward!
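
Enabling search grounding is just a matter of attaching a grounding tool to the request; enterprise grounding works similarly via Vertex AI Search data stores. A sketch with the SDK's Google Search retrieval tool (project ID is a placeholder):

  import vertexai
  from vertexai.generative_models import GenerativeModel, Tool, grounding

  vertexai.init(project="your-project-id", location="us-central1")  # placeholder project

  # Ground responses in live Google Search results instead of model memory alone.
  search_tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())

  model = GenerativeModel("gemini-1.5-flash-001")
  response = model.generate_content(
      "What did Google announce at I/O '24?",
      tools=[search_tool],
  )
  print(response.text)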

The Expected X Take…

We've seen Google overpromise and underdeliver on countless occasions, and we don't want that to be the case with their Vertex AI offerings (or our business would be in some serious trouble)! What we'd like to see Google do more of is build tools and applications for, well, building tools and applications that use LLMs on the backend. Nowadays, most of the tech giants seem interested only in creating better LLMs, not in what an organization can actually do with them. Google does seem to be slowly moving into this area, though, with Google Cloud Marketplace offerings like RAG on GKE. Let's hope they continue in this direction.

Ready to take your AI projects to the next level with GCP and Vertex AI? Expected X is the partner you need—contact us today!
