Google Expands Gemini API File Search With Multimodal Retrieval, Custom Metadata Filters, and Page-Level Citations

Developers building AI-powered applications on Google's Gemini API now have access to three new capabilities for retrieving information from private data stores. Google confirmed in its May 5, 2026, Google Product Blog post that the Gemini API's File Search tool has been updated to support multimodal retrieval, custom metadata filtering, and page-level citations.

What Changed in the May 5 Update

Google's May 5 blog post states that the Gemini API's File Search tool added three features: native multimodal retrieval that processes images and text together, custom metadata filtering, and page-level citations tied to the original source.

The multimodal retrieval capability is powered by Gemini Embedding 2 and allows images, including charts, product photos, and diagrams, to be natively indexed and searched in the same store as text-based documents. Gemini Embedding 2 embeds images directly rather than relying on OCR to extract their text, which enables true visual retrieval.
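To illustrate what searching "in the same store" means, here is a minimal sketch of retrieval over a shared embedding space. The vectors are hypothetical three-dimensional stand-ins, not real Gemini Embedding 2 outputs, and the file names are invented; File Search performs this step internally. Because text and images live in one space, a single cosine-similarity ranking can return an image for a text query.

```python
import math

# Hypothetical 3-dimensional vectors standing in for real Gemini Embedding 2
# outputs (real embeddings have far more dimensions). Text and image items
# share one index because they share one embedding space.
INDEX = {
    "q3_revenue_chart.png": [0.9, 0.1, 0.0],
    "pricing_policy.pdf": [0.2, 0.9, 0.1],
    "product_photo_blue_mug.jpg": [0.0, 0.2, 0.95],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, k=2):
    """Rank every item, text or image, by similarity to the query vector."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical embedding of the text query "chart of quarterly revenue".
query = [0.85, 0.15, 0.05]
print(search(query, INDEX, k=1))  # ['q3_revenue_chart.png']
```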

Custom metadata filtering allows developers to tag files with labels such as department, status, or file type, enabling agents to narrow their search to specific data slices and significantly reducing irrelevant results.
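The narrowing effect of metadata filtering can be sketched with a simple exact-match filter applied before any ranking. This is an illustration only: the `Doc` class, the `filter_docs` helper, and the example tags are hypothetical, and the real API accepts its own filter syntax, which this sketch does not reproduce.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    name: str
    metadata: dict = field(default_factory=dict)  # e.g. department, status, file type

def filter_docs(docs, **required):
    """Keep only docs whose metadata matches every required key/value pair."""
    return [d for d in docs if all(d.metadata.get(k) == v for k, v in required.items())]

docs = [
    Doc("brand_guidelines.pdf", {"department": "marketing", "status": "approved"}),
    Doc("draft_campaign.pdf", {"department": "marketing", "status": "draft"}),
    Doc("msa_template.pdf", {"department": "legal", "status": "approved"}),
]

# Only the approved marketing file survives, so retrieval never sees the others.
approved_marketing = filter_docs(docs, department="marketing", status="approved")
print([d.name for d in approved_marketing])  # ['brand_guidelines.pdf']
```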

Page-level citations tie the model's response directly to the original source, capturing the page number for every piece of indexed information. This allows applications to direct users to the precise location in a document, supporting fact-checking and source verification.
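As a rough sketch of how an application might turn page-tagged retrieval results into citations a user can follow, the snippet below deduplicates (source, page) pairs and renders them in retrieval order. The `RetrievedChunk` type, the file names, and the citation format are hypothetical; they do not mirror the actual response schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrievedChunk:
    source: str  # file name of the original document (hypothetical)
    page: int    # page number captured at indexing time
    text: str

def format_citations(chunks):
    """Deduplicate (source, page) pairs and render them in retrieval order."""
    seen = []
    for c in chunks:
        key = (c.source, c.page)
        if key not in seen:
            seen.append(key)
    return [f"[{i}] {src}, p. {page}" for i, (src, page) in enumerate(seen, start=1)]

chunks = [
    RetrievedChunk("compliance_manual.pdf", 12, "Retention period is seven years."),
    RetrievedChunk("compliance_manual.pdf", 12, "Exceptions require sign-off."),
    RetrievedChunk("policy_update.pdf", 3, "Effective from Q2."),
]
for line in format_citations(chunks):
    print(line)
# [1] compliance_manual.pdf, p. 12
# [2] policy_update.pdf, p. 3
```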

The Gemini Embedding 2 Model Powering Multimodal Search

The multimodal retrieval update depends on a model Google made generally available in March 2026. Gemini Embedding 2 is the first embedding model in the Gemini API to map text, images, video, audio, and documents into a single embedding space, supporting over 100 languages. The model handles an expansive range of inputs in a single call: up to 8,192 text tokens, 6 images, 120 seconds of video, 180 seconds of audio, and 6 pages of PDFs.

File Search supports both text embeddings via the gemini-embedding-001 model and image and multimodal embeddings via gemini-embedding-2. Audio and video formats are not currently supported within File Search stores.

For image uploads specifically, supported formats are PNG and JPEG. Image files must be at most 4K x 4K pixels, and a maximum of 6 images can be included per request.
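A client-side pre-check against these limits can catch rejected uploads early. The sketch below assumes "4K" means 4096 pixels per side, takes image dimensions as already known, and uses the file extension as a proxy for format; a production check would inspect the file contents. The `check_image_batch` helper is hypothetical and not part of any SDK.

```python
from pathlib import Path

# Limits as described above; "4K" is assumed here to mean 4096 px per side.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg"}
MAX_SIDE_PX = 4096
MAX_IMAGES_PER_REQUEST = 6

def check_image_batch(images):
    """Return a list of problems for a batch of (filename, width_px, height_px)."""
    problems = []
    if len(images) > MAX_IMAGES_PER_REQUEST:
        problems.append(f"{len(images)} images exceed the limit of {MAX_IMAGES_PER_REQUEST}")
    for name, width, height in images:
        # Extension is only a proxy for the real file format.
        if Path(name).suffix.lower() not in ALLOWED_SUFFIXES:
            problems.append(f"{name}: only PNG and JPEG are supported")
        if width > MAX_SIDE_PX or height > MAX_SIDE_PX:
            problems.append(f"{name}: {width}x{height} exceeds {MAX_SIDE_PX}x{MAX_SIDE_PX}")
    return problems

batch = [("diagram.png", 2048, 1536), ("scan.tiff", 1000, 1000), ("poster.jpg", 5000, 3000)]
for problem in check_image_batch(batch):
    print(problem)
# scan.tiff: only PNG and JPEG are supported
# poster.jpg: 5000x3000 exceeds 4096x4096
```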

Citations Extended to Image References

The page-level citation feature applies across both text and image content. Every response includes grounding metadata that links the answer to specific documents and pages. For multimodal stores, citations also include downloadable image references. Citation information is accessible through the grounding_metadata attribute of the response object.

File Search as a Managed RAG Service

The File Search tool was originally introduced in November 2025. Google's November 6, 2025, announcement presented File Search as a fully managed retrieval-augmented generation (RAG) service: storage and query-time embedding generation are free, while initial indexing is charged at $0.15 per 1 million tokens using gemini-embedding-001.
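Because only the initial indexing is billed under the pricing announced in November 2025, a rough budget reduces to one multiplication. The corpus size below is a hypothetical example, real token counts depend on the tokenizer, and current pricing should be checked before relying on the figure.

```python
# Initial indexing price as announced in November 2025 for gemini-embedding-001.
INDEXING_USD_PER_MILLION_TOKENS = 0.15

def indexing_cost_usd(total_tokens):
    """One-time indexing cost; storage and query-time embeddings are free per the announcement."""
    return total_tokens / 1_000_000 * INDEXING_USD_PER_MILLION_TOKENS

# Hypothetical corpus: 2,000 documents averaging 5,000 tokens each (10M tokens).
print(f"${indexing_cost_usd(2_000 * 5_000):.2f}")  # $1.50
```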

When files are uploaded, the API handles chunking, embedding, indexing, and retrieval. At query time, the file_search tool is passed alongside a prompt, and the model automatically retrieves relevant chunks from the indexed data to generate a grounded response.
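The article does not describe how File Search splits documents, but the idea behind the chunking step can be shown with overlapping word windows. The window and overlap sizes here are hypothetical; overlap helps keep a sentence that straddles a boundary retrievable from either neighboring chunk.

```python
def chunk_words(text, max_words=200, overlap=40):
    """Split text into overlapping word windows (hypothetical sizes)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, max_words=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```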

File Search is available with Gemini 3.1 Pro Preview, Gemini 3.1 Flash-Lite Preview, Gemini 3 Flash Preview, Gemini 2.5 Pro, and Gemini 2.5 Flash-Lite.

Practical Implications for Enterprise Marketers and B2B Teams

For enterprise teams evaluating custom AI integrations, the combination of multimodal retrieval and metadata filtering makes it more practical to build internal tools that query large, mixed-format content libraries, such as product catalogues, brand asset archives, and legal and compliance documents, without routing that data through public AI systems. Page-level citations address a common objection in professional settings where answers must be traceable to a named source and page. Teams adopting File Search should note that multimodal stores require the gemini-embedding-2 model, which uses a different embedding space than the original gemini-embedding-001; existing text-only indexes would need to be re-embedded to take advantage of image retrieval.

Harvey, a legal research platform for law firms and enterprises, reported a 3% increase in Recall@20 on legal-specific benchmarks after adopting Gemini Embedding 2, resulting in more accurate citations and answers.
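Recall@20 measures what fraction of the relevant documents appear among the top 20 retrieved results; unlike precision, it does not penalize irrelevant results in those slots. A minimal implementation of the general Recall@k metric follows, with hypothetical document IDs. The report does not say whether the 3% figure is an absolute or relative gain.

```python
def recall_at_k(retrieved, relevant, k=20):
    """Fraction of the relevant items that appear in the top-k retrieved results."""
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

retrieved = ["d3", "d7", "d1", "d9"]  # ranked results (hypothetical)
relevant = {"d1", "d2", "d3", "d4"}   # ground-truth relevant docs (hypothetical)
print(recall_at_k(retrieved, relevant, k=3))  # 0.5
```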
