Gemini Embedding 2 is a multimodal embedding model built on the Gemini architecture, now available in public preview via the Gemini API and Vertex AI. It expands beyond text-only embeddings by natively processing text, images, videos, audio, and documents and mapping them into a single, unified embedding space. The model captures semantic intent in more than 100 languages and accepts interleaved inputs of multiple modalities in a single request, enabling complex multimodal understanding.
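To make "interleaved input of multiple modalities in a single request" concrete, here is a minimal sketch of what such a request could look like. The model identifier, field names, and file URIs below are illustrative assumptions, not the actual API schema; consult the Gemini API or Vertex AI reference for the real interface.

```python
# Hypothetical shape of an interleaved multimodal embedding request.
# All names below are assumptions for illustration only.
request = {
    "model": "gemini-embedding-2",  # assumed model identifier
    "contents": [
        {"text": "Find the scene where the product is unboxed."},
        {"video": {"uri": "gs://my-bucket/demo.mp4"}},   # hypothetical URI
        {"image": {"uri": "gs://my-bucket/cover.png"}},  # hypothetical URI
    ],
    # MRL lets you request a smaller vector than the 3072-dim default.
    "output_dimensionality": 768,
}

# A single request mixes text, video, and image parts.
print(len(request["contents"]))        # 3
print(request["output_dimensionality"])  # 768
```

The key point is that heterogeneous parts travel together in one request and produce embeddings in the same space, so a text query can be compared directly against video or image content.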
Key Features:
- Multimodal Processing: Handles text (up to 8192 tokens), images (up to 6 per request), videos (up to 120 seconds), audio (native ingestion without transcription), and documents (PDFs up to 6 pages)
- Unified Embedding Space: Maps diverse media types into a single space for seamless multimodal retrieval and classification
- Flexible Output Dimensions: Uses Matryoshka Representation Learning (MRL) to scale dimensions from default 3072 down to 768 for cost-performance balance
- State-of-the-Art Performance: Outperforms leading models on text, image, and video tasks, and delivers strong speech capabilities
- Developer-Friendly: Available through the Gemini API and Vertex AI, with integrations for popular frameworks such as LangChain and LlamaIndex, plus Vertex AI Vector Search
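The Matryoshka Representation Learning (MRL) feature above means the leading dimensions of an embedding carry the most information, so you can keep only a prefix of the vector and re-normalize it. A minimal sketch, using a synthetic vector in place of a real model output (real embeddings come from the API, not from this formula):

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of an MRL-trained embedding
    and re-normalize to unit length (standard Matryoshka usage)."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy 3072-dim vector standing in for a model output.
full = [math.sin(i * 0.1) for i in range(3072)]

# Shrink from the 3072-dim default to 768 dims for cheaper
# storage and faster similarity search.
small = truncate_embedding(full, 768)

print(len(small))                           # 768
print(round(sum(x * x for x in small), 6))  # 1.0 (unit norm)
```

Smaller vectors cut index size and query latency roughly in proportion to the dimension, at a modest cost in retrieval quality, which is the cost-performance trade-off the feature list refers to.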
Use Cases:
- Retrieval-Augmented Generation (RAG) systems
- Multimodal semantic search and data clustering
- Sentiment analysis across different media types
- Legal discovery processes with image and video search
- Creator content indexing and brand collaboration platforms
- Personal wellness applications with conversational memory embedding
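All of the retrieval-style use cases above reduce to the same core operation: embed the query and the corpus items into the shared space, then rank by cosine similarity. A self-contained sketch with toy 4-dim vectors standing in for real API embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings for mixed-media items; in practice these would
# come from the embedding model, all in one unified space.
corpus = {
    "unboxing video": [0.9, 0.1, 0.0, 0.1],
    "contract scan":  [0.1, 0.8, 0.3, 0.0],
    "podcast clip":   [0.0, 0.2, 0.9, 0.1],
}

# A text query embedded into the same space.
query = [0.85, 0.15, 0.05, 0.1]

# Rank corpus items by similarity to the query.
best = max(corpus, key=lambda k: cosine(query, corpus[k]))
print(best)  # unboxing video
```

In a production RAG pipeline the brute-force `max` would be replaced by an approximate nearest-neighbor index (e.g. a vector database), but the ranking principle is identical.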

