
Gemini Embedding 2

Google's first natively multimodal embedding model that maps text, images, video, audio, and documents into a unified embedding space.

Introduction

Gemini Embedding 2 is a groundbreaking multimodal embedding model built on the Gemini architecture, now available in public preview via the Gemini API and Vertex AI. This model expands beyond text-only embeddings by natively processing and mapping text, images, videos, audio, and documents into a single, unified embedding space. It captures semantic intent across over 100 languages and supports interleaved input of multiple modalities in a single request, enabling complex multimodal understanding.

Key Features:

  • Multimodal Processing: Handles text (up to 8192 tokens), images (up to 6 per request), videos (up to 120 seconds), audio (native ingestion without transcription), and documents (PDFs up to 6 pages)
  • Unified Embedding Space: Maps diverse media types into a single space for seamless multimodal retrieval and classification
  • Flexible Output Dimensions: Uses Matryoshka Representation Learning (MRL) to scale dimensions from default 3072 down to 768 for cost-performance balance
  • State-of-the-Art Performance: Outperforms leading models on text, image, and video tasks, with strong speech capabilities as well
  • Developer-Friendly: Available through Gemini API, Vertex AI, and popular frameworks like LangChain, LlamaIndex, and Vector Search
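The MRL property above means a full 3072-dimensional embedding can be shortened client-side: keep a prefix of the vector and re-normalize it to unit length before computing similarities. A minimal sketch in plain Python, using a random stand-in vector rather than a real API response:

```python
import math
import random

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of an MRL embedding and
    re-normalize to unit length (needed for cosine similarity)."""
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

# Stand-in for a 3072-dimensional embedding returned by the model.
full = [random.gauss(0, 1) for _ in range(3072)]
small = truncate_embedding(full, 768)

assert len(small) == 768
assert abs(sum(x * x for x in small) - 1.0) < 1e-9
```

The same helper works for any prefix length; shorter vectors cut storage and search cost at a modest accuracy trade-off.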

Use Cases:

  • Retrieval-Augmented Generation (RAG) systems
  • Multimodal semantic search and data clustering
  • Sentiment analysis across different media types
  • Legal discovery processes with image and video search
  • Creator content indexing and brand collaboration platforms
  • Personal wellness applications with conversational memory embedding
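Because every modality lands in the same space, retrieval for the use cases above reduces to nearest-neighbor search over embedding vectors. A toy cosine-similarity sketch in plain Python (the 4-dimensional vectors are hypothetical stand-ins for real embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, corpus, k=2):
    """Rank (id, vector) corpus items by similarity to the query."""
    ranked = sorted(corpus, key=lambda item: cosine(query, item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

# Items of different modalities share one embedding space.
corpus = [
    ("image_01", [0.9, 0.1, 0.0, 0.1]),
    ("video_07", [0.1, 0.9, 0.1, 0.0]),
    ("doc_03",   [0.8, 0.2, 0.1, 0.0]),
]
query = [1.0, 0.0, 0.0, 0.0]
print(top_k(query, corpus))  # → ['image_01', 'doc_03']
```

In production this brute-force scan would be replaced by an approximate nearest-neighbor index such as Vertex AI Vector Search, but the scoring logic is the same.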
