Model family
BGE Models
BGE models from BAAI cover text embedding and reranking for retrieval, semantic search, and vector database workflows, aimed at RAG builders.
Best for
Embedding
Use this family hub to compare BGE variants for embedding workflows, then open the detail page for deeper deployment notes.
Reranking
Use this family hub to compare BGE variants for reranking workflows, then open the detail page for deeper deployment notes.
RAG
Use this family hub to compare BGE variants for RAG workflows, then open the detail page for deeper deployment notes.
Search
Use this family hub to compare BGE variants for search workflows, then open the detail page for deeper deployment notes.
Variants
BGE models grouped by workflow
Embedding and reranking
BGE Reranker v2 M3
BAAI · BGE
Best for: RAG builders who need a practical reranker after Qdrant, Chroma, pgvector, or another vector search backend.
Local: Commonly used in local RAG stacks as a reranking step after vector search.
bge-m3
BAAI · BGE
Best for: Multilingual embedding and retrieval workflows
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
bge-large-en-v1.5
BAAI · BGE
Best for: English embedding and retrieval workflows
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
bge-base-en-v1.5
BAAI · BGE
Best for: Balanced English embedding workflows
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
bge-small-en-v1.5
BAAI · BGE
Best for: Lightweight local embedding workflows
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
bge-large-zh-v1.5
BAAI · BGE
Best for: Chinese embedding and retrieval workflows
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
bge-reranker-large
BAAI · BGE
Best for: RAG reranking workflows
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
bge-reranker-base
BAAI · BGE
Best for: Lightweight reranking workflows
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
bge-embedding-gemma2
BAAI · BGE
Best for: Gemma-based embedding experiments
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
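The reranker entries above describe reranking as a step that runs after vector search. A minimal sketch of that two-stage pattern follows. It assumes nothing about any particular BGE checkpoint: `toy_embed` and `toy_score_pair` are hypothetical stand-ins for an embedding model and a reranker, and `retrieve_then_rerank`, `top_k`, and `top_n` are illustrative names rather than part of any library.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical toy vocabulary; a real embedding model would produce dense vectors.
VOCAB = ["reranker", "embedding", "vector", "search", "chinese", "multilingual"]

TOY_DOCS = [
    "a reranker reorders vector search results",
    "an embedding model maps text to a vector",
    "chinese retrieval needs a chinese embedding",
    "bread recipes with sourdough",
]


def toy_embed(text: str) -> List[float]:
    """Stand-in for an embedding model: counts of VOCAB words in the text."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in VOCAB]


def toy_score_pair(query: str, doc: str) -> float:
    """Stand-in for a reranker: Jaccard overlap of the two token sets."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query: str,
    docs: Sequence[str],
    embed: Callable[[str], List[float]],
    score_pair: Callable[[str, str], float],
    top_k: int = 3,
    top_n: int = 2,
) -> List[str]:
    """Stage 1: keep the top_k docs by embedding similarity.
    Stage 2: re-order those candidates with the pairwise scorer, keep top_n."""
    q_vec = embed(query)
    candidates = sorted(
        docs, key=lambda doc: cosine(q_vec, embed(doc)), reverse=True
    )[:top_k]
    reranked = sorted(
        candidates, key=lambda doc: score_pair(query, doc), reverse=True
    )
    return reranked[:top_n]


if __name__ == "__main__":
    query = "reranker after vector search"
    for doc in retrieve_then_rerank(query, TOY_DOCS, toy_embed, toy_score_pair):
        print(doc)
```

In a real stack the first stage would typically be a query against a vector database, and the two callables would wrap whichever embedding and reranker checkpoints you deploy. The split matters because the pairwise scorer sees the query and document together, so it is usually run only on the short candidate list rather than the whole corpus.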
Compare
All BGE models in the directory
| Model | Type | Best for | Local runner notes | License | Detail |
|---|---|---|---|---|---|
| BGE Reranker v2 M3 | Reranking | RAG builders who need a practical reranker after Qdrant, Chroma, pgvector, or another vector search backend. | Commonly used in local RAG stacks as a reranking step after vector search. | Check model card | Open |
| bge-m3 | Embedding | Multilingual embedding and retrieval workflows | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| bge-large-en-v1.5 | Embedding | English embedding and retrieval workflows | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| bge-base-en-v1.5 | Embedding | Balanced English embedding workflows | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| bge-small-en-v1.5 | Embedding | Lightweight local embedding workflows | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| bge-large-zh-v1.5 | Embedding | Chinese embedding and retrieval workflows | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| bge-reranker-large | Reranking | RAG reranking workflows | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| bge-reranker-base | Reranking | Lightweight reranking workflows | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| bge-embedding-gemma2 | Embedding | Gemma-based embedding experiments | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |