Overview
Qwen3-Embedding is a series of open text embedding and reranking models from Alibaba's Qwen team, built on the Qwen3 base models. It comes in three sizes (0.6B, 4B, and 8B) for both embedding and reranking, so you can trade off accuracy against speed and memory.
The embedding models turn text into vectors you can use for search, clustering, classification, and retrieval-augmented generation. The reranking models take a query and a list of candidate passages and score how relevant each one is, which helps you reorder search results before sending them to an LLM. The two model types are meant to be used together.
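The reordering step described above can be sketched without loading any model. In this minimal sketch, `rerank` and `toy_overlap_score` are hypothetical names invented for illustration. The word-overlap scorer is only a stand-in for a call to a Qwen3-Reranker model, which would return a learned relevance score instead.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and return the best top_k."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # Highest score first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_overlap_score(query: str, doc: str) -> float:
    """Stand-in scorer (NOT the reranker): fraction of query words found in doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / max(len(query_words), 1)


candidates = [
    "Gravity is a force.",
    "The capital of China is Beijing.",
    "China has many cities.",
]
results = rerank("capital of china", candidates, toy_overlap_score, top_k=2)
for doc, score in results:
    print(f"{score:.2f}  {doc}")
```

Passing the scorer in as a function keeps the reordering logic independent of which reranker size you eventually plug in.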
It supports over 100 languages plus code, handles sequences up to 32K tokens, and lets you customize the task instruction per query. Embedding models also support Matryoshka Representation Learning (MRL), so you can shorten the output vector to fit your storage and latency budget.
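Shortening an MRL embedding usually means keeping its leading components and re-normalizing. The sketch below shows that idea on random stand-in vectors rather than real model output. The helper name `truncate_embeddings` and the sizes 1024 and 256 are illustrative assumptions, not values taken from the project.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row, then L2-normalize."""
    if dim <= 0 or dim > embeddings.shape[1]:
        raise ValueError("dim must be between 1 and the full embedding size")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Clip the norms so an all-zero row does not cause a division by zero.
    return truncated / np.clip(norms, 1e-12, None)


# Stand-in data: 4 random vectors of an illustrative size (not model output).
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 1024)).astype(np.float32)

small = truncate_embeddings(full, 256)
print(small.shape)  # (4, 256)
```

Re-normalizing after truncation keeps dot products equal to cosine similarity. sentence-transformers also accepts a `truncate_dim` argument when constructing `SentenceTransformer`; check the documentation for your installed version before relying on it.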
What it does
- Embedding and reranking models, each available in 0.6B, 4B, and 8B sizes
- Support for over 100 languages, including cross-lingual and code retrieval
- 32K token sequence length for embedding long documents
- Instruction-aware: each query can include a task description to improve results
- MRL support lets you truncate embeddings to a custom dimension
- Loads through Hugging Face Transformers or sentence-transformers
Getting started
The easiest way to start is the 0.6B embedding model through sentence-transformers. You need transformers 4.51.0 or newer.
Install the libraries
Install transformers and sentence-transformers at the minimum versions below; sentence-transformers pulls in torch as a dependency.
pip install "transformers>=4.51.0" "sentence-transformers>=2.7.0"
Embed queries and documents
Load the model, encode your queries with the built-in "query" prompt, encode your documents, and compute cosine similarity.
from sentence_transformers import SentenceTransformer
# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]
# Encode the queries and documents. Queries benefit from the "query" prompt.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)
# Compute the (cosine) similarity between query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
Use the right instruction per task
For embedding models, writing a short, task-specific instruction in English for your queries typically improves results by 1-5%. Documents do not need an instruction.
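One way to apply a task-specific instruction is to prepend it to each query string and leave documents untouched. This is a sketch only. The `Instruct: ...\nQuery:` template is an assumption about the format the model expects, so confirm it against the official model card. The helper names and the example task are hypothetical.

```python
from typing import List


def build_query_prompt(task: str) -> str:
    """Return an instruction prefix for queries (assumed template, verify it)."""
    return f"Instruct: {task}\nQuery:"


def format_queries(task: str, queries: List[str]) -> List[str]:
    """Prepend the instruction prefix to every query; documents are not changed."""
    prefix = build_query_prompt(task)
    return [prefix + query for query in queries]


# Hypothetical task description, written in English as the text recommends.
task = "Given a question, retrieve encyclopedia passages that answer it"
formatted = format_queries(task, ["What is the capital of China?", "Explain gravity"])
print(formatted[0])

# With sentence-transformers, the same prefix can likely be passed through the
# `prompt` argument instead of editing the strings yourself, for example:
#   model.encode(queries, prompt=build_query_prompt(task))
```

Keeping the prefix in one helper makes it easy to swap instructions per task and compare retrieval quality on your own data.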
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
When to use it
- Build the retrieval step of a RAG pipeline that searches your documents before calling an LLM
- Power semantic or cross-lingual search across content in 100+ languages
- Rerank candidate passages with the Qwen3-Reranker models to improve search relevance
- Cluster or classify large text collections using the generated vectors
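The retrieval step in the first use case comes down to a top-k cosine search over stored vectors. Below is a minimal sketch that uses small hand-written vectors in place of real embeddings. The function name `top_k_documents` and the data are assumptions for illustration, and a production system would typically use a vector index rather than a full matrix product.

```python
import numpy as np


def top_k_documents(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 2):
    """Return (index, cosine score) pairs for the k most similar documents."""
    # Normalize both sides so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q
    # Negate the scores so argsort yields descending similarity.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Stand-in vectors; in practice these would come from model.encode(...).
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query = np.array([0.9, 0.1, 0.0])

hits = top_k_documents(query, docs, k=2)
for index, score in hits:
    print(index, round(score, 3))
```

The returned indices can then be mapped back to passages and, if needed, passed to a reranker before building the LLM prompt.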
How Qwen3-Embedding compares
Qwen3-Embedding alongside the other open-source embedding models and inference tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Sentence Transformers | ★ 18.8k | The standard Python framework for loading, training, and computing embeddings with sentence and reranking models. |
| EmbeddingGemma (Gemma) | ★ 5.5k | Google DeepMind's Gemma repo, home to EmbeddingGemma, a 308M multilingual embedding model small enough to run on-device for RAG and semantic search. |
| Text Embeddings Inference (TEI) | ★ 4.9k | Hugging Face's Rust-based server for deploying embedding, reranking, and sequence-classification models with high throughput on GPU or CPU. |
| Infinity (Embeddings) | ★ 2.8k | A high-throughput serving engine for text embeddings, rerankers, CLIP, and ColPali models, exposing an OpenAI-compatible API. |
| ColPali | ★ 2.7k | A vision-language embedding model that indexes whole document page images for retrieval, avoiding the need to parse PDFs into text first. |
| Model2Vec | ★ 2.1k | A tool that distills any sentence transformer into a tiny, fast static embedding model (the Potion models) that runs on CPU without a neural network at inference. |
| Instructor Embedding | ★ 2k | Instruction-tuned text embedding models that let you tailor embeddings to a task by prepending a natural-language instruction. |
| Qwen3-Embedding | ★ 2k | Open multilingual text embedding and reranking models from the Qwen3 family. |