AI/TLDR

Qwen3-Embedding

Open multilingual text embedding and reranking models from the Qwen3 family

Overview

Qwen3-Embedding is a series of open text embedding and reranking models from Alibaba's Qwen team, built on the Qwen3 base models. It comes in three sizes (0.6B, 4B, and 8B) for both embedding and reranking, so you can trade off accuracy against speed and memory.

The embedding models turn text into vectors you can use for search, clustering, classification, and retrieval-augmented generation. The reranking models take a query and a list of candidate passages and score how relevant each one is, which helps you reorder search results before sending them to an LLM. The two model types are meant to be used together.

The series supports over 100 languages plus code, handles sequences up to 32K tokens, and lets you customize the task instruction per query. The embedding models also support Matryoshka Representation Learning (MRL), so you can shorten the output vector to fit your storage and latency budget.
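
To make the MRL idea concrete, here is a minimal sketch of shortening a vector by hand: keep the leading components, then rescale to unit length so cosine and dot-product comparisons still behave. The helper name `truncate_embedding` and the toy 8-dimensional vector are illustrative assumptions, not part of the Qwen3-Embedding API.

```python
import math

def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]

# Toy 8-dimensional vector standing in for a real model output (assumption).
full = [0.5, -0.5, 0.5, 0.5, 0.1, -0.2, 0.05, 0.0]
short = truncate_embedding(full, 4)
print(len(short), short)
```

In practice you may not need to do this yourself: sentence-transformers accepts a `truncate_dim` argument when constructing `SentenceTransformer`. Check its documentation for how truncation interacts with normalization before relying on it.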

What it does

  • Embedding and reranking models in 0.6B, 4B, and 8B sizes for both tasks
  • Support for over 100 languages, including cross-lingual and code retrieval
  • 32K token sequence length for embedding long documents
  • Instruction-aware: each query can include a task description to improve results
  • MRL support lets you truncate embeddings to a custom dimension
  • Loads through Hugging Face Transformers or sentence-transformers

Getting started

The easiest way to start is the 0.6B embedding model through sentence-transformers. You need transformers 4.51.0 or newer.

Install the libraries

Install sentence-transformers and transformers with their minimum versions pinned. torch is pulled in as a dependency.

bash
pip install "transformers>=4.51.0" "sentence-transformers>=2.7.0"

Embed queries and documents

Load the model, encode your queries with the built-in "query" prompt, encode your documents, and compute cosine similarity.

python
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Queries benefit from the "query" prompt.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
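
The printed result is a matrix with one row per query and one column per document. As a rough illustration of what a cosine similarity matrix contains and how to read it, the sketch below computes one in plain Python on toy 3-dimensional vectors and picks the best document for each query. The vectors and the helper names are made up for the example; real embeddings come from `model.encode`.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(queries, documents):
    """Rows correspond to queries, columns to documents."""
    return [[cosine_similarity(q, d) for d in documents] for q in queries]

def best_match(matrix):
    """Index of the highest-scoring document for each query row."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]

# Toy vectors standing in for real embeddings (assumption).
query_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.6]]

matrix = similarity_matrix(query_vecs, doc_vecs)
print(best_match(matrix))  # each query is paired with its closest document
```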

Use the right instruction per task

For embedding models, a short, task-specific instruction written in English for your queries typically improves results by 1-5% compared with using no instruction. Documents do not need an instruction.
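
A minimal sketch of building such an instruction follows. The `Instruct: ...` / `Query:` template is an assumption based on the pattern shown in the model card, and the task string and helper name `build_query_prompt` are hypothetical, so confirm the exact format against the official repo.

```python
def build_query_prompt(task: str) -> str:
    """Return the prefix to prepend to every query (assumed template)."""
    return f"Instruct: {task}\nQuery:"

# Hypothetical task description; write one that matches your own use case.
task = "Given a question about a software library, retrieve documentation passages that answer it"
prompt = build_query_prompt(task)

queries = ["How do I truncate an embedding?"]
instructed_queries = [prompt + q for q in queries]  # documents stay unchanged
print(instructed_queries[0])

# With sentence-transformers, the same prefix can be passed directly:
#   model.encode(queries, prompt=prompt)
# in place of prompt_name="query".
```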

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Build the retrieval step of a RAG pipeline that searches your documents before calling an LLM
  • Power semantic or cross-lingual search across content in 100+ languages
  • Rerank candidate passages with the Qwen3-Reranker models to improve search relevance
  • Cluster or classify large text collections using the generated vectors
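
The retrieval and reranking use cases above are usually combined into two stages: a fast embedding search narrows the candidates, then a reranker reorders them. The sketch below shows only the control flow. The vectors are toy values and `toy_score` is a hypothetical word-overlap stand-in for a real Qwen3-Reranker call, not the model's API.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_vec, doc_vecs, top_k):
    """Stage 1: rank documents by dot product and keep the top_k indices."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: dot(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:top_k]

def rerank(query, passages, candidates, score):
    """Stage 2: reorder the candidate indices with a (query, passage) scorer."""
    return sorted(candidates, key=lambda i: score(query, passages[i]), reverse=True)

def toy_score(query, passage):
    """Hypothetical stand-in for a reranker: share of query words found in the passage."""
    words = query.lower().split()
    text = passage.lower()
    return sum(w in text for w in words) / len(words)

passages = [
    "Beijing is the capital of China.",
    "China has many large cities.",
    "Gravity attracts two bodies towards each other.",
]
# Toy unit vectors standing in for document and query embeddings (assumption).
doc_vecs = [[0.6, 0.8, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
query = "capital of China"
query_vec = [1.0, 0.0, 0.0]

candidates = retrieve(query_vec, doc_vecs, top_k=2)
final_order = rerank(query, passages, candidates, toy_score)
print(candidates, final_order)
```

Here the embedding stage ranks the second passage first, and the scorer moves the passage that actually answers the query to the top, which is the role a reranker plays before results reach an LLM.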

How Qwen3-Embedding compares

Qwen3-Embedding alongside the other open-source embedding models and inference tools that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
Sentence Transformers | ★ 18.8k | The standard Python framework for loading, training, and computing embeddings with sentence and reranking models.
EmbeddingGemma (Gemma) | ★ 5.5k | Google DeepMind's Gemma repo, home to EmbeddingGemma, a 308M multilingual embedding model small enough to run on-device for RAG and semantic search.
Text Embeddings Inference (TEI) | ★ 4.9k | Hugging Face's Rust-based server for deploying embedding, reranking, and sequence-classification models with high throughput on GPU or CPU.
Infinity (Embeddings) | ★ 2.8k | A high-throughput serving engine for text embeddings, rerankers, CLIP, and ColPali models, exposing an OpenAI-compatible API.
ColPali | ★ 2.7k | A vision-language embedding model that indexes whole document page images for retrieval, avoiding the need to parse PDFs into text first.
Model2Vec | ★ 2.1k | A tool that distills any sentence transformer into a tiny, fast static embedding model (the Potion models) that runs on CPU without a neural network at inference.
Instructor Embedding | ★ 2k | Instruction-tuned text embedding models that let you tailor embeddings to a task by prepending a natural-language instruction.
Qwen3-Embedding | ★ 2k | Open multilingual text embedding and reranking models from the Qwen3 family