Tencent · 2026-04-29 · major

Tencent Hy-MT1.5 1.25-bit — 440MB On-Device Translator for 33 Languages Beats 72B Models

Rating: 4
Author: AI/TLDR

Tencent open-sourced a 1.25-bit quantized 1.8B translation model that fits in 440MB, runs offline on phones, covers 33 languages (1,056 translation directions), and, by Tencent's own numbers, tops Tower-Plus-72B and Qwen3-32B on Flores-200.

[Image: Hy-MT1.5-1.8B-1.25bit Hugging Face model card thumbnail. Credit: Hugging Face / Tencent]

A 440MB translation model that runs offline on a phone and beats 72B-parameter open models and commercial APIs on Flores-200.

Key specs

Parameters	1.8B
FP16 size	3.3GB
Languages	33
Translation directions	1,056
Quantization	1.25-bit (Sherry)

What is it?

On April 29, 2026, the Tencent Hunyuan team open-sourced Hy-MT1.5-1.8B-1.25bit on Hugging Face via the AngelSlim repo. It is a 1.8B-parameter machine translation model compressed to 440MB using a new 1.25-bit quantization scheme called Sherry (paper accepted to ACL 2026). It supports 33 languages plus 5 dialect/minority variants. The 1,056 translation directions correspond to the 33 × 32 ordered pairs among the 33 languages.

How does it work?

Sherry uses 3:4 fine-grained sparsity: in every group of four weights, three are stored as signs in {-1, +1} and one is zeroed, so the weights are ternary {-1, 0, +1} overall. A group then has 4 × 2^3 = 32 possible states, which fits in 5 bits, or 1.25 bits per weight, and the four-weight groups keep a power-of-two alignment for SIMD. Tencent pairs the weights with a custom STQ mobile-CPU kernel so phones can run the model without a GPU. Training combines MT-oriented pre-training, SFT, on-policy distillation, and RL. Tencent reports that on Flores-200 Chinese↔foreign benchmarks the 1.25-bit version outperforms Tower-Plus-72B, Qwen3-32B, Microsoft Translator, and Doubao Translator.
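The 1.25-bit figure can be checked with a small sketch. A group of four weights with exactly one zero and three signs has 4 × 8 = 32 states, so it fits in 5 bits. The bit layout below (2 bits for the zero position, 3 bits for the signs) and the magnitude-based choice of which weight to zero are illustrative assumptions, not Tencent's actual STQ format or the Sherry training procedure.

```python
# Hypothetical 5-bit packing of one 3:4 sparse ternary group.
# Layout (an assumption for illustration): bits 4-3 = index of the zeroed
# weight, bits 2-0 = signs of the three remaining weights (1 means +1).

BITS_PER_GROUP = 5   # 2 bits for the zero position + 3 sign bits
GROUP_SIZE = 4       # weights per group


def quantize_group(w):
    """Map 4 floats to ternary values with exactly one zero.

    Assumption: the smallest-magnitude weight is the one that is zeroed.
    """
    z = min(range(GROUP_SIZE), key=lambda i: abs(w[i]))
    return [0 if i == z else (1 if w[i] >= 0 else -1) for i in range(GROUP_SIZE)]


def pack_group(g):
    """Encode a group (one zero, three values in {-1, +1}) as a 5-bit integer."""
    zero_positions = [i for i, v in enumerate(g) if v == 0]
    if len(g) != GROUP_SIZE or len(zero_positions) != 1:
        raise ValueError("expected 4 weights with exactly one zero")
    code = zero_positions[0] << 3
    signs = [v for v in g if v != 0]
    for k, s in enumerate(signs):
        if s > 0:
            code |= 1 << k
    return code


def unpack_group(code):
    """Decode a 5-bit integer back into four ternary weights."""
    z, bits = code >> 3, code & 0b111
    out, k = [], 0
    for i in range(GROUP_SIZE):
        if i == z:
            out.append(0)
        else:
            out.append(1 if (bits >> k) & 1 else -1)
            k += 1
    return out


group = quantize_group([0.9, -0.1, -0.7, 0.3])   # [1, 0, -1, 1]
code = pack_group(group)                          # 13 (0b01101)
print(group, code, unpack_group(code), BITS_PER_GROUP / GROUP_SIZE)
```

Every one of the 32 codes decodes to a valid group and re-encodes to itself, which is why no bit pattern is wasted at 5 bits per four weights.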

Why does it matter?

Real on-device translation has lagged because high-quality models needed multi-GB checkpoints. A 440MB model that fits in phone RAM and, by Tencent's numbers, matches 72B open-weight models and commercial APIs largely removes the trade-off between running offline and translating well. The Android demo includes a background word-extraction mode that overlays translations on any app (email, chat, web pages) without sending data over the network. Sherry is also a generic 1.25-bit recipe that other model authors can apply.

Who is it for?

Mobile-app developers, on-device-AI researchers, and anyone shipping translation in low-connectivity scenarios.

Try it

Download weights from huggingface.co/tencent/Hy-MT1.5-1.8B-1.25bit
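One way to fetch the files programmatically is the `huggingface_hub` client. This is a minimal sketch that assumes the repository ID matches the URL above and that `huggingface_hub` is installed; the `download_weights` helper and the local directory name are hypothetical. It only downloads the files and does not load or run the model.

```python
# Sketch: download the model repository with huggingface_hub.
# REPO_ID is taken from the URL above; the helper name and default
# directory are illustrative choices, not part of any official tooling.

REPO_ID = "tencent/Hy-MT1.5-1.8B-1.25bit"


def download_weights(local_dir="hy-mt1.5-1.25bit"):
    """Download every file in the repo and return the local directory path."""
    # Imported here so the module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_weights()` requires network access and returns the path of the folder holding the downloaded files.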