Tencent · 2026-04-29 · major
Tencent Hy-MT1.5 1.25-bit — 440MB On-Device Translator for 33 Languages Beats 72B Models
Tencent open-sourced a 1.25-bit quantized 1.8B translation model that fits in 440MB, runs offline on phones, covers 33 languages (1,056 translation directions), and reportedly tops Tower-Plus-72B and Qwen3-32B on Flores-200.

A 440MB translation model that runs offline on a phone and beats 72B-parameter open models and commercial APIs on Flores-200.
Key specs
| Spec | Value |
|---|---|
| Parameters | 1.8B |
| FP16 size | 3.3GB |
| Languages | 33 |
| Translation directions | 1,056 |
| Quantization | 1.25-bit (Sherry) |
What is it?
On April 29, 2026, the Tencent Hunyuan team open-sourced Hy-MT1.5-1.8B-1.25bit on Hugging Face via the AngelSlim repo. It is a 1.8B-parameter machine translation model compressed to 440MB using a new 1.25-bit quantization scheme called Sherry (paper accepted to ACL 2026). It supports 33 languages plus 5 dialect/minority variants. The 1,056 translation directions match the number of ordered pairs among 33 languages (33 × 32).
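The size figures can be sanity-checked with simple arithmetic. The sketch below assumes exactly 1.8e9 parameters (the source gives only the rounded "1.8B") and treats every parameter as stored at a single bit width, which is a simplification. The helper name `size_bytes` is illustrative.

```python
# Back-of-envelope storage estimate: parameters x bits per weight / 8.
PARAMS = 1.8e9          # assumption: "1.8B" taken as exactly 1.8e9
GIB = 2 ** 30


def size_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed if every parameter used the same bit width."""
    return params * bits_per_weight / 8


fp16_bytes = size_bytes(PARAMS, 16)       # full-precision baseline
sherry_bytes = size_bytes(PARAMS, 1.25)   # all weights at 1.25 bits

print(f"FP16:     {fp16_bytes / GIB:.2f} GiB")
print(f"1.25-bit: {sherry_bytes / 1e6:.0f} MB (weights only)")
```

Under these assumptions the FP16 estimate is about 3.35 GiB, close to the 3.3GB in the spec table. The pure 1.25-bit estimate is about 281 MB, below the published 440MB. The source does not break down the difference. It would be consistent with some components (for example embeddings or scale factors) being kept at higher precision, but that is a guess.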
How does it work?
Sherry uses 3:4 fine-grained sparsity: in every group of four weights, one is zeroed and the other three are stored as signs in {-1, +1}, so each weight is ternary overall. This gives an effective width of 1.25 bits per weight (5 bits per group of four) and keeps groups aligned to power-of-two SIMD widths. Tencent pairs the weights with a custom STQ mobile-CPU kernel so phones can run the model without GPU support. Training uses MT-oriented pre-training, SFT, on-policy distillation, and RL. Tencent reports that on Flores-200 Chinese↔foreign benchmarks the 1.25-bit version outperforms Tower-Plus-72B, Qwen3-32B, Microsoft Translator, and Doubao Translator.
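To make the 1.25-bit figure concrete, here is a minimal sketch of one way a 3:4 group could be encoded in 5 bits: 2 bits for the position of the zero and 3 bits for the signs of the remaining weights. The bit layout, the function names, and the rule of zeroing the smallest-magnitude weight are all assumptions made for illustration. The source does not describe Sherry's actual storage format or selection rule, and per-group scale factors are omitted.

```python
GROUP = 4                       # weights per group (3 nonzero, 1 zero)
BITS_PER_GROUP = 2 + 3          # 2 bits: zero position, 3 bits: signs
BITS_PER_WEIGHT = BITS_PER_GROUP / GROUP   # 5 / 4 = 1.25


def sparsify_3_of_4(weights):
    """Map floats to groups of {-1, 0, +1} with exactly one zero each.

    Assumption: the smallest-magnitude weight in each group is zeroed.
    """
    if len(weights) % GROUP:
        raise ValueError("length must be a multiple of 4")
    groups = []
    for start in range(0, len(weights), GROUP):
        chunk = weights[start:start + GROUP]
        zero_pos = min(range(GROUP), key=lambda i: abs(chunk[i]))
        groups.append([
            0 if i == zero_pos else (1 if x >= 0 else -1)
            for i, x in enumerate(chunk)
        ])
    return groups


def pack_group(group):
    """Encode one group as a 5-bit integer (hypothetical layout)."""
    zero_pos = group.index(0)
    signs = [v > 0 for i, v in enumerate(group) if i != zero_pos]
    code = zero_pos << 3                    # high 2 bits: zero position
    for bit, positive in enumerate(signs):  # low 3 bits: 1 means +1
        code |= int(positive) << bit
    return code


def unpack_group(code):
    """Invert pack_group."""
    zero_pos = code >> 3
    out, bit = [], 0
    for i in range(GROUP):
        if i == zero_pos:
            out.append(0)
        else:
            out.append(1 if (code >> bit) & 1 else -1)
            bit += 1
    return out
```

Every code fits in the range 0 to 31, and `unpack_group(pack_group(g))` returns `g` for any group with exactly one zero. A production kernel would pack these 5-bit codes densely and decode them with SIMD lookups rather than Python loops.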
Why does it matter?
Real on-device translation has lagged because high-quality models needed multi-GB checkpoints. A 440MB model that fits in phone RAM and, by Tencent's reported results, matches or beats 72B open-weight models and commercial APIs largely removes the trade-off between running offline and translating well. The Android demo includes a background word-extraction mode that overlays translations on any app (emails, chat, web pages) without sending data over the network. Sherry is also presented as a generic 1.25-bit recipe that other model authors can apply.
Who is it for?
Mobile-app developers, on-device-AI researchers, and anyone shipping translation in low-connectivity scenarios.
Try it
Download weights from huggingface.co/tencent/Hy-MT1.5-1.8B-1.25bit
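For a scripted download, a minimal sketch using the `huggingface_hub` client might look like the following. It assumes the package is installed (`pip install huggingface_hub`) and takes the repo ID from the link above. The local directory name is arbitrary, and the repository's file layout is not described in the source.

```python
# Repo ID taken from the link above; local_dir is an arbitrary choice.
REPO_ID = "tencent/Hy-MT1.5-1.8B-1.25bit"


def download_weights(local_dir: str = "hy-mt1.5-1.25bit") -> str:
    """Fetch every file in the repo and return the local folder path."""
    # Imported here so the module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Usage (needs network access):
#   path = download_weights()
#   print(path)
```

Running the model on a phone still depends on Tencent's STQ kernel and the Android demo. This snippet only retrieves the files.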