Overview
Piper is an open-source neural text-to-speech engine that turns written text into natural-sounding speech. It is built to be fast and to run fully on your own machine, so you do not need to send text to a cloud service.
Piper uses the espeak-ng project to convert words into phonemes, then a neural voice model to produce the audio. It ships ready-made voices in many languages, works well even on small hardware like a Raspberry Pi, and is used by projects such as Home Assistant and screen readers for the visually impaired.
What it does
- Local, offline speech synthesis with no cloud dependency, so your text stays on your own device
- Fast neural voices that run on modest hardware, including small boards like the Raspberry Pi
- Pre-trained voices in many languages that you can list and download on demand
- Command-line interface, HTTP web server, Python API, and a C/C++ library for different integration needs
- Optional GPU acceleration through onnxruntime-gpu for higher throughput
- Tunable output, including volume, speaking speed, audio variation, and raw espeak-ng phoneme injection
Getting started
Install Piper from PyPI, download a voice, then generate a WAV file from text. The Python package name is piper-tts.
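For anyone who would rather script the whole flow, here is a minimal sketch that strings together the three commands from the steps below using Python's standard `subprocess` module. The helper names `quickstart_commands` and `run_quickstart` are hypothetical and not part of Piper. The arguments are copied from this page's own commands, except that `sys.executable -m pip` is used in place of a bare `pip` so the install targets the interpreter running the script.

```python
import subprocess
import sys


def quickstart_commands(voice="en_US-lessac-medium", out_path="test.wav",
                        text="This is a test."):
    """Return the three quickstart commands as argument lists (hypothetical helper)."""
    py = sys.executable  # the interpreter running this script
    return [
        [py, "-m", "pip", "install", "piper-tts"],                      # install
        [py, "-m", "piper.download_voices", voice],                     # download a voice
        [py, "-m", "piper", "-m", voice, "-f", out_path, "--", text],   # synthesize
    ]


def run_quickstart(**kwargs):
    """Run each command in order, raising CalledProcessError on the first failure."""
    for cmd in quickstart_commands(**kwargs):
        subprocess.run(cmd, check=True)
```

Calling `run_quickstart()` needs network access for the install and download steps. Passing the text as a separate list element avoids shell quoting problems with apostrophes in the sentence.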
Install Piper
Install the Piper package from PyPI with pip.
pip install piper-tts
Download a voice
Running the downloader module with no arguments should list the available voices; passing a voice name downloads it. This example downloads an English (US) voice into the current directory.
python3 -m piper.download_voices en_US-lessac-medium
Generate speech from the command line
Run Piper with a voice model and write the spoken text to a WAV file. Here it creates test.wav from a short sentence.
python3 -m piper -m en_US-lessac-medium -f test.wav -- 'This is a test.'
Use the Python API
You can also call Piper from Python with PiperVoice.synthesize_wav to write audio to a WAV file.
import wave
from piper import PiperVoice
voice = PiperVoice.load("/path/to/en_US-lessac-medium.onnx")
with wave.open("test.wav", "wb") as wav_file:
    voice.synthesize_wav("Welcome to the world of speech synthesis!", wav_file)
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
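The volume and speaking-speed controls mentioned under "What it does" can be reached from the same Python API. The sketch below assumes that the `piper` package exports a `SynthesisConfig` class with `volume` and `length_scale` fields, and that `synthesize_wav` accepts it through a `syn_config` keyword. None of that appears on this page, so check it against the official repo. The helpers `speed_to_length_scale` and `synthesize_tuned` are hypothetical names introduced for illustration.

```python
import wave


def speed_to_length_scale(speed):
    """Convert a speed multiplier into a length scale (hypothetical helper).

    Assumes a larger length scale means slower speech, so 2x speed maps to 0.5.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return 1.0 / speed


def synthesize_tuned(model_path, text, out_path, volume=1.0, speed=1.0):
    """Write text to a WAV file with adjusted volume and speed (sketch)."""
    # Imported lazily so the helper above works without piper-tts installed.
    # SynthesisConfig and its field names are assumptions, not confirmed here.
    from piper import PiperVoice, SynthesisConfig

    voice = PiperVoice.load(model_path)
    config = SynthesisConfig(
        volume=volume,
        length_scale=speed_to_length_scale(speed),
    )
    with wave.open(out_path, "wb") as wav_file:
        voice.synthesize_wav(text, wav_file, syn_config=config)
```

For example, `synthesize_tuned("/path/to/en_US-lessac-medium.onnx", "Hello.", "slow.wav", speed=0.5)` would request speech at roughly half speed, if those assumptions hold.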
When to use it
- Adding offline voice output to home automation setups such as Home Assistant
- Powering screen readers and accessibility tools that read text aloud for visually impaired users
- Generating narration or voiceovers for videos and apps without relying on a paid cloud service
- Running a local text-to-speech web server for repeated synthesis from your own software
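For the web server use case, a client in your own software can be small. The sketch below uses only the standard library and assumes a Piper HTTP server is already running at `http://localhost:5000`, accepting a POST with a JSON body of the form `{"text": "..."}` and returning WAV bytes. The address, port, and request shape are assumptions that this page does not confirm, and `build_tts_request` and `synthesize_over_http` are hypothetical helper names.

```python
import json
import urllib.request


def build_tts_request(text, url="http://localhost:5000"):
    """Build a POST request carrying the text as a JSON body (assumed format)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_over_http(text, out_path, url="http://localhost:5000", timeout=30):
    """Send text to the server, save the response bytes, return the byte count."""
    request = build_tts_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as out_file:
        out_file.write(audio)
    return len(audio)
```

Keeping the server running means the voice model is loaded once, so repeated calls to `synthesize_over_http` avoid the start-up cost of launching the command-line tool each time.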
How Piper compares
How Piper ranks by GitHub stars among the other open-source audio, music, and voice tools that AI/TLDR tracks.
| Tool | Stars | What it does |
|---|---|---|
| Whisper | ★ 104k | OpenAI's speech recognition model that transcribes and translates audio across many languages. |
| GPT-SoVITS | ★ 59k | An open-source WebUI that clones a voice from a short audio sample and turns text into speech, with zero-shot and few-shot fine-tuning. |
| VibeVoice | ★ 49.6k | Microsoft's text-to-speech model for generating long, expressive multi-speaker audio like podcasts. |
| Coqui TTS | ★ 45.6k | A library of text-to-speech models including the multilingual XTTS voice-cloning model. |
| ChatTTS | ★ 39.5k | An open-source text-to-speech model tuned for dialogue, with multi-speaker support and fine-grained control over laughter, pauses, and prosody. |
| MockingBird | ★ 36.9k | An open-source PyTorch toolbox that clones a voice from a short sample and generates Mandarin Chinese speech, with a web app, desktop toolbox, and command line. |
| OpenVoice | ★ 36.8k | A model that clones a voice from a short reference clip and speaks in multiple languages, with control over emotion, accent, rhythm, and intonation. |
| Piper | ★ 11.1k | Fast, local neural text-to-speech that runs offline on your own machine. |