Overview
OpenVoice is an open-source voice cloning model built by researchers at MIT, Tsinghua University, and MyShell. It copies the tone color of a reference speaker from a short audio clip and then uses that voice to generate new speech.
The reference clip can be in any language, and the cloned voice can speak in several languages, including English, Spanish, French, Chinese, Japanese, and Korean in version 2. Beyond just copying a voice, OpenVoice gives you control over style details like emotion, accent, rhythm, pauses, and intonation.
Both V1 and V2 are released under the MIT License, so the project is free for commercial and research use. It has powered the instant voice cloning feature on the MyShell platform.
What it does
- Accurate tone color cloning from a short reference audio clip
- Zero-shot cross-lingual cloning: the reference and output languages need not match
- Flexible style control over emotion, accent, rhythm, pauses, and intonation
- Native multi-lingual support in V2 for English, Spanish, French, Chinese, Japanese, and Korean
- MIT licensed and free for both commercial and research use
- Local Gradio demo and example notebooks for trying cloning end to end
Getting started
OpenVoice targets developers and researchers comfortable with Linux, Python, and PyTorch. Create a Conda environment, install the package, then download the model checkpoints. For V2, also install MeloTTS, which provides the multi-lingual base speech.
Create the environment and install OpenVoice
Make a Python 3.9 Conda environment, clone the repository, and install the package in editable mode. This same install works for both V1 and V2.
conda create -n openvoice python=3.9
conda activate openvoice
git clone git@github.com:myshell-ai/OpenVoice.git
cd OpenVoice
pip install -e .
Add MeloTTS for V2 multi-lingual speech
OpenVoice V2 uses MeloTTS to generate base speech across languages. Install it and download the dictionary data.
pip install git+https://github.com/myshell-ai/MeloTTS.git
python -m unidic download
Try the local Gradio demo
After downloading the checkpoints into the checkpoints folder, launch the minimalist local Gradio demo. The example notebooks demo_part1, demo_part2, and demo_part3 show full cloning workflows.
python -m openvoice_app --share
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
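Before launching the demo, it can help to confirm that the install steps above actually took effect. The following is a minimal sketch using only the Python standard library, not part of OpenVoice itself. The helper names `module_available` and `preflight` are hypothetical, and the import names `torch`, `openvoice`, and `melo` are assumptions about how the installed packages expose themselves. The checkpoint check only verifies that the folder exists and is not empty; it makes no claim about which files belong inside.

```python
"""Pre-flight check before launching the OpenVoice demo (illustrative sketch)."""
import importlib.util
import sys
from pathlib import Path


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def preflight(checkpoint_dir: str = "checkpoints",
              modules: tuple = ("torch", "openvoice", "melo")) -> list:
    """Collect readable problems; an empty list means nothing was flagged."""
    problems = []

    # The guide creates a Python 3.9 environment, so flag anything else.
    major, minor = sys.version_info[:2]
    if (major, minor) != (3, 9):
        problems.append(f"Python {major}.{minor} in use; the guide uses 3.9")

    # Module names are assumptions; adjust them to match your install.
    for name in modules:
        if not module_available(name):
            problems.append(f"module not importable: {name}")

    # Only presence is checked here, not the expected file layout.
    ckpt = Path(checkpoint_dir)
    if not ckpt.is_dir():
        problems.append(f"checkpoint folder not found: {ckpt}")
    elif not any(ckpt.iterdir()):
        problems.append(f"checkpoint folder is empty: {ckpt}")

    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("WARNING:", issue)
    print("ready" if not issues else f"{len(issues)} issue(s) found")
```

Run it from the repository root inside the activated Conda environment. For a V1-only setup, pass `modules=("torch", "openvoice")` so the missing MeloTTS package is not reported.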
When to use it
- Clone a narrator's voice from a short clip and generate audiobook or video narration in several languages
- Build multilingual voice assistants or chatbots that keep a consistent voice across English, Spanish, French, Chinese, Japanese, and Korean
- Add emotion, accent, and rhythm control to text-to-speech output for more natural-sounding dialogue
- Localize content by speaking the same voice in a language the original speaker never recorded
How OpenVoice compares
OpenVoice alongside the other open-source audio, music, and voice tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Whisper | ★ 103k | OpenAI's speech recognition model that transcribes and translates audio across many languages. |
| GPT-SoVITS | ★ 58.9k | An open-source WebUI that clones a voice from a short audio sample and turns text into speech, with zero-shot and few-shot fine-tuning. |
| VibeVoice | ★ 49.5k | Microsoft's text-to-speech model for generating long, expressive multi-speaker audio like podcasts. |
| Coqui TTS | ★ 45.6k | A library of text-to-speech models including the multilingual XTTS voice-cloning model. |
| ChatTTS | ★ 39.5k | An open-source text-to-speech model tuned for dialogue, with multi-speaker support and fine-grained control over laughter, pauses, and prosody. |
| MockingBird | ★ 36.9k | An open-source PyTorch toolbox that clones a voice from a short sample and generates Mandarin Chinese speech, with a web app, desktop toolbox, and command line. |
| OpenVoice | ★ 36.7k | Instant voice cloning that copies a tone color and speaks in many languages. |
| VoxCPM | ★ 31k | An open-source text-to-speech system that generates natural multilingual speech, designs voices from text descriptions, and clones any voice from a short clip. |