AI/TLDR

OpenVoice

Instant voice cloning that copies a speaker's tone color and speaks in many languages

Overview

OpenVoice is an open-source voice cloning model built by researchers at MIT, Tsinghua University, and MyShell. It copies the tone color of a reference speaker from a short audio clip and then uses that voice to generate new speech.

The reference clip can be in any language, and the cloned voice can speak several languages; V2 natively supports English, Spanish, French, Chinese, Japanese, and Korean. Beyond copying a voice, OpenVoice gives you control over style details such as emotion, accent, rhythm, pauses, and intonation.

Both V1 and V2 are released under the MIT License, so the project is free for commercial and research use. It has powered the instant voice cloning feature on the MyShell platform.

What it does

  • Accurate tone color cloning from a short reference audio clip
  • Zero-shot cross-lingual cloning: the reference and output languages need not match
  • Flexible style control over emotion, accent, rhythm, pauses, and intonation
  • Native multi-lingual support in V2 for English, Spanish, French, Chinese, Japanese, and Korean
  • MIT licensed and free for both commercial and research use
  • Local Gradio demo and example notebooks for trying cloning end to end

Getting started

OpenVoice targets developers and researchers comfortable with Linux, Python, and PyTorch. Create a Conda environment, install the package, then download the model checkpoints. For V2, also install MeloTTS, which generates the multilingual base speech.

Create the environment and install OpenVoice

Make a Python 3.9 Conda environment, clone the repository, and install the package in editable mode. This same install works for both V1 and V2.

conda create -n openvoice python=3.9
conda activate openvoice
git clone https://github.com/myshell-ai/OpenVoice.git
cd OpenVoice
pip install -e .
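
To confirm the install landed in the active environment, a small check like the one below can help. It is a minimal sketch, not part of the project: the function name check_environment is hypothetical, and the import names torch, openvoice, and melo are assumptions about how the packages are exposed (melo only matters once MeloTTS is installed for V2).

```python
import importlib.util
import sys


def check_environment(packages=("torch", "openvoice", "melo"), python=(3, 9)):
    """Report whether the interpreter version and packages match the setup steps.

    The package names are assumed import names; adjust them if the project
    exposes its modules differently.
    """
    return {
        # True only when the major.minor version matches exactly.
        "python_ok": sys.version_info[:2] == tuple(python),
        # find_spec returns None for a top-level module that is not installed.
        "packages": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


if __name__ == "__main__":
    report = check_environment()
    print("Python 3.9:", report["python_ok"])
    for name, found in report["packages"].items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Run it inside the activated openvoice environment; a missing entry usually means the package was installed into a different environment.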

Add MeloTTS for V2 multi-lingual speech

OpenVoice V2 uses MeloTTS to generate base speech across languages. Install it and download the dictionary data.

pip install git+https://github.com/myshell-ai/MeloTTS.git
python -m unidic download
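
Once MeloTTS is installed, generating the base speech might look like the sketch below. It follows the usage shown in the MeloTTS documentation as best understood here (TTS, hps.data.spk2id, tts_to_file); the language-code table and the helper names melo_language_code and synthesize_base_speech are illustrative assumptions, so check the MeloTTS repo before relying on them.

```python
# Assumed MeloTTS language codes for the six V2 languages.
MELO_LANGUAGES = {
    "english": "EN", "spanish": "ES", "french": "FR",
    "chinese": "ZH", "japanese": "JP", "korean": "KR",
}


def melo_language_code(name):
    """Map a language name such as 'Korean' to its assumed MeloTTS code."""
    key = name.strip().lower()
    if key not in MELO_LANGUAGES:
        raise ValueError(f"Unsupported language: {name!r}")
    return MELO_LANGUAGES[key]


def synthesize_base_speech(text, language, output_path, speed=1.0, device="cpu"):
    """Write base speech for `text` to `output_path` and return the speaker key used."""
    # Imported here so the helper above works even without MeloTTS installed.
    from melo.api import TTS

    model = TTS(language=melo_language_code(language), device=device)
    speaker_ids = model.hps.data.spk2id
    # Pick the first speaker available for this language.
    speaker_key = next(iter(speaker_ids.keys()))
    model.tts_to_file(text, speaker_ids[speaker_key], output_path, speed=speed)
    return speaker_key
```

For example, synthesize_base_speech("Hola, mundo.", "Spanish", "base_es.wav") would write a Spanish clip in the MeloTTS base voice; the tone color of the reference speaker is applied in a later step.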

Try the local Gradio demo

After downloading the checkpoints into the checkpoints folder, launch the minimalist local Gradio demo. The example notebooks demo_part1, demo_part2, and demo_part3 show full cloning workflows.

python -m openvoice_app --share
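
Outside the Gradio demo, the tone color conversion step that the notebooks walk through can be sketched roughly as follows. This is a hedged outline based on the V2 notebook as best recalled, not the project's definitive code: the checkpoints_v2 folder layout, the speaker key en-us, and the helper names v2_checkpoint_files, missing_files, and convert_tone_color are assumptions, and base_audio stands for a speech clip already generated in the base voice (for V2, by MeloTTS).

```python
from pathlib import Path


def v2_checkpoint_files(speaker_key, root="checkpoints_v2"):
    """Files this sketch expects for one base speaker (assumed layout)."""
    root = Path(root)
    return [
        root / "converter" / "config.json",
        root / "converter" / "checkpoint.pth",
        root / "base_speakers" / "ses" / f"{speaker_key}.pth",
    ]


def missing_files(paths):
    """Return the paths that do not exist on disk."""
    return [p for p in paths if not Path(p).is_file()]


def convert_tone_color(base_audio, reference_audio, output_path,
                       speaker_key="en-us", device="cpu"):
    """Re-voice `base_audio` with the tone color extracted from `reference_audio`."""
    # Heavy imports are deferred so the helpers above run without the packages.
    import torch
    from openvoice import se_extractor
    from openvoice.api import ToneColorConverter

    files = v2_checkpoint_files(speaker_key)
    missing = missing_files(files)
    if missing:
        raise FileNotFoundError(f"Missing checkpoint files: {missing}")
    config_path, ckpt_path, source_se_path = files

    converter = ToneColorConverter(str(config_path), device=device)
    converter.load_ckpt(str(ckpt_path))

    # Tone color embedding of the voice to clone.
    target_se, _ = se_extractor.get_se(reference_audio, converter)
    # Tone color embedding of the base speaker that produced base_audio.
    source_se = torch.load(str(source_se_path), map_location=torch.device(device))

    converter.convert(audio_src_path=base_audio, src_se=source_se,
                      tgt_se=target_se, output_path=output_path)
    return output_path
```

Checking for the checkpoint files first gives a clear error when the download step was skipped, instead of a failure deep inside model loading.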

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.

When to use it

  • Clone a narrator's voice from a short clip and generate audiobook or video narration in several languages
  • Build multilingual voice assistants or chatbots that keep a consistent voice across English, Spanish, French, Chinese, Japanese, and Korean
  • Add emotion, accent, and rhythm control to text-to-speech output for more natural-sounding dialogue
  • Localize content by speaking the same voice in a language the original speaker never recorded

How OpenVoice compares

OpenVoice alongside the other open-source audio, music, and voice tools that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
Whisper | ★ 103k | OpenAI's speech recognition model that transcribes and translates audio across many languages.
GPT-SoVITS | ★ 58.9k | An open-source WebUI that clones a voice from a short audio sample and turns text into speech, with zero-shot and few-shot fine-tuning.
VibeVoice | ★ 49.5k | Microsoft's text-to-speech model for generating long, expressive multi-speaker audio like podcasts.
Coqui TTS | ★ 45.6k | A library of text-to-speech models including the multilingual XTTS voice-cloning model.
ChatTTS | ★ 39.5k | ChatTTS is an open-source text-to-speech model tuned for dialogue, with multi-speaker support and fine-grained control over laughter, pauses, and prosody.
MockingBird | ★ 36.9k | An open-source PyTorch toolbox that clones a voice from a short sample and generates Mandarin Chinese speech, with a web app, desktop toolbox, and command line.
OpenVoice | ★ 36.7k | Instant voice cloning that copies a tone color and speaks in many languages
VoxCPM | ★ 31k | An open-source text-to-speech system that generates natural multilingual speech, designs voices from text descriptions, and clones any voice from a short clip.