ChatTTS

Generative text-to-speech model built for natural daily dialogue

github.com/2noise/ChatTTS | ★ 39.5k | 2noise.com

Overview

ChatTTS is a generative text-to-speech model designed for dialogue scenarios such as LLM assistants. Instead of reading text in a flat, robotic way, it aims to produce speech that sounds natural and expressive in a back-and-forth conversation.

The model supports English and Chinese, and can render multiple speakers so interactive conversations feel more lifelike. The open-source release on Hugging Face is a 40,000-hour pre-trained base model, intended for academic and research use under a non-commercial license.

What it does

Conversational TTS optimized for dialogue, with support for multiple speakers in interactive conversations
Fine-grained control over prosody, including laughter, pauses, and interjections via in-text tokens
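Prosody that the project describes as surpassing most open-source TTS models, with pretrained weights provided for further research
Sentence-level control through a refine-text prompt built from tokens like [oral_2] and [break_6], and word-level control through tokens like [laugh] and [uv_break] placed directly in the input text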
Streaming audio generation and zero-shot inference with the open-sourced DVAE encoder
Sample random speaker embeddings from a Gaussian and save them to recover the same timbre later
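
The control tokens and speaker sampling listed above can be combined roughly as follows. This is a minimal sketch, assuming the `RefineTextParams`, `InferCodeParams`, and `sample_random_speaker()` names from the project README still match the installed release. `build_refine_prompt` and `synthesize` are hypothetical helpers introduced here for illustration, and the token ranges in the comments are taken from the README and may change between versions.

```python
def build_refine_prompt(oral: int = 2, laugh: int = 0, pause: int = 6) -> str:
    """Build a sentence-level control prompt such as '[oral_2][laugh_0][break_6]'.

    Hypothetical helper. The ranges checked below follow the README
    (oral 0-9, laugh 0-2, break 0-7) and are an assumption about the
    current release, not a guarantee.
    """
    if not 0 <= oral <= 9:
        raise ValueError("oral must be in 0..9")
    if not 0 <= laugh <= 2:
        raise ValueError("laugh must be in 0..2")
    if not 0 <= pause <= 7:
        raise ValueError("pause must be in 0..7")
    return f"[oral_{oral}][laugh_{laugh}][break_{pause}]"


def synthesize(texts, spk_emb=None, prompt=None):
    """Run ChatTTS with a fixed speaker and a sentence-level prompt.

    ChatTTS is imported lazily so the prompt helper works without the model.
    """
    import ChatTTS  # requires `pip install ChatTTS`

    chat = ChatTTS.Chat()
    chat.load(compile=False)

    # Sample a speaker once; keep the returned value to reuse the same timbre.
    if spk_emb is None:
        spk_emb = chat.sample_random_speaker()

    params_infer_code = ChatTTS.Chat.InferCodeParams(spk_emb=spk_emb)
    params_refine_text = ChatTTS.Chat.RefineTextParams(
        prompt=prompt or build_refine_prompt(),
    )
    wavs = chat.infer(
        texts,
        params_refine_text=params_refine_text,
        params_infer_code=params_infer_code,
    )
    return wavs, spk_emb
```

Returning `spk_emb` alongside the audio lets a caller store it and pass it back in on a later run to get the same voice. Word-level tokens need no extra parameters: they go straight into the text, as in "What is [uv_break]your favorite english food?[laugh]".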

Getting started

ChatTTS is distributed as a Python package on PyPI. Install it, load the model, and call infer() with your text to produce audio. The example below mirrors the basic usage in the project README.

Install from PyPI

Install the stable version of the ChatTTS package with pip.

bash

pip install ChatTTS

Generate speech in Python

Load the model and run inference on a list of texts, then save each result as a WAV file at 24000 Hz.

python

import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance

texts = ["PUT YOUR 1st TEXT HERE", "PUT YOUR 2nd TEXT HERE"]

wavs = chat.infer(texts)

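for i, wav in enumerate(wavs):
    # The README notes that which call works depends on the installed
    # torchaudio version, so try the channel-first shape and fall back.
    try:
        torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wav).unsqueeze(0), 24000)
    except Exception:
        torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wav), 24000)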
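
If torchaudio is unavailable, the same 24000 Hz WAV output can be written with the Python standard library alone. This is a minimal sketch, assuming each waveform is a one-dimensional sequence of floats in the range [-1.0, 1.0]; `save_wav_16bit` is a hypothetical helper, not part of ChatTTS.

```python
import struct
import wave


def save_wav_16bit(path, samples, sample_rate=24000):
    """Write a mono waveform of floats in [-1.0, 1.0] as 16-bit PCM WAV."""
    frames = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)  # 24000 Hz, matching the example above
        wf.writeframes(bytes(frames))
```

A call such as `save_wav_16bit(f"basic_output{i}.wav", wavs[i])` would then replace the torchaudio line. If the array carries an extra channel dimension, flatten it first.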

Try the command line or WebUI

From the project root you can also infer directly from the command line, which saves audio to ./output_audio_n.mp3, or launch the bundled web interface.

bash

python examples/cmd/run.py "Your text 1." "Your text 2."
python examples/web/webui.py

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.

When to use it

Giving an LLM assistant or chatbot a natural, expressive spoken voice for dialogue
Research and academic experiments on prosody and conversational speech synthesis
Generating multi-speaker English and Chinese audio with controllable laughter, pauses, and intonation

How ChatTTS compares

ChatTTS alongside the other open-source audio, music, and voice tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Whisper	★ 103k	OpenAI's speech recognition model that transcribes and translates audio across many languages.
GPT-SoVITS	★ 58.9k	An open-source WebUI that clones a voice from a short audio sample and turns text into speech, with zero-shot and few-shot fine-tuning.
VibeVoice	★ 49.5k	Microsoft's text-to-speech model for generating long, expressive multi-speaker audio like podcasts.
Coqui TTS	★ 45.6k	A library of text-to-speech models including the multilingual XTTS voice-cloning model.
ChatTTS	★ 39.5k	Generative text-to-speech model built for natural daily dialogue
MockingBird	★ 36.9k	An open-source PyTorch toolbox that clones a voice from a short sample and generates Mandarin Chinese speech, with a web app, desktop toolbox, and command line.
OpenVoice	★ 36.7k	OpenVoice clones a voice from a short reference clip and speaks in multiple languages, with control over emotion, accent, rhythm, and intonation.
VoxCPM	★ 31k	An open-source text-to-speech system that generates natural multilingual speech, designs voices from text descriptions, and clones any voice from a short clip.
