Fish Speech

Multilingual text-to-speech and voice cloning with inline emotion control

github.com/fishaudio/fish-speech ★ 30.9k fish.audio

Overview

Fish Speech is an open text-to-speech and voice-cloning system from Fish Audio. The current model, Fish Audio S2 Pro, is a 4B-parameter model trained on a large multilingual audio set that covers more than 80 languages. It can read text aloud and reproduce the voice from a short reference clip.

It is built for developers and researchers who want to add speech to their own apps without calling a hosted API. You can run it from the command line, through a Gradio WebUI, or as an API server, and the model weights are published on Hugging Face.

Within the speech and audio space, Fish Speech stands out for fine-grained control: you embed natural-language tags like [whisper], [excited], or [angry] directly in the text to shape prosody and emotion, and it can handle multi-speaker, multi-turn dialogue.
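As a rough illustration of the inline tags described above, the sketch below builds a few tagged lines in the shell. The `tag` helper is hypothetical and not part of Fish Speech, and placing each tag directly before the text it should affect is an assumption; check the project documentation for the exact tag grammar.

```shell
# Hypothetical helper: prefix text with an inline tag such as [whisper].
# Putting the tag in front of the sentence is an assumption of this sketch.
tag() {
  name="$1"
  shift
  printf '[%s] %s\n' "$name" "$*"
}

# Print a short emotion-tagged script that could be pasted in as TTS input.
tag excited "We shipped the release today!"
tag whisper "Please keep it quiet for now."
tag angry "Who leaked the changelog?"
```

Because the tags are plain text, a script like this can be written by hand just as easily; the helper only keeps the bracket syntax consistent.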

What it does

Multilingual TTS covering more than 80 languages, trained on a large audio corpus
Voice cloning from a short reference clip plus its matching transcript
Inline emotion and prosody tags such as [whisper], [excited], [pause], and [laughing]
Multi-speaker and multi-turn conversation generation
Run it your way: command-line inference, a Gradio WebUI, or an API server
Docker Compose profiles for WebUI and server, including CPU-only and AMD ROCm setups

Getting started

Fish Speech runs on Linux or WSL, and the project recommends a GPU with about 24 GB of memory. The simplest path is a conda environment; the project also publishes Docker Compose profiles.
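Before installing, it may help to check the GPU against the memory recommendation above. This is a minimal sketch for NVIDIA cards only: `has_enough_vram` and the 24000 MiB threshold are assumptions of the example, not something the project defines, and only the first GPU is inspected.

```shell
# Threshold in MiB; an assumed stand-in for "about 24 GB".
REQUIRED_MIB=24000

# Succeeds when the argument is a whole number of MiB at or above the threshold.
has_enough_vram() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # reject empty or non-numeric input
  esac
  [ "$1" -ge "$REQUIRED_MIB" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # With nounits, memory.total is printed as a bare number of MiB per GPU.
  total=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
  if has_enough_vram "$total"; then
    echo "GPU 0 reports ${total} MiB: meets the recommendation"
  else
    echo "GPU 0 reports '${total}' MiB: below the recommended size or unreadable"
  fi
else
  echo "nvidia-smi not found; skipping the GPU memory check"
fi
```

A smaller GPU does not necessarily rule out running the model; the figure in the text is a recommendation, so treat a failed check as a warning.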

Install system prerequisites

Install the audio libraries the build depends on.

apt install portaudio19-dev libsox-dev ffmpeg

Create the environment and install

Set up Python 3.12 and install the package with the CUDA extra (use cu126/cu128 for other CUDA versions, or .[cpu] for CPU-only).

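# pip install -e needs a source checkout, so clone the repository first
git clone https://github.com/fishaudio/fish-speech.git
cd fish-speech
conda create -n fish-speech python=3.12
conda activate fish-speech
pip install -e ".[cu129]"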
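Choosing between the extras mentioned above can be scripted. The sketch below maps a CUDA version string to an extra name; `pick_extra` is a hypothetical helper, and falling back to `cpu` for any version not listed in the text is an assumption of the example.

```shell
# Map a CUDA version (for example "12.8") to one of the extras named in the text.
# Unlisted versions fall back to "cpu"; that fallback is an assumption.
pick_extra() {
  case "$1" in
    12.6*) echo "cu126" ;;
    12.8*) echo "cu128" ;;
    12.9*) echo "cu129" ;;
    *)     echo "cpu" ;;
  esac
}

# Print the install command for a chosen version instead of running it.
extra=$(pick_extra "12.8")
echo "pip install -e \".[${extra}]\""
```

Quoting the `.[extra]` argument keeps shells such as zsh from treating the square brackets as a glob pattern.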

Launch the WebUI

Start the Gradio interface to generate speech in the browser, or use Docker Compose instead.

python tools/run_webui.py

Run a quick start with Docker

If you prefer containers, bring up the WebUI profile with Docker Compose.

docker compose --profile webui up

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.

When to use it

Add natural-sounding narration or in-app voice to a product without using a hosted TTS API
Clone a specific voice from a short sample to keep a consistent character or brand voice
Generate expressive, emotion-tagged dialogue for games, audiobooks, or video
Produce multilingual voiceovers across the 80+ supported languages

How Fish Speech compares

Fish Speech alongside the other open-source audio, music, and voice tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Whisper	★ 103k	OpenAI's speech recognition model that transcribes and translates audio across many languages.
GPT-SoVITS	★ 58.9k	An open-source WebUI that clones a voice from a short audio sample and turns text into speech, with zero-shot and few-shot fine-tuning.
VibeVoice	★ 49.5k	Microsoft's text-to-speech model for generating long, expressive multi-speaker audio like podcasts.
Coqui TTS	★ 45.6k	A library of text-to-speech models including the multilingual XTTS voice-cloning model.
ChatTTS	★ 39.5k	An open-source text-to-speech model tuned for dialogue, with multi-speaker support and fine-grained control over laughter, pauses, and prosody.
MockingBird	★ 36.9k	An open-source PyTorch toolbox that clones a voice from a short sample and generates Mandarin Chinese speech, with a web app, desktop toolbox, and command line.
OpenVoice	★ 36.7k	Clones a voice from a short reference clip and speaks in multiple languages, with control over emotion, accent, rhythm, and intonation.
Fish Speech	★ 30.9k	Multilingual text-to-speech and voice cloning with inline emotion control.
