Stability-AI generative-models

Stability AI's official code for the Stable Diffusion and Stable Video model family

github.com/Stability-AI/generative-models · ★ 27.2k · stability.ai

Overview

generative-models is the official open-source codebase from Stability AI for its Stable Diffusion family. It collects the model code, sampling scripts, and configs for SDXL (text-to-image) and the Stable Video line for video and multi-view generation, including Stable Video Diffusion (SVD), SV3D, SV4D, and SV4D 2.0.

It is aimed at researchers and engineers who want to run or build on these models directly rather than through a hosted API. You download model weights from Hugging Face, drop them into a checkpoints/ folder, and call the sampling scripts in scripts/sampling/. The repo uses a config-driven design that separates samplers and guiders from the core diffusion models.

Among image-generation tools, it is the upstream reference implementation that many other tools and pipelines wrap. If you need Stability AI's own model code and want control over the inference loop, start here; the weights themselves are hosted on Hugging Face.

What it does

SDXL text-to-image models (base, refiner, and SDXL-Turbo) with ready-to-run sampling scripts
Stable Video Diffusion (SVD / SVD-XT) for image-to-video synthesis
SV3D for novel-view and multi-view generation from a single image
SV4D and SV4D 2.0 video-to-4D models for novel-view video synthesis of moving objects
Config-driven architecture that separates samplers and guiders from the diffusion model code
Inference scripts with options for low-VRAM runs (--encoding_t, --decoding_t, smaller --img_size) and optional background removal via rembg
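The low-VRAM options in the last item can be combined in a single run. The sketch below assumes the SV4D 2.0 sampling script accepts them in `--flag=value` form. The values 1, 1, and 512 are illustrative starting points, not tuned settings, and `low_vram_cmd` is a hypothetical helper that only prints the command so it can be inspected first.

```shell
# Hypothetical helper: prints (does not run) an SV4D 2.0 sampling command
# with reduced-memory settings. The flag values are assumptions; raise them
# if your GPU has headroom.
low_vram_cmd() {
  # $1 = input video path, $2 = output folder
  # --encoding_t / --decoding_t are assumed to limit how many frames are
  # encoded / decoded at a time; --img_size is assumed to lower the resolution.
  echo "python scripts/sampling/simple_video_sample_4d2.py" \
    "--input_path $1 --output_folder $2" \
    "--encoding_t=1 --decoding_t=1 --img_size=512"
}

low_vram_cmd assets/sv4d_videos/camel.gif outputs
```

Once the printed command looks right, paste it into the activated environment to run it.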

Getting started

Set up a Python 3.10 environment, install the dependencies, then download weights from Hugging Face and run a sampling script. The example below shows the SV4D 2.0 video-to-4D flow from the README.

Create the environment and install dependencies

Make a virtual environment, install a CUDA-matched PyTorch build, then install the project requirements and package.

bash

python3.10 -m venv .generativemodels
source .generativemodels/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # cu118 = CUDA 11.8; use the index URL that matches your CUDA version
pip3 install -r requirements/pt2.txt
pip3 install .
pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata

Download model weights

Pull the checkpoint you want from Hugging Face into the checkpoints/ folder. This example fetches the SV4D 2.0 weights.

bash

huggingface-cli download stabilityai/sv4d2.0 sv4d2.safetensors --local-dir checkpoints
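Before launching a long sampling run, it can help to confirm that the checkpoint actually landed where the scripts expect it. This is a minimal sketch: `check_checkpoint` is a hypothetical helper, and the path assumes the download command above was run from the repository root.

```shell
# Hypothetical helper: succeeds if the checkpoint file exists,
# otherwise reports the missing path on stderr and returns non-zero.
check_checkpoint() {
  # $1 = path to the checkpoint file
  if [ -f "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Path assumes the SV4D 2.0 weights were downloaded into checkpoints/.
check_checkpoint checkpoints/sv4d2.safetensors || echo "download the weights first"
```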

Run a sampling script

Call the matching script in scripts/sampling/ with your input. This runs SV4D 2.0 on an example video and writes results to outputs/.

bash

python scripts/sampling/simple_video_sample_4d2.py --input_path assets/sv4d_videos/camel.gif --output_folder outputs
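To process several inputs, the same command can be wrapped in a loop. This is a rough sketch, not part of the project: `sample_all` and the `DRY_RUN` variable are hypothetical, the inputs are assumed to be `.gif` files, and writing each result to its own subfolder is a choice made here to keep outputs from colliding.

```shell
# Hypothetical helper: runs the SV4D 2.0 sampling script once per .gif
# in a directory. With DRY_RUN=1 (the default) it only prints the commands.
sample_all() {
  # $1 = directory with input .gif videos, $2 = root output folder
  for f in "$1"/*.gif; do
    [ -e "$f" ] || continue              # skip when the glob matches nothing
    name=$(basename "$f" .gif)           # e.g. camel.gif -> camel
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "python scripts/sampling/simple_video_sample_4d2.py --input_path $f --output_folder $2/$name"
    else
      python scripts/sampling/simple_video_sample_4d2.py \
        --input_path "$f" --output_folder "$2/$name"
    fi
  done
}

sample_all assets/sv4d_videos outputs
```

Set `DRY_RUN=0` in the environment to launch the runs instead of printing them.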

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

Generate images locally with SDXL or SDXL-Turbo using the provided text-to-image sampling scripts
Turn a single still image into a short video clip with Stable Video Diffusion
Produce novel-view or multi-view renders of an object with SV3D for 3D research
Run video-to-4D synthesis of a moving object with SV4D / SV4D 2.0 for novel-view video research

How Stability-AI generative-models compares

Stability-AI generative-models alongside other open-source image generation tools AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Stable Diffusion web UI (AUTOMATIC1111)	★ 164k	A browser interface for running Stable Diffusion image generation locally with extensions and fine-grained controls.
ComfyUI	★ 118k	A node-based visual editor for building and running image and video generation pipelines like Stable Diffusion and FLUX locally.
Fooocus	★ 50.4k	A simplified image generation app built on Stable Diffusion that hides technical settings for easy prompting.
InvokeAI	★ 27.5k	A self-hosted creative tool and canvas for generating and editing images with open diffusion models.
Stability-AI generative-models	★ 27.2k	Stability AI's official code for the Stable Diffusion and Stable Video model family.
FLUX	★ 25.6k	Black Forest Labs' open-weight diffusion models and inference code for generating and editing images from text prompts.
Z-Image	★ 11.6k	Alibaba Tongyi's 6B-parameter open image model that produces photorealistic images quickly on a single GPU.
DALLE2-pytorch	★ 11.3k	An open implementation of DALL-E 2 in PyTorch, with the CLIP encoder, diffusion prior, and cascading decoder you train to generate images from text.
