AI/TLDR

Stability-AI generative-models

Stability AI's official code for the Stable Diffusion and Stable Video model family

Overview

generative-models is the official open-source codebase from Stability AI for its Stable Diffusion family. It collects the model code, sampling scripts, and configs for SDXL (text-to-image) and the Stable Video line for video and multi-view generation, including Stable Video Diffusion (SVD), SV3D, SV4D, and SV4D 2.0.

It is aimed at researchers and engineers who want to run or build on these models directly rather than through a hosted API. You download model weights from Hugging Face, drop them into a checkpoints/ folder, and call the sampling scripts in scripts/sampling/. The repo uses a config-driven design that separates samplers and guiders from the core diffusion models.

Within the image-generation category, it sits at the source: it is Stability AI's own reference implementation of these models, not a UI or wrapper built on top of them. If you need the actual Stability AI code and want control over the inference loop, this is where the models live.

What it does

  • SDXL text-to-image models (base, refiner, and SDXL-Turbo) with ready-to-run sampling scripts
  • Stable Video Diffusion (SVD / SVD-XT) for image-to-video synthesis
  • SV3D for novel-view and multi-view generation from a single image
  • SV4D and SV4D 2.0 video-to-4D models for novel-view video synthesis of moving objects
  • Config-driven architecture that separates samplers and guiders from the diffusion model code
  • Inference scripts with options for low-VRAM runs (--encoding_t, --decoding_t, smaller --img_size) and optional background removal via rembg

Getting started

Set up a Python 3.10 environment, install the dependencies, then download weights from Hugging Face and run a sampling script. The example below shows the SV4D 2.0 video-to-4D flow from the README.

Create the environment and install dependencies

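From the root of a cloned copy of the repository, create a virtual environment, install a PyTorch build that matches your CUDA version, then install the project requirements and the package itself. The final command installs sdata, which the README sets up for training.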

```bash
python3.10 -m venv .generativemodels
source .generativemodels/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # check CUDA version
pip3 install -r requirements/pt2.txt
pip3 install .
pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata
```
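Before downloading weights, a quick sanity check of the new environment can save a failed run later. The sketch below is not from the README: `is_py310` is a hypothetical helper that inspects a `python --version` string, and the last command simply asks the installed PyTorch whether it can see a CUDA device.

```shell
#!/usr/bin/env bash
# Sanity-check sketch for the environment created above.
# is_py310 is a hypothetical helper, not part of the repository.

# Succeed only when the version string names a Python 3.10 release.
is_py310() {
  case "$1" in
    "Python 3.10" | "Python 3.10."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Run inside the activated virtual environment.
if is_py310 "$(python --version 2>&1)"; then
  echo "Python 3.10 detected"
else
  echo "warning: the active interpreter is not Python 3.10" >&2
fi

# Report the PyTorch build and CUDA visibility, if torch is importable.
python -c 'import torch; print(torch.__version__, torch.cuda.is_available())' \
  2>/dev/null || echo "warning: PyTorch is not importable in this environment" >&2
```

If the second value printed is `False`, the installed wheel probably does not match the local CUDA setup, and the `--index-url` in the install step is the first thing to revisit.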

Download model weights

Pull the checkpoint you want from Hugging Face into the checkpoints/ folder. This example fetches the SV4D 2.0 weights.

```bash
huggingface-cli download stabilityai/sv4d2.0 sv4d2.safetensors --local-dir checkpoints
```
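Because the sampling scripts expect the weights to be in checkpoints/, it can help to confirm the file landed there before starting a run. This is a minimal sketch: `check_checkpoint` is a hypothetical helper, and the path is inferred from the `--local-dir checkpoints` option and the file name in the download command.

```shell
#!/usr/bin/env bash
# Sketch: confirm a checkpoint file exists and is non-empty before sampling.
# check_checkpoint is a hypothetical helper, not part of the repository.

check_checkpoint() {
  local path="$1"
  if [ -s "$path" ]; then
    echo "found: $path"
  else
    echo "missing or empty: $path" >&2
    return 1
  fi
}

# Path assumed from --local-dir checkpoints plus the downloaded file name.
check_checkpoint checkpoints/sv4d2.safetensors \
  || echo "Download the weights before running a sampling script." >&2
```

The same helper works for any other checkpoint by passing a different path.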

Run a sampling script

Call the matching script in scripts/sampling/ with your input. This runs SV4D 2.0 on an example video and writes results to outputs/.

```bash
python scripts/sampling/simple_video_sample_4d2.py --input_path assets/sv4d_videos/camel.gif --output_folder outputs
```
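On a GPU with limited memory, the low-VRAM options listed under "What it does" can be added to the same command. The sketch below assumes the SV4D 2.0 script accepts `--encoding_t`, `--decoding_t`, and `--img_size` in the `--flag=value` form. `low_vram_cmd` is a hypothetical helper, and the values 1, 1, and 512 are illustrative starting points rather than tuned settings. It only prints the command so you can review it first.

```shell
#!/usr/bin/env bash
# Sketch: assemble a low-VRAM variant of the SV4D 2.0 sampling command.
# low_vram_cmd is a hypothetical helper; the values 1, 1 and 512 are
# illustrative assumptions, not tuned recommendations.

low_vram_cmd() {
  local input_path="$1"
  local output_folder="${2:-outputs}"
  echo python scripts/sampling/simple_video_sample_4d2.py \
    --input_path "$input_path" \
    --output_folder "$output_folder" \
    --encoding_t=1 \
    --decoding_t=1 \
    --img_size=512
}

# Print the command for review instead of running it.
low_vram_cmd assets/sv4d_videos/camel.gif
```

Once the printed command looks right, run it from the repository root. If memory allows, raising the encoding and decoding values or the image size trades VRAM for speed or resolution.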

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Generate images locally with SDXL or SDXL-Turbo using the provided text-to-image sampling scripts
  • Turn a single still image into a short video clip with Stable Video Diffusion
  • Produce novel-view or multi-view renders of an object with SV3D for 3D research
  • Run video-to-4D synthesis of a moving object with SV4D / SV4D 2.0 for novel-view video research

How Stability-AI generative-models compares

The table lists Stability-AI generative-models alongside the other open-source image-generation tools that AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
| --- | --- | --- |
| Stable Diffusion web UI (AUTOMATIC1111) | ★ 164k | A browser interface for running Stable Diffusion image generation locally with extensions and fine-grained controls. |
| ComfyUI | ★ 118k | A node-based visual editor for building and running image and video generation pipelines like Stable Diffusion and FLUX locally. |
| Fooocus | ★ 50.4k | A simplified image generation app built on Stable Diffusion that hides technical settings for easy prompting. |
| InvokeAI | ★ 27.5k | A self-hosted creative tool and canvas for generating and editing images with open diffusion models. |
| Stability-AI generative-models | ★ 27.2k | Stability AI's official code for the Stable Diffusion and Stable Video model family. |
| FLUX | ★ 25.6k | Black Forest Labs' open-weight diffusion models and inference code for generating and editing images from text prompts. |
| Z-Image | ★ 11.6k | Alibaba Tongyi's 6B-parameter open image model that produces photorealistic images quickly on a single GPU. |
| DALLE2-pytorch | ★ 11.3k | An open implementation of DALL-E 2 in PyTorch, with the CLIP encoder, diffusion prior, and cascading decoder you train to generate images from text. |