Fooocus

Offline image generation that lets you focus on the prompt, not the settings

Overview

Fooocus is a free, offline, open-source app for generating images from text. It is built on the Stable Diffusion XL architecture and runs locally on your own machine. Like online tools such as Midjourney, it hides most of the technical settings so you can concentrate on writing prompts instead of tuning parameters.

It is aimed at people who want good-looking images without learning prompt engineering or sampler internals. Installation is short: the project aims for fewer than three mouse clicks between download and first image. The minimum GPU requirement is 4GB of Nvidia VRAM, and the interface is a local Gradio web UI.

Within the image-generation space, Fooocus sits between bare model runners and heavy node-based tools like ComfyUI. The project is now in limited long-term support with bug fixes only and stays on SDXL; for newer models such as Flux the maintainers point to WebUI Forge or ComfyUI/SwarmUI.

What it does

Text-to-image with an offline GPT-2 based prompt processing engine, so both short and very long prompts produce usable results
Upscale and variation options (Vary Subtle/Strong, Upscale 1.5x/2x) to refine an input image
Inpaint and outpaint using Fooocus's own inpaint model, including extending the image up, down, left, or right
Image Prompt and FaceSwap (via InsightFace) to guide generation from a reference image
Built-in styles and presets, including the run_anime and run_realistic launchers, plus SDXL models from Civitai
Runs fully offline with a Gradio web UI and a 4GB minimum Nvidia GPU requirement
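
Before installing, it can help to confirm the GPU meets the 4GB minimum mentioned above. The following is a minimal sketch, not part of Fooocus: the `has_enough_vram` helper and the `MIN_VRAM_MIB` threshold (4GB expressed as 4096 MiB) are assumptions made for illustration, and the script only inspects the first GPU that `nvidia-smi` reports.

```bash
# Pre-flight sketch: compare total GPU memory against the 4GB minimum.
# MIN_VRAM_MIB is an assumption (4GB expressed as 4096 MiB).
MIN_VRAM_MIB=4096

# has_enough_vram <total_mib>: hypothetical helper, succeeds when the
# value is a whole number of MiB that meets the minimum.
has_enough_vram() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric input fails the check
    esac
    [ "$1" -ge "$MIN_VRAM_MIB" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # nvidia-smi prints one line per GPU; keep only the first one.
    total_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
        | head -n 1 | tr -d '[:space:]')
    if has_enough_vram "$total_mib"; then
        echo "GPU reports ${total_mib} MiB: meets the 4GB minimum"
    else
        echo "GPU reports '${total_mib}' MiB: below the 4GB minimum or unreadable"
    fi
else
    echo "nvidia-smi not found; cannot check VRAM"
fi
```

Passing the check only means the card is at or above the stated minimum; it says nothing about generation speed.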

Getting started

On Windows you can use the prebuilt download; on Linux you clone the repo and run the entry script, which downloads the default models on first launch.

Windows: download and run

Download the prebuilt archive from the official GitHub release, uncompress it, and run run.bat. The first launch downloads the default models automatically.

Linux: clone and install (Anaconda)

Clone the repository, create the conda environment, and install the pinned requirements.

```bash
git clone https://github.com/lllyasviel/Fooocus.git
cd Fooocus
conda env create -f environment.yaml
conda activate fooocus
pip install -r requirements_versions.txt
```
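
A common slip after these steps is launching from a new shell where the environment is no longer active. The sketch below checks for that, assuming the environment is named `fooocus` as in the `conda activate` command above. The `env_is_active` helper is hypothetical; `CONDA_DEFAULT_ENV` is the variable conda sets to the name of the active environment.

```bash
# Sketch: confirm the "fooocus" conda environment is active before launching.
# The environment name is taken from the `conda activate fooocus` step.
EXPECTED_ENV="fooocus"

# env_is_active <name>: hypothetical helper, succeeds when <name> matches
# the expected environment name.
env_is_active() {
    [ "$1" = "$EXPECTED_ENV" ]
}

# CONDA_DEFAULT_ENV is empty or unset when no conda environment is active.
if env_is_active "${CONDA_DEFAULT_ENV:-}"; then
    echo "conda environment '$EXPECTED_ENV' is active"
else
    echo "run 'conda activate $EXPECTED_ENV' first"
fi
```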

Launch Fooocus

Start the app with the entry script. It self-updates and opens the Gradio UI in your browser. Add --preset anime or --preset realistic to launch with the anime or realistic preset instead of the default.

```bash
python entry_with_update.py
```
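
The preset flags can be wrapped in a small helper so a typo in the preset name is caught before the app starts. This is a sketch only: `fooocus_launch_cmd` is a hypothetical function, not something Fooocus ships, and it deliberately accepts just the two presets named here. The project may offer others, so check the official repo before relying on this list. It prints the command instead of running it.

```bash
# Sketch: build the launch command for an optional preset without running it.
# fooocus_launch_cmd is a hypothetical helper, not part of Fooocus.
fooocus_launch_cmd() {
    case "${1:-}" in
        "")
            # No argument: plain launch with the default preset.
            echo "python entry_with_update.py" ;;
        anime|realistic)
            # Only the two presets mentioned in this guide are accepted.
            echo "python entry_with_update.py --preset $1" ;;
        *)
            echo "unknown preset: $1" >&2
            return 1 ;;
    esac
}

fooocus_launch_cmd            # default launch
fooocus_launch_cmd anime      # anime preset
fooocus_launch_cmd realistic  # realistic preset
```

To start the app, run the printed command from inside the Fooocus directory with the conda environment active.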

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

Generate images from a text prompt locally without paying for an online service or sharing your prompts
Edit an existing image by inpainting, outpainting, or upscaling it
Create anime or photo-realistic styles using the dedicated presets and SDXL models from Civitai
Run a face swap or image-guided generation from a reference photo

How Fooocus compares

Fooocus alongside the other open-source image generation tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Stable Diffusion web UI (AUTOMATIC1111)	★ 164k	A browser interface for running Stable Diffusion image generation locally with extensions and fine-grained controls.
ComfyUI	★ 118k	A node-based visual editor for building and running image and video generation pipelines like Stable Diffusion and FLUX locally.
Fooocus	★ 50.4k	Offline image generation that lets you focus on the prompt, not the settings
InvokeAI	★ 27.5k	A self-hosted creative tool and canvas for generating and editing images with open diffusion models.
Stability-AI generative-models	★ 27.2k	Stability AI's official code for its Stable Diffusion family of image and video generation models.
FLUX	★ 25.6k	Black Forest Labs' open-weight diffusion models and inference code for generating and editing images from text prompts.
Z-Image	★ 11.6k	Alibaba Tongyi's 6B-parameter open image model that produces photorealistic images quickly on a single GPU.
DALLE2-pytorch	★ 11.3k	An open implementation of DALL-E 2 in PyTorch, with the CLIP encoder, diffusion prior, and cascading decoder you train to generate images from text.
