Overview
Fooocus is a free, offline, open-source app for generating images from text. It is built on the Stable Diffusion XL architecture and runs locally on your own machine. Like online tools such as Midjourney, it hides most of the technical settings so you can concentrate on writing prompts instead of tuning parameters.
It is aimed at people who want good-looking images without learning prompt engineering or sampler internals. Installation is short: the project keeps the number of mouse clicks between download and first image to fewer than three. The minimum GPU requirement is 4GB of Nvidia VRAM, and the interface is a local Gradio web UI.
Within the image-generation space, Fooocus sits between bare model runners and heavy node-based tools like ComfyUI. The project is now in limited long-term support with bug fixes only and stays on SDXL; for newer models such as Flux the maintainers point to WebUI Forge or ComfyUI/SwarmUI.
What it does
- Text-to-image with an offline GPT-2 based prompt processing engine, so both short and very long prompts produce usable results
- Upscale and variation options (Vary Subtle/Strong, Upscale 1.5x/2x) to refine an input image
- Inpaint and outpaint using Fooocus's own inpaint model, plus panning (outpainting) in any direction
- Image Prompt and FaceSwap (via InsightFace) to guide generation from a reference image
- Built-in styles and presets, including run_anime and run_realistic launchers and SDXL models from Civitai
- Runs fully offline with a Gradio web UI and a 4GB minimum Nvidia GPU requirement
Getting started
On Windows you can use the prebuilt download; on Linux you clone the repo and run the entry script, which downloads the default models on first launch.
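Before choosing an install path, it can help to confirm that the GPU meets the 4GB minimum stated in the overview. The following is a minimal sketch, not part of Fooocus: the helper names meets_min_vram and check_gpu are hypothetical, the 4096 MiB threshold is an assumed conversion of 4GB, and it relies on the nvidia-smi tool that ships with the Nvidia driver being on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the MiB threshold are assumptions, not Fooocus code.

# 4GB minimum from the text, expressed in MiB (assumed conversion: 4 * 1024).
MIN_VRAM_MIB=4096

# meets_min_vram TOTAL_MIB
# Succeeds (exit 0) when TOTAL_MIB is at least MIN_VRAM_MIB.
meets_min_vram() {
  [ "$1" -ge "$MIN_VRAM_MIB" ]
}

# check_gpu
# Reads the first GPU's total memory from nvidia-smi and reports the result.
check_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; cannot check VRAM" >&2
    return 2
  fi
  # Take the first GPU only and strip whitespace so the value is a bare integer.
  total_mib="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
    | head -n 1 | tr -d '[:space:]')"
  if meets_min_vram "$total_mib"; then
    echo "OK: ${total_mib} MiB of VRAM"
  else
    echo "Below the 4GB minimum: ${total_mib} MiB of VRAM" >&2
    return 1
  fi
}
```

Calling check_gpu on a machine with an Nvidia driver installed prints the detected memory. meets_min_vram can also be called directly with a number, for example meets_min_vram 8192.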
Windows: download and run
Download the prebuilt archive from the official GitHub release, uncompress it, and run run.bat. The first launch downloads the default models automatically.
Linux: clone and install (Anaconda)
Clone the repository, create the conda environment, and install the pinned requirements.
git clone https://github.com/lllyasviel/Fooocus.git
cd Fooocus
conda env create -f environment.yaml
conda activate fooocus
pip install -r requirements_versions.txt
Launch Fooocus
Start the app with the entry script. It self-updates and opens the Gradio UI in your browser. To load a different model preset instead of the default, add --preset anime or --preset realistic to the command.
python entry_with_update.py
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
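The launch command and its preset flag can be wrapped in a small helper. This is a sketch under stated assumptions: fooocus_launch_cmd is a hypothetical function, and it accepts only the two preset names mentioned above (anime and realistic), although the project may ship others. It prints the command rather than running it, so the output can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch only: fooocus_launch_cmd is a hypothetical helper, not part of Fooocus.

# fooocus_launch_cmd [PRESET]
# Prints the launch command for the default setup or a named preset.
# Only the presets named in the text are accepted (an assumption of this sketch).
fooocus_launch_cmd() {
  preset="${1:-}"
  case "$preset" in
    "")
      echo "python entry_with_update.py"
      ;;
    anime|realistic)
      echo "python entry_with_update.py --preset $preset"
      ;;
    *)
      echo "unknown preset: $preset" >&2
      return 1
      ;;
  esac
}
```

For example, fooocus_launch_cmd anime prints python entry_with_update.py --preset anime, which can then be run from the Fooocus directory with the conda environment active.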
When to use it
- Generate images from a text prompt locally without paying for an online service or sharing your prompts
- Edit an existing image by inpainting, outpainting, or upscaling it
- Create anime or photo-realistic styles using the dedicated presets and SDXL models from Civitai
- Run a face swap or image-guided generation from a reference photo
How Fooocus compares
The table below lists Fooocus alongside the other open-source image generation tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Stable Diffusion web UI (AUTOMATIC1111) | ★ 164k | A browser interface for running Stable Diffusion image generation locally with extensions and fine-grained controls. |
| ComfyUI | ★ 118k | A node-based visual editor for building and running image and video generation pipelines like Stable Diffusion and FLUX locally. |
| Fooocus | ★ 50.4k | Offline image generation that lets you focus on the prompt, not the settings. |
| InvokeAI | ★ 27.5k | A self-hosted creative tool and canvas for generating and editing images with open diffusion models. |
| Stability-AI generative-models | ★ 27.2k | Stability AI's official code for its Stable Diffusion family of image and video generation models. |
| FLUX | ★ 25.6k | Black Forest Labs' open-weight diffusion models and inference code for generating and editing images from text prompts. |
| Z-Image | ★ 11.6k | Alibaba Tongyi's 6B-parameter open image model that produces photorealistic images quickly on a single GPU. |
| DALLE2-pytorch | ★ 11.3k | An open implementation of DALL-E 2 in PyTorch, with the CLIP encoder, diffusion prior, and cascading decoder you train to generate images from text. |