AI/TLDR

Fooocus

Offline image generation that lets you focus on the prompt, not the settings

Overview

Fooocus is a free, offline, open-source app for generating images from text. It is built on the Stable Diffusion XL architecture and runs locally on your own machine. Like online tools such as Midjourney, it hides most of the technical settings so you can concentrate on writing prompts instead of tuning parameters.

It is aimed at people who want good-looking images without learning prompt engineering or sampler internals. Installation is short: the project limits the number of mouse clicks between pressing download and generating the first image to fewer than three. The minimum GPU requirement is 4GB of Nvidia VRAM, and the interface is a local Gradio web UI.

Within the image-generation space, Fooocus sits between bare model runners and heavy node-based tools like ComfyUI. The project is now in limited long-term support with bug fixes only and stays on SDXL; for newer models such as Flux the maintainers point to WebUI Forge or ComfyUI/SwarmUI.

What it does

  • Text-to-image with an offline GPT-2-based prompt processing engine, so both very short and very long prompts produce usable results
  • Upscale and variation options (Vary Subtle/Strong, Upscale 1.5x/2x) to refine an input image
  • Inpaint and outpaint using Fooocus's own inpaint model, with outpainting that extends the image up, down, left, or right (the equivalent of panning)
  • Image Prompt and FaceSwap (via InsightFace) to guide generation from a reference image
  • Built-in styles and presets, including run_anime and run_realistic launchers, plus support for SDXL models from Civitai
  • Runs fully offline with a Gradio web UI and a 4GB minimum Nvidia GPU requirement
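
The 4GB minimum in the last item can be checked before installing. The following is a minimal sketch, not part of Fooocus: the function names and the MIN_VRAM_MIB threshold (which assumes 4GB means 4096 MiB) are illustrative, and it relies on nvidia-smi being installed with the Nvidia driver.

```bash
#!/usr/bin/env bash
# Sketch: compare a GPU's total memory against the documented 4GB minimum.
# MIN_VRAM_MIB and both function names are illustrative, not part of Fooocus.
MIN_VRAM_MIB=4096   # assumption: 4GB is read as 4096 MiB

# has_enough_vram MIB
# Returns 0 if MIB >= MIN_VRAM_MIB, 1 if it is lower,
# and 2 if MIB is empty or not a whole number.
has_enough_vram() {
  case "$1" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$1" -ge "$MIN_VRAM_MIB" ]
}

# first_gpu_vram_mib
# Prints the total memory of the first Nvidia GPU in MiB.
# Returns 1 without printing anything if nvidia-smi is not on PATH.
first_gpu_vram_mib() {
  command -v nvidia-smi >/dev/null 2>&1 || return 1
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1
}
```

A possible use is `has_enough_vram "$(first_gpu_vram_mib)" && echo "GPU meets the minimum"`. Some cards may report slightly less than their nominal size, so treat a near miss as a prompt to check the official requirements rather than a hard failure.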

Getting started

On Windows you can use the prebuilt download; on Linux you clone the repo and run the entry script, which downloads the default models on first launch.

Windows: download and run

Download the prebuilt archive from the official GitHub release, uncompress it, and run run.bat. The first launch downloads the default models automatically.

Linux: clone and install (Anaconda)

Clone the repository, create the conda environment, and install the pinned requirements.

git clone https://github.com/lllyasviel/Fooocus.git
cd Fooocus
conda env create -f environment.yaml
conda activate fooocus
pip install -r requirements_versions.txt
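
If the conda or pip step fails, one quick thing to rule out is an incomplete clone. The sketch below checks a directory for the three files these instructions reference (environment.yaml, requirements_versions.txt, and entry_with_update.py). The helper name check_fooocus_checkout is hypothetical and is not something Fooocus ships.

```bash
#!/usr/bin/env bash
# Sketch: confirm a Fooocus checkout contains the files the install and
# launch steps rely on. check_fooocus_checkout is a hypothetical helper.

# check_fooocus_checkout [DIR]
# Prints each missing file to stderr and returns 1 if any is absent;
# returns 0 when all three files exist. DIR defaults to the current directory.
check_fooocus_checkout() {
  dir="${1:-.}"
  missing=0
  for f in environment.yaml requirements_versions.txt entry_with_update.py; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as `check_fooocus_checkout Fooocus` from the directory where you cloned the repository, or with no argument from inside the checkout.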

Launch Fooocus

Start the app with the entry script. It self-updates and opens the Gradio UI in your browser. Use --preset anime or --preset realistic for the other model presets.

python entry_with_update.py
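
The preset variants mentioned above differ only in the --preset argument. As a rough sketch, a small wrapper can build the right command and reject names this page does not list. The function name fooocus_launch_cmd and the label "default" for the no-preset case are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch: build the launch command for one of the presets named on this page.
# fooocus_launch_cmd and the "default" label are illustrative assumptions.

# fooocus_launch_cmd [PRESET]
# Prints the command for PRESET (default, anime, or realistic).
# Returns 1 and prints an error to stderr for any other value.
fooocus_launch_cmd() {
  preset="${1:-default}"
  case "$preset" in
    default)
      echo "python entry_with_update.py" ;;
    anime|realistic)
      echo "python entry_with_update.py --preset $preset" ;;
    *)
      echo "unknown preset: $preset" >&2
      return 1 ;;
  esac
}
```

From inside the Fooocus directory with the conda environment active, `eval "$(fooocus_launch_cmd anime)"` would start the anime preset. The project may offer presets beyond these two, so check the official repo before relying on the rejection branch.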

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.

When to use it

  • Generate images from a text prompt locally without paying for an online service or sharing your prompts
  • Edit an existing image by inpainting, outpainting, or upscaling it
  • Create anime or photo-realistic styles using the dedicated presets and SDXL models from Civitai
  • Run a face swap or image-guided generation from a reference photo

How Fooocus compares

Fooocus alongside the other open-source image generation tools that AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
| --- | --- | --- |
| Stable Diffusion web UI (AUTOMATIC1111) | ★ 164k | A browser interface for running Stable Diffusion image generation locally with extensions and fine-grained controls. |
| ComfyUI | ★ 118k | A node-based visual editor for building and running image and video generation pipelines like Stable Diffusion and FLUX locally. |
| Fooocus | ★ 50.4k | Offline image generation that lets you focus on the prompt, not the settings. |
| InvokeAI | ★ 27.5k | A self-hosted creative tool and canvas for generating and editing images with open diffusion models. |
| Stability-AI generative-models | ★ 27.2k | Stability AI's official code for its Stable Diffusion family of image and video generation models. |
| FLUX | ★ 25.6k | Black Forest Labs' open-weight diffusion models and inference code for generating and editing images from text prompts. |
| Z-Image | ★ 11.6k | Alibaba Tongyi's 6B-parameter open image model that produces photorealistic images quickly on a single GPU. |
| DALLE2-pytorch | ★ 11.3k | An open implementation of DALL-E 2 in PyTorch, with the CLIP encoder, diffusion prior, and cascading decoder you train to generate images from text. |