What Is FLUX? Black Forest Labs' Image Model

You will understand what FLUX is, why it is considered the open-weight image-quality leader, and how a diffusion-transformer image model differs from earlier diffusion models.

INTERMEDIATE · 9 MIN READ · UPDATED 2026-06-14

OFFICIAL SITES: bfl.ai (Black Forest Labs) · stability.ai (Stability AI)

In plain English

FLUX is a family of text-to-image AI models built by Black Forest Labs, a German startup founded by some of the original researchers behind Stable Diffusion. You type a description — "a red fox reading a newspaper in a snowy forest, golden hour" — and FLUX paints a matching image. It belongs to the same broad family as Stable Diffusion and Midjourney, but it is widely regarded as the quality leader among models you can download and run yourself.

Here is a useful analogy. Earlier open image models were like a talented painter who sometimes mishears your request — you ask for "a sign that says OPEN" and get a sign with garbled letters. FLUX is a painter who listens more carefully: it tends to follow long, detailed prompts closely, render readable text, and get hands and small details right more often. The leap is mostly about prompt adherence — doing what you actually asked — not just raw prettiness.

The name covers a lineup, not a single model. Some versions are open-weight (you can download the model file and run it on your own machine), while others are proprietary, available only through an API or hosted service. So "using FLUX" can mean running it locally for free or calling a paid endpoint, depending on which tier you pick.

Why it matters

FLUX matters because it narrowed a gap that used to feel permanent. For years the best-looking image models were closed and proprietary, while the open models you could actually run and customize lagged behind on quality. FLUX brought near-top-tier quality into the open-weight world, which changes what an individual builder can do without renting someone else's black box.

What it unlocks for builders

Self-hosting. With an open-weight variant you can run generation on your own GPU or server. Your prompts and images never leave your machine — important for private, sensitive, or brand-controlled work.
Fine-tuning and LoRAs. Because the weights are available, you can teach an open FLUX variant a specific face, product, or art style using a small add-on called a LoRA, without retraining the whole model.
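Tooling. Open weights mean a whole ecosystem of techniques, including ControlNet-style conditioning, inpainting, and image-to-image, can plug into it. Popular diffusion tools largely support FLUX too, although add-ons trained for another model family generally need a version made for FLUX.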
Cost control. Running open weights yourself means no per-image API fee. For high-volume generation, that can be the difference between a viable product and an unaffordable one.
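
To make the "small add-on" idea behind LoRAs concrete, the sketch below counts trainable parameters for a single weight matrix under a full fine-tune versus a LoRA. It assumes the standard LoRA formulation, in which the original weight stays frozen and two thin matrices of a chosen rank are trained instead. The layer size and rank are illustrative values, not figures taken from any specific FLUX variant.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# Illustrative numbers only: a square 3072 x 3072 layer and rank 16.
full = full_finetune_params(3072, 3072)
lora = lora_params(3072, 3072, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

With these assumed sizes the LoRA trains roughly one percent of the parameters of the layer it adapts, which is why LoRA files are small and quick to train compared with a full fine-tune.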

Who should care? Anyone building an image feature — a marketing-asset generator, a product-mockup tool, a game-art pipeline, a photo editor — who wants strong quality and the freedom to host, fine-tune, and control it. If you only need occasional images and don't care where they run, a hosted generator may be simpler. The moment you need privacy, customization, or scale, an open-weight FLUX variant becomes very attractive.

How it works

At a high level FLUX does what every modern diffusion model does: it starts from pure random noise and removes noise step by step until an image appears, steered the whole way by your text prompt. If you have read "What is a diffusion model?", this is the same core loop. What is new is the engine doing the denoising.
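
The core loop can be sketched with a toy example. This is not how FLUX is implemented: the `toy_denoise` function and its "oracle" denoiser are hypothetical stand-ins, and the oracle is allowed to see the clean target, which a real model never can. A trained network predicts the update direction from the noisy input and the prompt instead. The sketch only shows the shape of the loop: start from noise, then remove a little of it at every step.

```python
import random


def toy_denoise(target, steps, seed=0):
    """Start from random noise and step toward `target` over `steps` updates.

    `target` stands in for the image the prompt describes. A real model does
    not know it; a trained network predicts the update direction instead.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise
    for i in range(steps):
        remaining = steps - i  # steps left, counting this one
        # Oracle "denoiser": the offset between the current x and the clean target.
        offset = [xi - ti for xi, ti in zip(x, target)]
        # Remove 1/remaining of the current offset, so the noise shrinks
        # linearly and reaches zero on the last step.
        x = [xi - oi / remaining for xi, oi in zip(x, offset)]
    return x


target = [0.2, 0.8, 0.5, 0.1]
result = toy_denoise(target, steps=8)
print([round(v, 3) for v in result])
```

More steps mean smaller, more careful updates, which mirrors the steps-versus-time tradeoff you tune in a real generator.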

The pipeline, end to end

Three components cooperate. A text encoder turns your prompt into numbers the model can act on. The denoiser — the heart of FLUX — repeatedly cleans up a noisy image in a compressed "latent" space, guided by those text numbers. Finally a decoder (VAE) expands the finished latent into the full-resolution picture you see.

// From prompt to picture

Prompt: your text
Text encoder: text → numbers
Denoiser: noise → latent, many steps
Decoder (VAE): latent → pixels
Image: final output

The key change: a transformer, not a U-Net

Earlier diffusion models, including the Stable Diffusion releases through SDXL, used a U-Net as the denoiser: a convolution-based network shaped like the letter U. FLUX replaces it with a transformer, the same attention-based architecture behind large language models. A model built this way is often called a DiT (diffusion transformer). The practical payoff of attention is that the model can relate every part of the image to every word of the prompt at once, which is a big reason FLUX follows complex instructions and renders text so much better.
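
A toy calculation shows what "relating a part of the image to every word" means in practice. The sketch below computes scaled dot-product attention weights for one image patch over three prompt tokens. The two-dimensional embeddings are hand-picked, hypothetical values; a real model uses learned projections, many attention heads, many layers, and far larger vectors.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hypothetical 2-dimensional embeddings, hand-picked for illustration.
prompt_tokens = {"red": [1.0, 0.0], "fox": [0.0, 1.0], "snow": [-1.0, 0.0]}
image_patch = [0.2, 2.0]  # a patch whose features point toward "fox"

weights = attention_weights(image_patch, list(prompt_tokens.values()))
for word, w in zip(prompt_tokens, weights):
    print(f"{word}: {w:.2f}")
```

The patch puts most of its weight on "fox" while still assigning some to the other words. Repeating this for every patch against every token is what lets the prompt influence the whole image at once.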

// What changed under the hood

Classic diffusion (U-Net)

Denoiser is a convolutional U-Net
Strong on local texture
Weaker at long, complex prompts
Often garbles text in images

FLUX (diffusion transformer)

Denoiser is a transformer (DiT)
Attention links words to all regions
Strong prompt adherence
Renders readable text far better

Everything else feels familiar. You still set the number of denoising steps (more steps, more refinement, more time) and a guidance strength (how hard the model sticks to your prompt versus inventing freely). You still write prompts and can use negative prompts or image-to-image where the variant supports them. The architecture changed; the dials you turn mostly did not.
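
As a rough sketch of where those two dials appear in code, the example below uses the Hugging Face `diffusers` library and its `FluxPipeline` class. None of this comes from the article itself: the library choice, the step and guidance presets, and any model id you pass in are assumptions, and running `generate` needs a capable GPU, the `torch` and `diffusers` packages, and a multi-gigabyte weight download. Check the model card and license of whichever variant you pick.

```python
def dial_settings(fast: bool) -> dict:
    """Illustrative presets for the two dials; tune them per task."""
    if fast:
        # Few steps with guidance off, the kind of setting a speed-tuned variant expects.
        return {"num_inference_steps": 4, "guidance_scale": 0.0}
    # More steps and some guidance for a higher-quality variant.
    return {"num_inference_steps": 28, "guidance_scale": 3.5}


def generate(prompt: str, model_id: str, fast: bool, out_path: str = "out.png") -> str:
    """Generate one image with an open-weight variant and save it to disk."""
    # Imported here so the presets above can be used without a GPU stack installed.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use

    image = pipe(
        prompt,
        generator=torch.Generator("cpu").manual_seed(0),  # fixed seed for repeatability
        **dial_settings(fast),
    ).images[0]
    image.save(out_path)
    return out_path
```

A call might look like `generate("a red fox reading a newspaper in a snowy forest, golden hour", model_id="black-forest-labs/FLUX.1-schnell", fast=True)`, where the model id is an assumed example of a fast open-weight variant. Keeping the presets in a separate function makes it easy to compare settings without loading the model.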

Open-weight vs proprietary tiers

The single most confusing thing about FLUX for newcomers is that it is not one product. The lineup deliberately mixes tiers so it can serve both self-hosters and people who just want an API. Knowing which kind you are dealing with saves a lot of confusion.

Question	Open-weight variant	Proprietary variant
Where does it run?	Your own GPU, server, or a host you choose	Black Forest Labs' API or a partner platform
Can you fine-tune it?	Yes — LoRAs and full fine-tunes are possible	No — you only send prompts and get images
What does it cost?	Hardware + electricity; no per-image fee	Pay per image or per API call
Privacy?	Data stays on your infrastructure	Prompts and images pass through the provider
Best for	Customization, scale, private data	Top quality with zero setup
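
The cost row comes down to simple break-even arithmetic, sketched below. All figures are hypothetical placeholders rather than real hardware or API prices, and the function name `break_even_images` is made up for this example. Substitute your own numbers.

```python
import math


def break_even_images(hardware_cost: float, self_host_cost_per_image: float,
                      api_price_per_image: float) -> int:
    """Smallest image count at which self-hosting costs no more than the API.

    Self-hosting: one-off hardware cost plus a small running cost per image
    (electricity). API: a flat price per image and no upfront cost.
    """
    saving_per_image = api_price_per_image - self_host_cost_per_image
    if saving_per_image <= 0:
        raise ValueError("self-hosting never breaks even at these prices")
    return math.ceil(hardware_cost / saving_per_image)


# Hypothetical figures for illustration only, not real prices.
images = break_even_images(
    hardware_cost=2000.0,
    self_host_cost_per_image=0.002,
    api_price_per_image=0.04,
)
print(images)
```

Below the break-even count the API is cheaper; above it, every extra image widens the gap in favor of self-hosting. The sketch ignores maintenance time and hardware depreciation, which shift the threshold in practice.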

Within the open side, the lineup has historically split into a smaller, fast variant tuned for speed, a higher-quality variant for the best open results, and full-precision variants meant for fine-tuning. The exact names and the current generation evolve over time, so treat "FLUX" as a family and check Black Forest Labs' site for which specific variant is open versus API-only before you commit.

FLUX vs Stable Diffusion: which to reach for

FLUX and Stable Diffusion are the two pillars of the open image world, and they are siblings — many FLUX researchers helped create Stable Diffusion. They are not strictly rivals so much as different points on a tradeoff curve.

How they differ in practice

Quality and prompt adherence. FLUX generally leads on following detailed prompts, rendering legible text, and getting fine details right. For demanding, instruction-heavy prompts it is usually the stronger choice.
Hardware appetite. FLUX models are large, so they want a capable GPU with plenty of VRAM. Stable Diffusion variants — especially the smaller ones — run on more modest hardware, which still matters if you are generating locally on a laptop.
Ecosystem maturity. Stable Diffusion has been around longer, so it has an enormous back-catalog of community fine-tunes, LoRAs, and niche tools. FLUX's ecosystem grew fast and is rich, but Stable Diffusion's is broader and older.
Speed. Smaller Stable Diffusion variants and FLUX's fast tier can be quicker per image; the highest-quality FLUX variant trades some speed for fidelity.

A reasonable default: reach for FLUX when output quality and prompt fidelity are the priority and you have the GPU for it; reach for a Stable Diffusion variant when you need lighter hardware, the deepest pool of existing community models, or maximum speed. Many practitioners keep both installed and pick per task. Both run inside the same tools, so switching is often just loading a different model file.

Going deeper

Once the basics click, a few directions are worth exploring.

Editing and control, not just generation. Beyond text-to-image, FLUX variants support editing-style workflows: inpainting and outpainting to change or extend part of an image, image-to-image to transform an existing picture, and structural conditioning in the spirit of ControlNet to lock pose or layout. These turn FLUX from a one-shot generator into a controllable image engine.

Why transformers won here too. It is no accident that image and language models converged on the same architecture. Attention scales well with data and compute, and it lets a single mechanism handle long, structured relationships: between words, between image regions, and between the two. The move from U-Nets to diffusion transformers is part of a broader trend of transformers becoming the default backbone across modalities. If you want the deeper contrast, see "Diffusion vs autoregressive".

Running it well. Because the high-quality variants are heavy, real-world use involves practical tricks: using lower-precision (quantized) weights to fit smaller GPUs, choosing the fast variant when latency matters, and tuning step count and guidance per task. Node-based tools that wire up the model, sampler, and add-ons (like ComfyUI) are the common way power users orchestrate this.
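
The effect of quantization on memory is easy to estimate: weight memory is roughly the parameter count times the bytes stored per parameter. The sketch below assumes a 12-billion-parameter denoiser, which is an assumption for illustration, so check the model card of the variant you use. It counts the denoiser weights only, not the text encoder, the VAE, or the working memory used during generation.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumed size: a 12-billion-parameter denoiser.
PARAMS = 12e9
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(PARAMS, bits):.0f} GB")
```

Halving the bits per parameter halves the weight footprint, which is what lets a quantized model fit on a smaller card, usually at some cost in output fidelity.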

Honest limitations. FLUX is excellent but not magic. Like all diffusion models it can still produce artifacts, struggle with very long passages of text or precise counts, and reflect biases in its training data. Open weights also raise misuse and provenance questions the whole field is still working through. And because the lineup keeps evolving — with the current generation and the open-versus-proprietary split both shifting over time — the durable skill is understanding the concepts here, then checking Black Forest Labs for which exact variant fits your needs today.

FAQ

What is FLUX in AI?

FLUX is a family of text-to-image AI models from Black Forest Labs, the startup founded by several of the original Stable Diffusion researchers. You give it a text prompt and it generates a matching image. It is widely seen as the quality leader among image models you can download and run yourself.

Is FLUX open source or proprietary?

Both, depending on the variant. The lineup deliberately mixes open-weight versions you can download and self-host with proprietary versions available only through an API. Always check which tier a specific variant belongs to, and read its license before commercial use.

Is FLUX better than Stable Diffusion?

On prompt adherence, text rendering, and fine detail, FLUX generally leads. But Stable Diffusion runs on lighter hardware and has a larger, older catalog of community fine-tunes and tools. Many people keep both and choose per task rather than treating one as strictly better.

What is a diffusion transformer?

A diffusion transformer (often called a DiT) is a diffusion image model whose denoiser is a transformer — the attention-based architecture behind large language models — instead of the older convolutional U-Net. Attention lets the model relate every word of the prompt to every region of the image, which improves how closely it follows instructions.

Can I run FLUX on my own computer?

Yes, if you use an open-weight variant and have a capable GPU with enough VRAM. The high-quality variants are large, so people often use a faster variant or lower-precision (quantized) weights to fit smaller cards. Proprietary variants run only on the provider's servers via API.

Who makes FLUX?

Black Forest Labs, a German AI lab founded by researchers who previously worked on Stable Diffusion. Its official site is bfl.ai, which is the best place to confirm which variants are currently open-weight versus API-only.
