Diffusion Models
Generating images and video by learning to reverse a noise-adding process.

The Engine Behind AI Image Generation
Diffusion models power DALL-E, Stable Diffusion, Midjourney, and Sora. The core idea is elegantly simple: teach a model to remove noise, then start from pure noise and let it "denoise" its way to a coherent image.
It's like a sculptor chipping away at marble — starting from chaos and gradually revealing structure.
Forward process: progressively add Gaussian noise to real images over ~1000 timesteps until they become pure noise.
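To make the forward process concrete, here is a minimal PyTorch sketch of the closed-form noising step together with the usual training objective (predict the noise that was added). The linear beta schedule and the tiny stand-in "model" are illustrative assumptions, not any particular system's exact settings.

```python
import torch

T = 1000                                   # number of diffusion timesteps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # abar_t = product of alpha_s for s <= t

def noise_image(x0, t):
    """Jump straight to timestep t: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = torch.randn_like(x0)
    abar = alpha_bars[t].view(-1, 1, 1, 1)
    xt = abar.sqrt() * x0 + (1.0 - abar).sqrt() * eps
    return xt, eps

# Training step: the network learns to predict the noise that was added.
model = torch.nn.Conv2d(3, 3, 3, padding=1)      # stand-in for a real U-Net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x0 = torch.rand(8, 3, 32, 32) * 2 - 1            # fake batch of images in [-1, 1]
t = torch.randint(0, T, (8,))                    # random timestep per image
xt, eps = noise_image(x0, t)
loss = torch.nn.functional.mse_loss(model(xt), eps)
loss.backward()
optimizer.step()
```

A real denoiser (typically a U-Net or a transformer) would also take the timestep t, and the text embedding, as additional inputs.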
Reverse process: learn to undo the noising. Starting from pure noise, the model predicts and removes a little noise at each step, guided by a text prompt.
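Sampling runs that prediction in a loop. Below is a minimal sketch of DDPM-style ancestral sampling, reusing the noise schedule and placeholder model from the sketch above; a production sampler would use far fewer steps and a better solver.

```python
import torch

@torch.no_grad()
def sample(model, shape=(1, 3, 32, 32)):
    x = torch.randn(shape)                        # t = T: pure Gaussian noise
    for t in reversed(range(T)):
        eps_hat = model(x)                        # predicted noise at step t
        abar = alpha_bars[t]
        # Estimate the mean of x_{t-1} from x_t and the predicted noise.
        x = (x - betas[t] / (1 - abar).sqrt() * eps_hat) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn(shape)   # re-inject a little noise
    return x                                      # approximate sample from the data distribution

image = sample(model)
```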
Text conditioning: a text encoder (like CLIP) converts your prompt into embeddings that steer the denoising process.
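In practice the prompt usually steers generation through classifier-free guidance: the denoiser is queried with and without the text embedding, and the two noise predictions are blended. Here is a small sketch; `text_model`, `prompt_emb`, and `null_emb` are hypothetical placeholders, and the guidance scale of 7.5 is just a commonly used default.

```python
import torch

def guided_noise_prediction(text_model, x, prompt_emb, null_emb, scale=7.5):
    eps_cond = text_model(x, prompt_emb)     # noise prediction given the prompt
    eps_uncond = text_model(x, null_emb)     # noise prediction with an empty prompt
    # Push the prediction toward the prompt-conditioned direction.
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Tiny demo with a fake denoiser that ignores its inputs:
fake_denoiser = lambda x, emb: torch.randn_like(x)
x = torch.randn(1, 3, 32, 32)
eps = guided_noise_prediction(fake_denoiser, x,
                              torch.randn(1, 77, 768),   # stand-in prompt embedding
                              torch.zeros(1, 77, 768))   # stand-in "empty prompt" embedding
```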
Step Through Denoising
Click "Denoise" to remove noise step by step and watch an image emerge from pure randomness:
Pure Noise (t=1000)
How Diffusion Works
Diffusion Models in the Wild
DALL-E: OpenAI's image generator. A diffusion model with improved text understanding from training directly on rich, descriptive image captions.
Stable Diffusion: an open-source diffusion model that runs in latent space (images are first compressed by a VAE encoder). It powers thousands of apps.
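The latent-space idea can be sketched as follows. All of the functions here are hypothetical placeholders standing in for the real VAE encoder/decoder and the sampling loop shown earlier; the shapes mirror the common 512x512 image to 64x64x4 latent setup.

```python
import torch

def vae_encode(image):                     # (1, 3, 512, 512) -> (1, 4, 64, 64)
    return torch.randn(image.shape[0], 4, 64, 64)        # placeholder encoder

def vae_decode(latent):                    # (1, 4, 64, 64) -> (1, 3, 512, 512)
    return torch.nn.functional.interpolate(latent[:, :3], scale_factor=8)  # placeholder decoder

def run_diffusion_sampler(latent, prompt_emb):
    return latent                          # placeholder for the sampling loop sketched earlier

prompt_emb = torch.randn(1, 77, 768)       # stand-in for CLIP text embeddings
z = torch.randn(1, 4, 64, 64)              # pure noise, but in the small latent space
z = run_diffusion_sampler(z, prompt_emb)   # all the denoising happens on the tiny latent
image = vae_decode(z)                      # only the final decode touches full-resolution pixels
```

Running the expensive denoising loop on the small latent, rather than on full-resolution pixels, is what makes this approach cheap enough for consumer GPUs.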
Sora: video generation. A VAE compresses the video frames into a latent space, where a diffusion transformer generates coherent video.
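As a rough, shape-only sketch of that pipeline (all sizes are made up for illustration and are not Sora's actual configuration), a compressed video latent can be cut into spacetime patches, each of which becomes one token for a diffusion transformer to denoise:

```python
import torch

# Latent volume for a 16-frame clip after a (not shown) VAE compresses each frame.
latent = torch.randn(1, 16, 4, 32, 32)    # (batch, frames, channels, height, width)

# Cut the volume into 2x4x4 (frames x height x width) spacetime patches.
patches = latent.unfold(1, 2, 2).unfold(3, 4, 4).unfold(4, 4, 4)
# dims are now: (batch, t_win, channels, h_win, w_win, t_patch, h_patch, w_patch)
patches = patches.permute(0, 1, 3, 4, 2, 5, 6, 7)        # group channels with each patch
tokens = patches.reshape(1, 8 * 8 * 8, 4 * 2 * 4 * 4)    # (batch, 512 tokens, 128 dims each)
print(tokens.shape)                                      # torch.Size([1, 512, 128])
```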
Test Your Understanding
Q1. What does a diffusion model learn to do during training?
Q2. During inference, what does the model start from?
Q3. How does a text prompt influence the generated image?
Q4. What is the relationship between VAEs and diffusion models in Stable Diffusion?