Variational Autoencoder
A generative model that learns compressed representations for creating new data.

Compression as Intelligence
A VAE learns to compress data into a small "latent space" and then reconstruct it. The magic: you can sample new points from latent space to generate entirely new data that looks like the training data but never existed.
VAEs are the backbone of video generation models like OpenAI's Sora, where they compress video frames into a manageable latent space for the diffusion model to work with.
Encoder: compresses the input into the parameters of a latent distribution (a mean and a standard deviation), mapping data to a compact representation.
Latent space: the compressed representation itself. Nearby points decode to similar outputs, so you can interpolate smoothly between them.
Decoder: reconstructs data from a latent vector. Decoding random latent points generates new data.
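The encode, sample, decode pipeline can be sketched with plain numpy. This is a minimal illustration of the data flow only: the weights are random and untrained, and the dimensions (8-dimensional input, 2-dimensional latent space) are arbitrary assumptions, not from any real model. The sampling step is the "reparameterization trick" real VAEs use to keep sampling differentiable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from any specific model).
D_IN, D_LATENT = 8, 2

# Random, untrained weights stand in for a learned encoder/decoder.
W_enc_mu = rng.normal(size=(D_IN, D_LATENT))
W_enc_logvar = rng.normal(size=(D_IN, D_LATENT))
W_dec = rng.normal(size=(D_LATENT, D_IN))

def encode(x):
    """Map input to the parameters of a latent Gaussian."""
    return x @ W_enc_mu, x @ W_enc_logvar

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps (the reparameterization trick)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z):
    """Map a latent vector back to data space."""
    return z @ W_dec

x = rng.normal(size=(1, D_IN))
mu, log_var = encode(x)
z = reparameterize(mu, log_var)
x_hat = decode(z)
print(z.shape, x_hat.shape)   # (1, 2) (1, 8)
```

In a trained VAE the weights are learned by minimizing reconstruction error plus a term that keeps the latent distribution close to a standard Gaussian, which is what makes random latent points decode to plausible data.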
Explore the Latent Space
Drag the slider to move through latent space. Watch how the encoded representation and decoded output change smoothly:
In a real VAE, this slider would smoothly morph between generated faces, digits, or other data types.
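The slider corresponds to linear interpolation between two latent codes: pick the latent vectors of two examples and blend them, decoding each intermediate point. A minimal sketch (the two latent vectors here are illustrative, not from a trained model):

```python
import numpy as np

def interpolate(z_a, z_b, t):
    """Linearly blend two latent vectors; t=0 gives z_a, t=1 gives z_b."""
    return (1 - t) * z_a + t * z_b

z_a = np.array([-1.0, 0.5])   # latent code of one example (illustrative)
z_b = np.array([1.0, -0.5])   # latent code of another

# A "slider" sweep across latent space; each z would then be decoded.
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(t, interpolate(z_a, z_b, t))
```

Because nearby latent points decode to similar outputs, each decoded frame along this path changes gradually, which is why the morph looks smooth.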
VAE Architecture
VAEs in the Wild
OpenAI Sora: uses a VAE to compress video frames into latent space before applying diffusion for generation.
Image generation (e.g., Stable Diffusion): the VAE encoder compresses images to latent space; the VAE decoder converts the result back to pixel space.
Drug discovery: VAEs generate novel molecular structures by sampling from a learned chemical latent space.
Test Your Understanding
Q1. What are the two main components of a VAE?
Q2. What does the latent space represent?
Q3. How is a VAE used in OpenAI's Sora?