Text Decoding
How LLMs generate text one token at a time by sampling from probability distributions.

The Creative Engine of AI
When you chat with an AI, it doesn't retrieve pre-written answers. Instead, it generates text one token at a time, at each step predicting a probability distribution over every possible next token. The decoding strategy, the rule for choosing a token from that distribution, determines whether the output is predictable or creative.
This is why the same prompt can give different answers — and why "temperature" settings exist.
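A minimal sketch of this loop, assuming a hypothetical next_token_probs function that returns a probability distribution over the vocabulary (a real model scores tens of thousands of tokens at every step):

```python
import numpy as np

def generate(prompt_tokens, next_token_probs, max_new_tokens=20, eos_id=None):
    """Autoregressive decoding: predict, pick one token, append, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)   # hypothetical model call: distribution over the vocabulary
        next_id = int(np.random.choice(len(probs), p=probs))  # one strategy: random sampling
        tokens.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break                          # stop once the model emits the end-of-sequence token
    return tokens
```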
Greedy decoding: always picks the highest-probability token. Deterministic and precise. Best for code and translation.
Top-p (nucleus) sampling: randomly samples from the smallest set of top tokens whose cumulative probability reaches the threshold p. Creative, varied output.
Temperature: controls randomness (low = focused, high = creative) by dividing the model's raw scores, the logits, by the temperature before they are turned into probabilities. All three are sketched in code below.
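A sketch of all three strategies operating on a vector of logits; the helper names are illustrative, not part of any particular library:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def greedy(logits):
    """Always return the highest-probability token id."""
    return int(np.argmax(logits))

def sample_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature before softmax: <1 sharpens, >1 flattens."""
    probs = softmax(np.asarray(logits) / temperature)
    return int(np.random.choice(len(probs), p=probs))

def top_p_sample(logits, p=0.9, temperature=1.0):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative probability >= p."""
    probs = softmax(np.asarray(logits) / temperature)
    order = np.argsort(probs)[::-1]                    # token ids, most to least likely
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # how many tokens form the nucleus
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize within the nucleus
    return int(np.random.choice(nucleus, p=nucleus_probs))
```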
Pick the Next Token
The model predicts probabilities for each possible next token. Click a bar to select it and continue generating:
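For example, a model might assign the following (invented) probabilities to candidate next tokens after the prompt "The cat sat on the":

```python
# Invented next-token probabilities for illustration; a real model scores the whole vocabulary
candidates = {" mat": 0.46, " floor": 0.21, " couch": 0.12, " rug": 0.08, " moon": 0.01}
```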
Greedy vs. Sampling — Side by Side
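Continuing with the helpers from the sketch above (the toy logits are made up), greedy decoding returns the same token on every run, while top-p sampling varies but stays among the likely tokens:

```python
logits = np.array([2.0, 1.5, 1.0, 0.2, -1.0])   # toy scores for five candidate tokens

print([greedy(logits) for _ in range(5)])
# -> [0, 0, 0, 0, 0]   always the top-scoring token

print([top_p_sample(logits, p=0.9) for _ in range(5)])
# -> e.g. [0, 2, 0, 1, 0]   differs between runs, drawn only from the nucleus
```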
Where Each Strategy Shines
Code generation: uses greedy decoding or a low temperature. You want deterministic, correct syntax, not creative variable names.
Creative writing: uses top-p sampling with a higher temperature. Diverse, interesting, sometimes surprising outputs.
Chat assistants: a default temperature balances accuracy and naturalness. Users can adjust it via API parameters (illustrative presets below).
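Many LLM APIs expose these knobs as request parameters, commonly named temperature and top_p. The values below are assumed starting points for illustration, not vendor recommendations:

```python
# Illustrative generation presets (assumed values; tune for your model and API)
presets = {
    "code_generation":  {"temperature": 0.0, "top_p": 1.0},   # effectively greedy
    "creative_writing": {"temperature": 1.0, "top_p": 0.95},  # diverse, surprising
    "chat_assistant":   {"temperature": 0.7, "top_p": 0.9},   # balanced default
}
```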
Test Your Understanding
Q1. What does an LLM output at each generation step?
Q2. Which decoding strategy always picks the highest-probability token?
Q3. When would you use top-p sampling over greedy decoding?