Text Decoding
How LLMs generate text one token at a time by sampling from probability distributions.

The Creative Engine of AI
When you chat with an AI, it doesn't retrieve pre-written answers. Instead, it generates text one token at a time, at each step predicting a probability distribution over every possible next token. The decoding strategy, the rule for choosing a token from that distribution, determines whether the output is predictable or creative.
This is why the same prompt can give different answers — and why "temperature" settings exist.
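A minimal sketch of this loop, assuming a hypothetical next_token_probs function that returns a probability distribution over the vocabulary (a real model scores tens of thousands of tokens at every step):

```python
import numpy as np

def generate(prompt_tokens, next_token_probs, max_new_tokens=20, eos_id=None):
    """Autoregressive decoding: predict, pick one token, append, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)   # hypothetical model call: distribution over the vocabulary
        next_id = int(np.random.choice(len(probs), p=probs))  # one strategy: random sampling
        tokens.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break                          # stop once the model emits the end-of-sequence token
    return tokens
```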
Greedy decoding: always picks the highest-probability token. Deterministic and precise. Best for code and translation.
Top-p (nucleus) sampling: randomly samples from the smallest set of top tokens whose cumulative probability reaches the threshold p. Creative, varied output.
Temperature: controls randomness (low = focused, high = creative) by dividing the model's raw scores, the logits, by the temperature before they are turned into probabilities. All three are sketched in code below.
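A sketch of all three strategies operating on a vector of logits; the helper names are illustrative, not part of any particular library:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def greedy(logits):
    """Always return the highest-probability token id."""
    return int(np.argmax(logits))

def sample_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature before softmax: <1 sharpens, >1 flattens."""
    probs = softmax(np.asarray(logits) / temperature)
    return int(np.random.choice(len(probs), p=probs))

def top_p_sample(logits, p=0.9, temperature=1.0):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative probability >= p."""
    probs = softmax(np.asarray(logits) / temperature)
    order = np.argsort(probs)[::-1]                    # token ids, most to least likely
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # how many tokens form the nucleus
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize within the nucleus
    return int(np.random.choice(nucleus, p=nucleus_probs))
```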
Pick the Next Token
The model predicts probabilities for each possible next token. Click a bar to select it and continue generating:
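For example, a model might assign the following (invented) probabilities to candidate next tokens after the prompt "The cat sat on the":

```python
# Invented next-token probabilities for illustration; a real model scores the whole vocabulary
candidates = {" mat": 0.46, " floor": 0.21, " couch": 0.12, " rug": 0.08, " moon": 0.01}
```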
Greedy vs. Sampling — Side by Side
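Continuing with the helpers from the sketch above (the toy logits are made up), greedy decoding returns the same token on every run, while top-p sampling varies but stays among the likely tokens:

```python
logits = np.array([2.0, 1.5, 1.0, 0.2, -1.0])   # toy scores for five candidate tokens

print([greedy(logits) for _ in range(5)])
# -> [0, 0, 0, 0, 0]   always the top-scoring token

print([top_p_sample(logits, p=0.9) for _ in range(5)])
# -> e.g. [0, 2, 0, 1, 0]   differs between runs, drawn only from the nucleus
```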
Where Each Strategy Shines
Code generation: uses greedy decoding or a low temperature. You want deterministic, correct syntax, not creative variable names.
Creative writing: uses top-p sampling with a higher temperature. Diverse, interesting, sometimes surprising outputs.
Chat assistants: a default temperature balances accuracy and naturalness. Users can adjust it via API parameters (illustrative presets below).
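Many LLM APIs expose these knobs as request parameters, commonly named temperature and top_p. The values below are assumed starting points for illustration, not vendor recommendations:

```python
# Illustrative generation presets (assumed values; tune for your model and API)
presets = {
    "code_generation":  {"temperature": 0.0, "top_p": 1.0},   # effectively greedy
    "creative_writing": {"temperature": 1.0, "top_p": 0.95},  # diverse, surprising
    "chat_assistant":   {"temperature": 0.7, "top_p": 0.9},   # balanced default
}
```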
Test Your Understanding
Q1. What does an LLM output at each generation step?
Q2. Which decoding strategy always picks the highest-probability token?
Q3. When would you use top-p sampling over greedy decoding?