There is barely any field untouched by LLMs right now, and every day we hear more and more about the amazing things they can do. From writing poetry to answering complex questions, it feels like there’s no limit. But here’s something we don’t often talk about: how these models turn raw probabilities into the final words you see on your screen.

Decoding—the process of converting token probabilities into coherent responses—is a crucial piece of the puzzle. It’s not as simple as just picking the "most likely" next word. Different situations call for different strategies, depending on whether you want something fast, creative, or super-precise. And the best part? We’ve got a lot of clever ways to do this!

In this post, we’ll break it all down:

  1. Why decoding matters: It’s not just about efficiency; it’s about getting the right kind of output.
  2. The basic approaches: Greedy decoding, which is a deterministic approach; then you can add a dash of randomness with top-K sampling and nucleus sampling. And having heard the word ‘greedy’, LeetCode people must already be dreading the next ‘token’: Dynamic Programming.
  3. The fancy stuff: Let’s ponder whether LLMs can guide their own decoding.

Let’s dive in!


Why Decoding Isn’t Just an Afterthought

Okay, imagine this: You ask a model to generate a story, and it predicts a list of possible words for the next step, each with a probability. The easiest approach? Just pick the word with the highest probability every time. Done, right?

Not quite.

Depending on what you’re trying to do, this method might not always give you what you want. For instance:

  - For creative writing, always grabbing the single most likely word tends to produce bland, repetitive text.
  - A choice that looks best right now can lock the model into an awkward continuation; the most likely word at each step doesn’t guarantee the most likely sentence overall.
  - The output is fully deterministic, so the same prompt gives you the same response every single time.

This is why decoding strategies matter. They shape the final output, balancing things like speed, quality, and creativity. Let’s explore how this works!
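To make that concrete, here’s a toy sketch (the words and probabilities are made up for illustration) showing how two strategies decode the very same next-token distribution differently:

```python
import random

# Hypothetical next-token distribution after the prompt "The sky turned"
next_token_probs = {
    "dark": 0.40,
    "orange": 0.25,
    "violet": 0.20,
    "to": 0.15,
}

# Greedy: always take the single most likely token -> deterministic output
greedy_pick = max(next_token_probs, key=next_token_probs.get)

# Sampling: draw a token in proportion to its probability -> varied output
sampled_pick = random.choices(
    population=list(next_token_probs),
    weights=list(next_token_probs.values()),
)[0]

print(f"greedy:  {greedy_pick}")   # always "dark"
print(f"sampled: {sampled_pick}")  # usually "dark", but not always
```

Run it a few times: the greedy pick never changes, while sampling occasionally surfaces the less likely (and often more interesting) continuations.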


1. Greedy Decoding: Keeping It Simple

First up is greedy decoding. This one’s pretty straightforward: at each step, the model picks the token with the highest probability and feeds it back in to generate the next token: the auto-regressive way. Boom, you’re done.
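Here’s a minimal sketch of that loop, assuming the Hugging Face transformers library with GPT-2 as a stand-in model (any causal LM works the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):  # generate 20 new tokens
        logits = model(input_ids).logits           # (batch, seq_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1)  # highest-probability token
        # Feed the chosen token back in: the auto-regressive loop
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

In practice you’d just call `model.generate(input_ids, do_sample=False, max_new_tokens=20)`, which runs this same loop under the hood (with a KV cache so it doesn’t recompute everything each step).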