Metropolis–Hastings MCMC

A scroll-through intuition for how a random walk can sample a distribution you can’t directly compute.

Imagine you need to understand a probability distribution, but you can’t write it down in a neat closed form. You can’t integrate it. You can’t even normalize it. All you can do is ask: “How tall is it here, relative to over there?”

Markov Chain Monte Carlo (MCMC) is the trick for that situation. You don’t try to compute the whole distribution. Instead, you generate samples from it. If you can get good samples, you can estimate means, uncertainties, and probabilities just by counting.

The mental model I like is: a little “walker” wanders around a landscape. Where the landscape is high, the walker tends to linger. Where it’s low, the walker moves through quickly. Over time, the amount of time spent in each region becomes proportional to the distribution’s height there.

The algorithm in four steps

Choose a starting position at random.
Propose a jump to a nearby position.
If the new position is more probable, accept the jump.
If it’s less probable, still maybe accept — with probability proportional to the ratio of probabilities.

That last step is the entire magic. It’s what lets the walker move between peaks, escape local traps, and still spend the right amount of time in the right places.

Let’s watch it work.

0 samples · mean: — · true mean: 428.6 · acceptance: 0%

As the walk runs, the histogram of visited positions starts to resemble the target distribution. The walker is spending more time where the distribution is high, and less time where it’s low — without ever needing to compute the full distribution everywhere.

Meanwhile, simple estimates (like the running mean of the samples) start to stabilize. That’s the practical value: once you have samples, many questions reduce to arithmetic.

The tuning problem

There’s one important knob: how far should each jump be? This is usually called the proposal width.

If proposals are too cautious, the walker accepts almost everything — but it takes forever to explore the distribution. If proposals are too ambitious, most jumps land in low-probability regions and get rejected — so you waste steps standing still.

The sweet spot is somewhere in the middle: big enough to move, not so big you constantly bounce off the landscape.

Try it yourself

Here are three presets for proposal width. Each one resets the figure and scrolls it into view so you can immediately see the difference.

I wrote the original notebook for this demo about fifteen years ago while learning data science. The algorithm is simple, but the intuition — wandering around a landscape, spending time proportional to height — is one of the most useful ideas in computational statistics. View the original notebook on GitHub.