18 Sequential Importance Sampling

Published

October 17, 2025

Neal (1998)

18.1 AIS Algorithm

We have unnormalised densities f_0, f_1, \dots f_n where f_0 is our target and f_n is our prior.

We have a set of n-1 Markov transition kernels T_{n-1} \dots T_1 that have F_{n-1}, \dots F_1 as their invariant distributions respectively (for example Metropolis-Hastings updates).

To generate a single point x^{(i)}:

Generate x_{n-1}^{(i)} \sim F_{n}
Generate x_{n-2}^{(i)} \sim T_{n-1}(x_{n-1}^{(i)} \rightarrow x_{n-2})…

Generate x_0^{(i)} \sim T_1(x_1^{(i)} \rightarrow x_0)

Assign x^{(i)} = x_0^{(i)}.

We’ve sampled this single point through a complicated Markov Chain that made use of kernels corresponding to the intermediate annealed distributions between our prior f_n and posterior f_0.

The importance weight assigned to x^{(i)} is: w^{(i)} = \frac{f_{n-1}\Big(x_{n-1}^{(i)}\Big)}{f_{n}\Big(x_{n-1}^{(i)}\Big)} \cdot \frac{f_{n-2}\Big(x_{n-2}^{(i)}\Big)}{f_{n-1}\Big(x_{n-2}^{(i)}\Big)} \dots \frac{f_{1}\Big(x_{1}^{(i)}\Big)}{f_{2}\Big(x_{1}^{(i)}\Big)} \cdot \frac{f_{0}\Big(x_{0}^{(i)}\Big)}{f_{1}\Big(x_{0}^{(i)}\Big)}

18.2 Derivation of AIS weights

Let’s take the simple case where we have the prior F_2, the posterior F_0 and one intermediate annealing distribution F_1. The transition kernel corresponding to F_1 is T_1.

To generate x^{(i)}:

Sample y^{(i)} \sim F_2.
Sample x^{(i)} \sim T_1( y^{(i)} \rightarrow x).

Consider the extended state space with points (x, y).

We can define an unnormalised target density for the entire state space as f(x, y) = f_0(x) \cdot \tilde T_1(x, y) This gives us the probability that we first sampled x from our target F_0 and then sampled y conditional on the value of x.

Here \tilde T_1 is the reversed transition kernel density: \tilde T_1(x, y) = T_1(y, x) \cdot \frac{f_1(y)}{f_1(x)}

Becasuse F_1 is the stationary distribution of T_1, \int f_1(y) \cdot T_1(y,x) dy = f_1(x)

This ensures our \tilde T_1 is a valid transition kernel: \int \tilde T_1(x, y) dy = \int T_1(y, x) \cdot \frac{f_1(y)}{f_1(x)} dy = 1

Similarly when we marginalise our joint density f(x,y) we just get f_0(x) exactly as desired.

Now that we have our target joint density and our actual joint density, we can reweigh the samples by the ratio of those two:

\begin{align*} w^{(i)} &= \frac{f(x^{(i)}, y^{(i)})}{f_2(y^{(i)}) \cdot T_1(y^{(i)}, x^{(i)})} \\ &= \frac{f_0(x^{(i)}) \tilde T_1(x^{(i)}, y^{(i)})}{f_2(y^{(i)}) \cdot T_1(y^{(i)}, x^{(i)})} \\ &= \frac{f_0(x^{(i)}) \cdot f_1(y^{(i)}) \cdot T_1(y^{(i)}, x^{(i)})}{f_1(x^{(i)})} \cdot \frac{1}{f_2(y^{(i)}) \cdot T_1(y^{(i)}, x^{(i)})}\\ &= \frac{f_0(x^{(i)})}{f_1(x^{(i)})} \cdot \frac{f_1(y^{(i)})}{f_2(y^{(i)})} \end{align*}

Neal, Radford M. 1998. “Annealed Importance Sampling.” https://arxiv.org/abs/physics/9803008.