18 Sequential Importance Sampling
Neal (1998)
18.1 AIS Algorithm
We have unnormalised densities f_0, f_1, \dots f_n where f_0 is our target and f_n is our prior.
We have a set of n-1 Markov transition kernels T_{n-1} \dots T_1 that have F_{n-1}, \dots F_1 as their invariant distributions respectively (for example Metropolis-Hastings updates).
To generate a single point x^{(i)}:
- Generate x_{n-1}^{(i)} \sim F_{n}
- Generate x_{n-2}^{(i)} \sim T_{n-1}(x_{n-1}^{(i)} \rightarrow x_{n-2})…
- Generate x_0^{(i)} \sim T_1(x_1^{(i)} \rightarrow x_0)
Assign x^{(i)} = x_0^{(i)}.
We’ve sampled this single point through a complicated Markov Chain that made use of kernels corresponding to the intermediate annealed distributions between our prior f_n and posterior f_0.
The importance weight assigned to x^{(i)} is: w^{(i)} = \frac{f_{n-1}\Big(x_{n-1}^{(i)}\Big)}{f_{n}\Big(x_{n-1}^{(i)}\Big)} \cdot \frac{f_{n-2}\Big(x_{n-2}^{(i)}\Big)}{f_{n-1}\Big(x_{n-2}^{(i)}\Big)} \dots \frac{f_{1}\Big(x_{1}^{(i)}\Big)}{f_{2}\Big(x_{1}^{(i)}\Big)} \cdot \frac{f_{0}\Big(x_{0}^{(i)}\Big)}{f_{1}\Big(x_{0}^{(i)}\Big)}
18.2 Derivation of AIS weights
Let’s take the simple case where we have the prior F_2, the posterior F_0 and one intermediate annealing distribution F_1. The transition kernel corresponding to F_1 is T_1.
To generate x^{(i)}:
- Sample y^{(i)} \sim F_2.
- Sample x^{(i)} \sim T_1( y^{(i)} \rightarrow x).
Consider the extended state space with points (x, y).
We can define an unnormalised target density for the entire state space as f(x, y) = f_0(x) \cdot \tilde T_1(x, y) This gives us the probability that we first sampled x from our target F_0 and then sampled y conditional on the value of x.
Here \tilde T_1 is the reversed transition kernel density: \tilde T_1(x, y) = T_1(y, x) \cdot \frac{f_1(y)}{f_1(x)}
Becasuse F_1 is the stationary distribution of T_1, \int f_1(y) \cdot T_1(y,x) dy = f_1(x)
This ensures our \tilde T_1 is a valid transition kernel: \int \tilde T_1(x, y) dy = \int T_1(y, x) \cdot \frac{f_1(y)}{f_1(x)} dy = 1
Similarly when we marginalise our joint density f(x,y) we just get f_0(x) exactly as desired.
Now that we have our target joint density and our actual joint density, we can reweigh the samples by the ratio of those two:
\begin{align*} w^{(i)} &= \frac{f(x^{(i)}, y^{(i)})}{f_2(y^{(i)}) \cdot T_1(y^{(i)}, x^{(i)})} \\ &= \frac{f_0(x^{(i)}) \tilde T_1(x^{(i)}, y^{(i)})}{f_2(y^{(i)}) \cdot T_1(y^{(i)}, x^{(i)})} \\ &= \frac{f_0(x^{(i)}) \cdot f_1(y^{(i)}) \cdot T_1(y^{(i)}, x^{(i)})}{f_1(x^{(i)})} \cdot \frac{1}{f_2(y^{(i)}) \cdot T_1(y^{(i)}, x^{(i)})}\\ &= \frac{f_0(x^{(i)})}{f_1(x^{(i)})} \cdot \frac{f_1(y^{(i)})}{f_2(y^{(i)})} \end{align*}