11 Exponential family
11.1 Definition
The exponential family is a class of distributions whose probability density or mass functions can be written in the form: p( x | \eta ) = h(x) \exp \left( \eta^\top t(x) - a(\eta) \right)
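Here, t(x) is the sufficient statistic, which is a vector of functions of x, \eta is the “natural parameter” of the exponential family, a(\eta) is the log normalizer and h(x) is the base measure. The log normalizer is whatever makes the distribution integrate (or sum) to one, so a(\eta) = \log \int h(x) \exp (\eta^\top t(x)) dx.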
11.1.1 Example: Bernoulli distribution
For example, consider the Bernoulli distribution \begin{align*} p(x | \pi) &= \pi ^ x (1 - \pi)^{1-x} \\ &= \exp \left( x \log \pi + (1-x) \log (1- \pi) \right) \\ &= \exp \left( ( \log \pi - \log (1- \pi)) x + \log (1- \pi) \right) \\ &= \exp \left( x \log \frac \pi {1- \pi} + \log (1- \pi) \right) \\ \end{align*}
Matching terms against the exponential family template gives h(x) = 1, t(x) = x, \eta = \log \frac \pi {1- \pi}, and a(\eta) = -\log (1 - \pi) = \log (1 + e^\eta). Notice that the natural parameter here is the log-odds of success.
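As a quick sanity check, the sketch below evaluates the Bernoulli pmf both in its usual form and in the exponential family form, and the two should agree for x = 0 and x = 1. The function names and the value \pi = 0.3 are illustrative choices, not part of the text above.

```python
import math

def bernoulli_pmf(x, pi):
    """Standard Bernoulli pmf: pi^x * (1 - pi)^(1 - x) for x in {0, 1}."""
    return pi ** x * (1.0 - pi) ** (1 - x)

def bernoulli_expfam(x, eta):
    """Exponential family form h(x) * exp(eta * t(x) - a(eta))."""
    h = 1.0                          # base measure
    t = x                            # sufficient statistic
    a = math.log1p(math.exp(eta))    # log normalizer: log(1 + e^eta) = -log(1 - pi)
    return h * math.exp(eta * t - a)

pi = 0.3                             # illustrative success probability
eta = math.log(pi / (1.0 - pi))      # natural parameter: log-odds of success

for x in (0, 1):
    print(x, bernoulli_pmf(x, pi), bernoulli_expfam(x, eta))
```

Both columns printed for each x should match up to floating-point rounding.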
11.1.2 Example: Gaussian distribution
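Another example is the Gaussian distribution: \begin{align*} p(x | \mu, \sigma^2) &= \frac{1}{\sqrt {2 \pi \sigma^2}} \exp \left( \frac{- (x- \mu)^2}{2 \sigma^2} \right) \\ &= \frac{1}{\sqrt {2 \pi \sigma^2}} \exp \left( \frac{- x^2 + 2 \mu x - \mu^2}{2 \sigma^2} \right) \\ &= \frac{1}{\sqrt {2 \pi}} \exp \left( \frac{-1}{2 \sigma^2} x^2 + \frac{\mu}{\sigma^2} x - \frac{\mu^2}{2 \sigma^2} - \frac{1}{2} \log \sigma^2 \right) \end{align*}
Here, t(x) = \begin{bmatrix} x^2 \\ x \end{bmatrix}, h(x) = \frac{1}{\sqrt {2 \pi}}, \eta = \begin{bmatrix} \frac{-1}{2\sigma^2}\\ \frac{\mu}{\sigma^2} \end{bmatrix}, and a(\eta) = \frac{1}{2} \log \sigma^2 + \frac{\mu^2}{2\sigma^2}. Written in terms of the natural parameters, a(\eta) = -\frac{\eta_2^2}{4 \eta_1} - \frac{1}{2} \log (-2 \eta_1).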
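The same kind of check can be sketched for the Gaussian. The code below assumes the log normalizer a(\eta) = \frac{1}{2} \log \sigma^2 + \frac{\mu^2}{2\sigma^2} and the base measure h(x) = \frac{1}{\sqrt{2\pi}}, and compares the resulting exponential family expression with the usual density. The function names and test values are illustrative.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Usual Gaussian density with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def gaussian_expfam(x, mu, sigma2):
    """Exponential family form h(x) * exp(eta^T t(x) - a(eta))."""
    eta = (-1.0 / (2.0 * sigma2), mu / sigma2)              # natural parameters
    t = (x * x, x)                                          # sufficient statistics
    h = 1.0 / math.sqrt(2.0 * math.pi)                      # base measure
    a = 0.5 * math.log(sigma2) + mu * mu / (2.0 * sigma2)   # log normalizer
    return h * math.exp(eta[0] * t[0] + eta[1] * t[1] - a)

mu, sigma2 = 1.5, 0.8                                       # illustrative parameters
for x in (-1.0, 0.0, 2.3):
    print(x, gaussian_pdf(x, mu, sigma2), gaussian_expfam(x, mu, sigma2))
```

If the \frac{\mu^2}{2\sigma^2} term were replaced by \frac{\mu^2}{\sigma^2}, the two columns would no longer agree, which makes this a handy way to catch a dropped factor of 2.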
11.2 Moments of exponential family distributions
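A useful property of exponential family distributions is that the derivatives of the log normalizer a(\eta) give the moments of the sufficient statistic t(x): the gradient is the mean \mathbb E[t(X)], the Hessian is the covariance of t(X), and in general the ith order derivative gives the ith order cumulant. For the first derivative: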
\begin{align*} \nabla_\eta a(\eta) &= \nabla_\eta \left(\log \int h(x) \exp (\eta^\top t(x)) dx\right)\\ &= \frac{\nabla_\eta \int h(x) \exp (\eta^\top t(x)) dx }{\int h(x) \exp (\eta^\top t(x)) dx } \\ &= \int t(x) \frac{ h(x) \exp (\eta^\top t(x)) }{\int h(x) \exp (\eta^\top t(x)) dx }dx \\ &= \int t(x) \, p(x | \eta) \, dx \\ &= \mathbb E _{X\sim p(\cdot|\eta)}[t(X)] \end{align*}
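The identity can be checked numerically on the Bernoulli example, where a(\eta) = \log (1 + e^\eta) and t(x) = x. The sketch below approximates the first and second derivatives of a(\eta) by finite differences and compares them with the mean \pi and the variance \pi (1 - \pi). The helper names, the step size, and \pi = 0.3 are illustrative assumptions.

```python
import math

def log_normalizer(eta):
    """Bernoulli log normalizer a(eta) = log(1 + e^eta)."""
    return math.log1p(math.exp(eta))

def central_diff(f, x, step=1e-4):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + step) - f(x - step)) / (2.0 * step)

def second_diff(f, x, step=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + step) - 2.0 * f(x) + f(x - step)) / (step * step)

pi = 0.3                                              # illustrative success probability
eta = math.log(pi / (1.0 - pi))                       # corresponding natural parameter

mean_estimate = central_diff(log_normalizer, eta)     # should be close to E[X] = pi
var_estimate = second_diff(log_normalizer, eta)       # should be close to Var[X] = pi * (1 - pi)

print(mean_estimate, pi)
print(var_estimate, pi * (1.0 - pi))
```

Note that the second derivative recovers the variance \pi (1 - \pi), not the raw second moment \mathbb E[X^2] = \pi.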