15 Distribution estimation and bootstrap
15.1 Estimation of the CDF
F(x) = \mathbb P(X \leq x)
\hat F_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbb 1(X_i \leq x)
15.1.1 Pointwise convergence
Notice that for any fixed x, our estimator is a mean of IID Bernoulli(F(x)) random variables. Thus \mathbb E [\hat F_n(x)] = F(x) and Var[\hat F_n(x)] = \frac{1}{n} F(x)(1-F(x)). The MSE goes to zero in the limit, giving us pointwise convergence in quadratic mean and thus in probability.
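The estimator \hat F_n can be sketched in a few lines of NumPy (the function name ecdf is ours, not standard):

```python
import numpy as np

def ecdf(samples, x):
    """Empirical CDF \hat F_n(x): fraction of samples at or below x."""
    samples = np.asarray(samples)
    return np.mean(samples <= x)

# With many draws, ecdf(samples, x) should be close to the true F(x).
rng = np.random.default_rng(0)
samples = rng.normal(size=100_000)
print(ecdf(samples, 0.0))  # close to Phi(0) = 0.5
```

Each evaluation is just a sample mean of indicator variables, which is exactly why the Bernoulli argument above applies.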
We can also get a CLT result: for each fixed x, \sqrt{n} (\hat F_n(x) - F(x)) \xrightarrow{d} \mathcal N(0, F(x)(1 - F(x))).
15.1.2 Glivenko Cantelli theorem
This result gives us uniform convergence. Let X_1, \dots, X_n \sim F be IID. Then:
\sup_{x \in \mathbb R} |\hat F_n(x) - F(x) | \xrightarrow{a.s.} 0
15.1.3 Dvoretzky-Kiefer-Wolfowitz inequality
For any \epsilon > 0, we get a uniform bound on the probability of the empirical CDF’s deviation:
\mathbb P \left( \sup_{x \in \mathbb R} |\hat F_n(x) - F(x)| > \epsilon \right) \leq 2e^{-2n \epsilon^2}
We can use this to obtain a confidence band. To achieve coverage of at least 1-\alpha, set the bound equal to \alpha and solve for \epsilon_n:
\begin{align*} \alpha &= 2e^{-2n \epsilon_n^2}\\ \log \frac{2}{\alpha} &= 2n \epsilon_n^2\\ \epsilon_n &= \sqrt{\frac{1}{2n} \log \frac{2}{\alpha} }\\ \end{align*}
Then, F(x) \in (\max \{ \hat F_n(x) - \epsilon_n, 0\}, \min \{ \hat F_n(x) + \epsilon_n, 1\} ) with probability 1-\alpha.
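A minimal NumPy sketch of this band (the function name dkw_band is ours):

```python
import numpy as np

def dkw_band(samples, xs, alpha=0.05):
    """Uniform 1-alpha confidence band for F from the DKW inequality."""
    samples = np.asarray(samples)
    n = len(samples)
    eps = np.sqrt(np.log(2 / alpha) / (2 * n))  # epsilon_n from the formula above
    # Empirical CDF evaluated at every point in xs at once.
    fhat = np.mean(samples[None, :] <= np.asarray(xs)[:, None], axis=1)
    lower = np.maximum(fhat - eps, 0.0)  # clip to [0, 1] as in the text
    upper = np.minimum(fhat + eps, 1.0)
    return lower, upper

rng = np.random.default_rng(0)
samples = rng.normal(size=1000)
xs = np.linspace(-3, 3, 61)
lower, upper = dkw_band(samples, xs)  # covers F uniformly w.p. >= 0.95
```

Note the band has constant half-width \epsilon_n before clipping, since the DKW bound is uniform in x.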
15.2 Statistical functionals
A statistical functional is a map that takes a CDF and outputs a number. Many statistical quantities can be represented in the form of statistical functionals.
15.2.1 Linear functionals
A linear functional is any functional that, for some function a(x), can be expressed in the form:
T[F] = \int a(x) dF(x)
Essentially any form of expectation is a linear functional. The plug-in estimator, which substitutes the empirical CDF for F, is simply the mean of the transformed RVs:
T[\hat F_n] = \frac{1}{n} \sum_{i=1}^n a(X_i)
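In code, the plug-in estimator for a linear functional is a one-liner (the function name plugin is ours):

```python
import numpy as np

def plugin(samples, a):
    """Plug-in estimate of T[F] = int a(x) dF(x): the sample mean of a(X_i)."""
    return np.mean(a(np.asarray(samples)))

# Example: the second moment of N(0, 1) data, whose true value is 1.
rng = np.random.default_rng(1)
x = rng.normal(size=200_000)
print(plugin(x, lambda t: t**2))  # close to 1
```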
15.2.2 Examples
15.2.2.1 Moments
T_{k}[F] = \int x^k dF(x)
15.2.2.2 Quantile
Not a linear functional. T_q[F] = \inf_x \{x: F(x) \geq q \}
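The plug-in quantile T_q[\hat F_n] picks the smallest sample value at which the empirical CDF reaches q; a small sketch (the function name quantile_plugin is ours):

```python
import numpy as np

def quantile_plugin(samples, q):
    """Plug-in q-quantile: inf{x : F_n(x) >= q} over the sorted sample."""
    s = np.sort(np.asarray(samples))
    n = len(s)
    # Smallest index i (0-based) with (i + 1) / n >= q.
    idx = int(np.ceil(q * n)) - 1
    return s[max(idx, 0)]

print(quantile_plugin([3, 1, 4, 2], 0.5))  # prints 2, since F_n(2) = 0.5
```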
15.2.3 Influence function
The Gateaux derivative of a functional T at F, in the direction of \delta_x:
L_F(x) = \lim_{\epsilon \to 0} \frac{ T[ (1- \epsilon) F + \epsilon \delta_x ] - T[F] }{\epsilon}
The empirical influence function is thus:
\hat L(x) = \lim_{\epsilon \to 0} \frac{ T[ (1- \epsilon) \hat F_n + \epsilon \delta_x ] - T[\hat F_n] }{\epsilon}
For a linear functional T[F] = \int a(x) dF(x), this satisfies:
L_F(x) = a(x) - T[F] and \hat L(x) = a(x) - T[\hat F_n]
\int L_F(x) dF(x) = 0
For a linear functional and any other CDF G, the Gateaux derivative of T at F in the direction of G is L_F[G] = T[G] - T[F] = \int L_F(x) dG(x).
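The Gateaux derivative can be approximated numerically by evaluating T on a slightly contaminated empirical CDF, which is a useful sanity check against the closed form a(x) - T[\hat F_n]. A sketch, assuming T is given as a function of support points and weights (all names here are ours):

```python
import numpy as np

def empirical_influence(T, samples, x, eps=1e-6):
    """Numerical Gateaux derivative of T at the empirical CDF, toward delta_x.

    T takes (points, weights) representing a discrete distribution.
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    pts = np.append(samples, x)
    w0 = np.append(np.full(n, 1.0 / n), 0.0)  # the empirical CDF itself
    w1 = (1 - eps) * w0                       # (1 - eps) F_n + eps delta_x
    w1[-1] += eps
    return (T(pts, w1) - T(pts, w0)) / eps

# For the mean T[F] = int t dF(t), the influence function is x - T[F_n].
mean_T = lambda pts, w: np.sum(w * pts)
data = np.array([1.0, 2.0, 3.0])
print(empirical_influence(mean_T, data, 5.0))  # approximately 5 - 2 = 3
```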
15.2.4 Functional delta method
A pointwise asymptotic 1-\alpha confidence interval for T[F] is:
T[\hat F_n] \pm z_{1-\alpha/2} \frac{\hat \tau}{\sqrt n}
where \hat \tau^2 = \frac{1}{n} \sum_i \hat L^2(X_i)
Derivation
Take G = \hat F_n.
Using the identity for linear functionals above, we get T[\hat F_n] - T[F] = \int L_F(x) d \hat F_n(x) = \frac{1}{n} \sum L_F(X_i).
By the law of large numbers, this converges in probability to zero: T[\hat F_n] - T[F] = \frac{1}{n}\sum_{i=1}^n L_F(X_i) \xrightarrow{p} \int L_F(x) dF(x) = 0
Then, we can apply the CLT to the mean expression: \sqrt{n} \left(\frac{1}{n} \sum L_F(X_i) - 0\right) = \sqrt{n} (T[\hat F_n] - T[F]) \xrightarrow{d} \mathcal N(0, \tau^2)
where \tau^2 = \int L_F^2 (x) dF(x) =\int (a(x) - T[F])^2 dF(x) < \infty
We estimate this using \hat \tau^2 = \frac{1}{n} \sum_{i=1}^n \hat L^2(X_i) = \frac{1}{n} \sum_{i=1}^n (a(X_i) - T[\hat F_n])^2
By the law of large numbers \hat \tau^2 \xrightarrow{p} \tau^2. Applying Slutsky’s theorem, we get that:
\frac{\sqrt{n} \, (T[\hat F_n] - T[F])}{\hat \tau} \xrightarrow{d} \mathcal N(0, 1)
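Putting the pieces together, the confidence interval for a linear functional is straightforward to compute; a sketch for the second moment of normal data (the function name delta_method_ci is ours, and z_{0.975} is hard-coded to avoid a SciPy dependency):

```python
import numpy as np
from math import sqrt

def delta_method_ci(samples, a, alpha=0.05):
    """Asymptotic 1-alpha CI for a linear functional T[F] = int a(x) dF(x)."""
    vals = a(np.asarray(samples))
    t_hat = np.mean(vals)        # plug-in estimate T[F_n]
    tau_hat = np.std(vals)       # sqrt of (1/n) sum (a(X_i) - T[F_n])^2
    z = 1.959963985              # z_{0.975}, for alpha = 0.05
    half = z * tau_hat / sqrt(len(vals))
    return t_hat - half, t_hat + half

rng = np.random.default_rng(2)
x = rng.normal(size=50_000)
lo, hi = delta_method_ci(x, lambda t: t**2)  # true second moment is 1
```

np.std with its default ddof=0 is exactly the \hat\tau of the derivation, the square root of the mean squared empirical influence values.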