Stratified Sampling

Consider the task of computing the multidimensional integral (e.g., an ensemble average),
$\displaystyle \langle A \rangle = \int d \xi f(\xi).$ (358)


The stratified sampling technique breaks the integration range into the union of $ k$ disjoint subregions $ D_1, D_2, ..., D_k$, chosen so that the integrand is relatively constant within each subregion. We can then sample $ m_j$ random configurations $ \xi_j(1), \xi_j(2), ..., \xi_j(m_j)$ uniformly in the subregion $ D_j$ and approximate each subregional integral (up to the volume of $ D_j$, a constant factor omitted here) by

$\displaystyle \int_{D_j} d \xi f(\xi) \approx A_j = \frac{1}{m_j}[f(\xi_j(1))+f(\xi_j(2))+ ...+f(\xi_j(m_j))].$ (359)


The overall integral is computed as

$\displaystyle \langle A \rangle \approx \bar{A} = A_1 + A_2 + ... + A_k,$ (360)
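
As an illustration of the estimator above, the following is a minimal sketch for a one-dimensional integral. It rests on several assumptions that are not in the text: the subregions are equal-width intervals of $[a, b]$, the samples are drawn uniformly within each interval, and each subregional average is multiplied explicitly by the interval width (the volume of $ D_j$). The function name `stratified_integral` and its parameters are hypothetical.

```python
import random


def stratified_integral(f, a, b, k, m_per_stratum, rng):
    """Estimate the integral of f over [a, b] using k equal-width strata."""
    width = (b - a) / k  # volume of each subregion D_j (assumed equal)
    total = 0.0
    for j in range(k):
        lo = a + j * width
        # m_j uniform samples inside D_j = [lo, lo + width)
        values = [f(lo + width * rng.random()) for _ in range(m_per_stratum)]
        # A_j: sample mean of f in D_j, weighted by the subregion volume
        total += width * sum(values) / m_per_stratum
    return total


def square(x):
    return x * x


rng = random.Random(0)  # fixed seed so the run is reproducible
estimate = stratified_integral(square, 0.0, 1.0, k=10, m_per_stratum=100, rng=rng)
print(estimate)  # should be close to the exact value 1/3
```

Using the same number of samples in every subregion is only one possible choice; the $ m_j$ may differ from one subregion to another.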


whose variance is

$\displaystyle \sigma^2 = \frac{\sigma_1^2}{m_1} + \frac{\sigma_2^2}{m_2} + ... + \frac{\sigma_k^2}{m_k},$ (361)


where $ \sigma_j^2$ is the variance of the integrand within the subregion $ D_j$.
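
The form of Eq. (361) follows from a short argument, sketched here under the assumption that samples drawn in different subregions are statistically independent and that the $ m_j$ samples within each subregion are independent and identically distributed:

$\displaystyle \mathrm{Var}(\bar{A}) = \sum_{j=1}^{k} \mathrm{Var}(A_j) = \sum_{j=1}^{k} \frac{1}{m_j^2} \sum_{i=1}^{m_j} \mathrm{Var}[f(\xi_j(i))] = \sum_{j=1}^{k} \frac{\sigma_j^2}{m_j}.$

The first equality uses the independence of the subregional estimates, and the second uses the independence of the samples within each subregion.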

Note that the variance given by Eq. (361) is smaller than the variance of the estimator obtained by using a single region for the whole integration range, $ \tilde{\sigma}^2/m$, only when the integrand is relatively constant within each subregion. Here $ m=m_1+m_2+...+m_k$ and $ \tilde{\sigma}^2$ is the overall variance of the integrand over the whole integration range.
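
The comparison can be checked numerically. The sketch below is an illustration under stated assumptions rather than a general result: it uses the integrand $ f(x) = x^2$ on $[0, 1]$, ten equal-width subregions with ten uniform samples each, and a single-region estimate with the same total of $ m = 100$ samples. Both estimators are repeated many times and their empirical variances are compared. All function names are hypothetical.

```python
import random
import statistics


def f(x):
    return x * x


def plain_mc(m, rng):
    """Single-region estimate of the integral of f over [0, 1] with m samples."""
    return sum(f(rng.random()) for _ in range(m)) / m


def stratified_mc(k, m_j, rng):
    """Stratified estimate over [0, 1]: k equal strata, m_j samples in each."""
    width = 1.0 / k  # volume of each subregion
    total = 0.0
    for j in range(k):
        # uniform samples in the j-th subregion [j * width, (j + 1) * width)
        values = [f((j + rng.random()) * width) for _ in range(m_j)]
        total += width * sum(values) / m_j
    return total


rng = random.Random(1)  # fixed seed so the run is reproducible
repeats = 200
plain = [plain_mc(100, rng) for _ in range(repeats)]
strat = [stratified_mc(10, 10, rng) for _ in range(repeats)]

var_plain = statistics.variance(plain)
var_strat = statistics.variance(strat)
print(var_plain, var_strat)
```

For this smooth integrand each subregion sees only a small range of values of $ f$, so the stratified variance is expected to come out well below the single-region one.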

On closer inspection, the stratified sampling technique described in this section is a particular version of the importance sampling method.
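
One way to make this connection explicit is sketched below, assuming that samples are drawn uniformly within each subregion and writing $ V_j$ (a symbol not used in the text above) for the volume of $ D_j$. Consider importance sampling with the piecewise-constant, normalized density

$\displaystyle p(\xi) = \frac{m_j}{m V_j} \quad \mathrm{for} \quad \xi \in D_j.$

If exactly $ m_j$ of the $ m$ samples fall in $ D_j$, the importance sampling estimate becomes

$\displaystyle \frac{1}{m} \sum_{i=1}^{m} \frac{f(\xi(i))}{p(\xi(i))} = \sum_{j=1}^{k} \frac{V_j}{m_j} \sum_{i=1}^{m_j} f(\xi_j(i)),$

which is the sum of the subregional averages of Eq. (359), each weighted by the volume $ V_j$. In this reading, stratified sampling corresponds to fixing the number of samples in each subregion instead of letting it fluctuate from run to run.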