Variance of sample mean in an autocorrelated stochastic process

Let X be a (weakly) stationary stochastic process with mean \mathbb{E}(X) = \mu and variance \mathbb{V}(X) = \sigma^2. (Stationarity ensures that \text{Cov}(X_i, X_{i+k}) depends only on the lag k, which the derivation below uses.) Let X_1, \ldots, X_n be an observed time series of X.

A good estimator for \mu is the sample mean \overline{X} = \frac{1}{n}\sum_{i = 1}^{n} X_i. We know that if the observations are IID, \mathbb{V}(\overline{X}) = \frac{\sigma^2}{n}. However, if the observations are positively autocorrelated, the variance will be larger – this is what I learned recently in Loucks et al. (2005). The authors skipped some details in the book, so I worked them out. Below is a detailed proof, after a quick numerical sanity check of the IID baseline.
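Here is a minimal Monte Carlo sketch of the IID case (my own, not from the book); the choices of n, \sigma, and the number of replications are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma = 50, 100_000, 2.0

# `reps` independent IID series of length n; one sample mean per series
samples = rng.normal(loc=0.0, scale=sigma, size=(reps, n))

print(samples.mean(axis=1).var())  # empirical V(X̄)
print(sigma**2 / n)                # theoretical σ²/n = 0.08
```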

\begin{aligned} \mathbb{V}(\overline{X}) &= \mathbb{E}\left((\overline{X}-\mu)^2\right) \\ &= \mathbb{E}(\overline{X}^2 - 2\mu\overline{X} + \mu^2)\\ &= \frac{1}{n^2} \mathbb{E} \left( \left(\sum_{i=1}^{n} X_i\right)^2 - 2n\mu\sum_{i=1}^{n}X_i + n^2\mu^2\right)\\ &= \frac{1}{n^2} \mathbb{E} \left( \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{j=1}^{n} X_j\right) - 2\left(\sum_{i=1}^{n}X_i\right)\left(\sum_{j=1}^{n}\mu\right) + \sum_{i=1}^{n}\sum_{j=1}^{n}\mu^2\right)\\ &= \frac{1}{n^2}\mathbb{E}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}\left(X_iX_j - \mu X_i - \mu X_j + \mu^2\right) \right)\\ &= \frac{1}{n^2}\mathbb{E}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}(X_i - \mu)(X_j - \mu)\right) \end{aligned}

Now, on one hand, the summands where i = j can be grouped together; on the other hand, note that when i \neq j, each pair of indices contributes two summands, one with j > i and one with j < i. Therefore,

\displaystyle \mathbb{V}(\overline{X}) = \frac{1}{n^2}\mathbb{E}\left(\sum_{i=1}^{n}(X_i - \mu)^2 + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(X_i - \mu)(X_j - \mu)\right)           (1)
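The grouping inside (1), before taking expectations, is easy to verify numerically on a single series; a minimal sketch, with an arbitrary series of length 10 and \mu = 0:

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu = 10, 0.0
d = rng.normal(size=n) - mu  # centered observations X_i - mu

# The full double sum over all pairs (i, j) ...
full = sum(d[i] * d[j] for i in range(n) for j in range(n))

# ... equals the diagonal terms plus twice the upper triangle, as in (1)
grouped = (d**2).sum() + 2 * sum(d[i] * d[j] for i in range(n) for j in range(i + 1, n))

print(np.isclose(full, grouped))              # True
print(np.isclose(full / n**2, d.mean()**2))   # and (1/n²)·full = (X̄ - μ)²
```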

Let k = j - i; in other words, k denotes the lag between the j^{\text{th}} and the i^{\text{th}} timesteps. Since j runs from i + 1 to n, the lag k runs from 1 to n - i, and (1) becomes

\begin{aligned} \mathbb{V}(\overline{X}) &= \frac{1}{n^2}\mathbb{E}\left(\sum_{i=1}^{n}(X_i - \mu)^2\right) + \frac{2}{n^2}\mathbb{E}\left(\sum_{i=1}^{n-1}\sum_{k=1}^{n-i}(X_i - \mu)(X_{i+k} - \mu) \right) \\ &= \frac{\mathbb{V}(X)}{n} + \frac{2}{n^2}\sum_{k=1}^{n-1}\sum_{i=1}^{n-k}\text{Cov}(X_i,X_{i+k})\\ &= \frac{\sigma^2}{n} + \frac{2}{n^2}\sum_{k=1}^{n-1}(n-k)\rho(k)\sigma^2\\ &= \frac{\sigma^2}{n}\left(1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho(k)\right) \end{aligned}

where \rho(k) is the lag-k autocorrelation, which by stationarity does not depend on i and is defined as

\displaystyle \rho(k) = \frac{\text{Cov}(X_i, X_{i+k})}{\sigma^2}
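To check the final formula, here is a Monte Carlo sketch using an AR(1) process X_t = \phi X_{t-1} + \varepsilon_t with unit-variance innovations, for which \rho(k) = \phi^k; the AR(1) model and all parameter values are my own choices, not from the book:

```python
import numpy as np

rng = np.random.default_rng(2)
phi, n, reps = 0.6, 100, 20_000
sigma2 = 1.0 / (1.0 - phi**2)  # stationary variance of AR(1) with unit innovations

# Theoretical V(X̄) from the formula above, with rho(k) = phi**k for AR(1)
k = np.arange(1, n)
var_theory = sigma2 / n * (1 + 2 * np.sum((1 - k / n) * phi**k))

# Monte Carlo: simulate `reps` stationary AR(1) series at once, column by column
x = np.empty((reps, n))
x[:, 0] = rng.normal(scale=np.sqrt(sigma2), size=reps)  # start in the stationary distribution
for t in range(1, n):
    x[:, t] = phi * x[:, t - 1] + rng.normal(size=reps)

print(var_theory)            # ≈ 0.061 with these parameters
print(x.mean(axis=1).var())  # empirical variance of the sample mean; should be close
```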

Observe that, compared to the IID case, the variance of the sample mean is multiplied by the factor 1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho(k). When the autocorrelations \rho(k) are positive – the typical case for hydrological time series such as those in Loucks et al. (2005) – this factor is bigger than 1, and it can be checked that it does not decrease as n increases. We conclude that the sample mean of a positively autocorrelated time series has a bigger standard error than that of an IID time series with the same variance. (If some \rho(k) are negative, the factor can drop below 1, so the sign of the autocorrelation matters.)
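For a concrete illustration of that last claim (again with my own AR(1) example, \rho(k) = \phi^k), the inflation factor grows with n towards the limit 1 + 2\sum_{k=1}^{\infty}\phi^k = \frac{1+\phi}{1-\phi}:

```python
import numpy as np

phi = 0.6
for n in (10, 100, 1_000, 10_000):
    k = np.arange(1, n)
    factor = 1 + 2 * np.sum((1 - k / n) * phi**k)
    print(n, round(factor, 4))

# The printed factor increases with n and approaches
# (1 + phi) / (1 - phi) = 4.0, so V(X̄) decays like 4·σ²/n rather
# than σ²/n: roughly four times as many observations are needed
# for the same precision as in the IID case.
```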

References

Loucks, D. P., et al. (2005). Water Resources Systems Planning and Management (Chapter 7, pp. 198–201). UNESCO Publication. (The book is publicly available on the UNESCO website.)