# Variance of sample mean in an autocorrelated stochastic process

Let $X$ be a stochastic process with mean $\mathbb{E}(X) = \mu$ and variance $\mathbb{V}(X) = \sigma^2$. Let $X_1, \dots, X_n$ be an observed time series of $X$.

A natural estimator for $\mu$ is the sample mean $\overline{X} = \frac{1}{n}\sum_{i = 1}^{n} X_i$. We know that if the observations are IID, $\mathbb{V}(\overline{X}) = \frac{\sigma^2}{n}$. However, if the observations are positively autocorrelated rather than IID, the variance will be larger; this is what I learned recently in Loucks et al. (2005). The authors skipped some details of the derivation in the book, so I worked them out. Below is a detailed proof.

\begin{aligned} \mathbb{V}(\overline{X}) &= \mathbb{E}\left((\overline{X}-\mu)^2\right) \\ &= \mathbb{E}\left(\overline{X}^2 - 2\mu\overline{X} + \mu^2\right)\\ &= \frac{1}{n^2} \mathbb{E} \left( \left(\sum_{i=1}^{n} X_i\right)^2 - 2n\mu\sum_{i=1}^{n}X_i + n^2\mu^2\right)\\ &= \frac{1}{n^2} \mathbb{E} \left( \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{j=1}^{n} X_j\right) - 2\left(\sum_{i=1}^{n}X_i\right)\left(\sum_{j=1}^{n}\mu\right) + \sum_{i=1}^{n}\sum_{j=1}^{n}\mu^2\right)\\ &= \frac{1}{n^2}\mathbb{E}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}\left(X_iX_j - \mu X_i - \mu X_j + \mu^2\right) \right)\\ &= \frac{1}{n^2}\mathbb{E}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}(X_i - \mu)(X_j - \mu)\right) \end{aligned}

Now, the summands with $i = j$ can be grouped into a single sum of squared deviations. For the summands with $i \neq j$, note that each pair of indices appears twice, once with $j > i$ and once with $j < i$, and both occurrences give the same product. Therefore,

$\displaystyle \mathbb{V}(\overline{X}) = \frac{1}{n^2}\mathbb{E}\left(\sum_{i=1}^{n}(X_i - \mu)^2 + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(X_i - \mu)(X_j - \mu)\right)$           (1)

Let $k = j - i$; in other words, $k$ denotes the lag between the $j^{\text{th}}$ and $i^{\text{th}}$ timesteps. Assuming the process is weakly stationary, so that $\text{Cov}(X_i, X_{i+k})$ depends only on the lag $k$, (1) becomes

\begin{aligned} \mathbb{V}(\overline{X}) &= \frac{1}{n^2}\mathbb{E}\left(\sum_{i=1}^{n}(X_i - \mu)^2\right) + \frac{2}{n^2}\mathbb{E}\left(\sum_{i=1}^{n-1}\sum_{k=1}^{n-i}(X_i - \mu)(X_{i+k} - \mu)\right) \\ &= \frac{n\,\mathbb{V}(X)}{n^2} + \frac{2}{n^2}\sum_{k=1}^{n-1}\sum_{i=1}^{n-k}\text{Cov}(X_i,X_{i+k})\\ &= \frac{\sigma^2}{n} + \frac{2}{n^2}\sum_{k=1}^{n-1}(n-k)\rho(k)\sigma^2\\ &= \frac{\sigma^2}{n}\left(1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho(k)\right) \end{aligned}

where $\rho(k)$ is the lag-$k$ autocorrelation and is defined as

$\displaystyle \rho(k) = \frac{\text{Cov}(X_i, X_{i+k})}{\sigma^2}$
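
As a quick sanity check of the final formula, consider the smallest non-trivial case, $n = 2$ (this worked example is mine, not from the book). The sum has a single term, $k = 1$, so

$\displaystyle \mathbb{V}(\overline{X}) = \frac{\sigma^2}{2}\left(1 + 2\left(1 - \frac{1}{2}\right)\rho(1)\right) = \frac{\sigma^2}{2}\left(1 + \rho(1)\right)$

This agrees with the direct computation $\mathbb{V}\left(\frac{X_1 + X_2}{2}\right) = \frac{1}{4}\left(\sigma^2 + \sigma^2 + 2\,\text{Cov}(X_1, X_2)\right)$, and it reduces to $\frac{\sigma^2}{2}$ when $\rho(1) = 0$.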

Observe that, compared to the IID case, the variance of the sample mean is multiplied by the factor in parentheses. When the autocorrelations are positive, as is typical of persistent time series, this factor is bigger than 1. Furthermore, it can be checked that in this case the factor does not decrease as $n$ increases. We conclude that the sample mean of a positively autocorrelated time series has a bigger standard error than that of an IID time series with the same variance.
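
To make the inflation factor concrete, here is a minimal Python sketch that evaluates it and cross-checks the closed form against the brute-force double sum of covariances from the derivation. The autocorrelation model $\rho(k) = \phi^k$ (that of an AR(1) process) is my assumption for illustration only, as are the function names and the parameter values; none of this comes from the book.

```python
def autocorr_ar1(k, phi):
    """Assumed autocorrelation model for illustration: rho(k) = phi**|k|."""
    return phi ** abs(k)


def inflation_factor(rho, n):
    """Factor 1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho(k)."""
    return 1.0 + 2.0 * sum((1.0 - k / n) * rho(k) for k in range(1, n))


def var_mean_formula(rho, n, sigma2):
    """Variance of the sample mean from the closed-form result."""
    return sigma2 / n * inflation_factor(rho, n)


def var_mean_double_sum(rho, n, sigma2):
    """Variance of the sample mean as (1/n^2) * sum_i sum_j Cov(X_i, X_j),
    with Cov(X_i, X_j) = sigma2 * rho(|i - j|)."""
    total = sum(sigma2 * rho(j - i) for i in range(n) for j in range(n))
    return total / n ** 2


if __name__ == "__main__":
    sigma2, n = 4.0, 50  # hypothetical values
    for phi in (0.0, 0.5, 0.9):
        rho = lambda k, phi=phi: autocorr_ar1(k, phi)
        print(
            f"phi={phi}: factor={inflation_factor(rho, n):.3f}, "
            f"formula={var_mean_formula(rho, n, sigma2):.5f}, "
            f"double sum={var_mean_double_sum(rho, n, sigma2):.5f}"
        )
```

With $\phi = 0$ the factor is exactly 1, which recovers the IID result $\frac{\sigma^2}{n}$; for larger positive $\phi$ the factor grows, and the two ways of computing the variance should agree up to floating-point error.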

References

Loucks, D. P., et al. (2005). Water Resources Systems Planning and Management (Chapter 7, pp. 198-201). UNESCO Publication. (The book is publicly available on the UNESCO website.)