Deterministic versus stochastic models

I had an interesting talk with Sean today over lunch. We discussed whether rainfall is a stochastic or a deterministic process. At one point, I asked how we should then think about a coin flip. Sean brought up chaos theory, which suggests that instead of using a stochastic model, you can use a deterministic one and vary only the initial conditions to obtain different outputs. There are some papers on this in the Journal of Hydrology.

This is definitely something to check out, though I certainly have to focus on what I am doing.


Variance of sample mean in an autocorrelated stochastic process

Let X be a (weakly) stationary stochastic process with mean \mathbb{E}(X) = \mu and variance \mathbb{V}(X) = \sigma^2 . Let X_1, \ldots, X_n be an observed time series of X.

A good estimator for \mu is the sample mean \overline{X}  = \frac{1}{n}\sum_{i = 1}^{n} X_i . We know that if the observations are IID, \mathbb{V}(\overline{X}) = \frac{\sigma^2}{n}. However, if the observations are autocorrelated, the variance changes, and it is larger when the autocorrelations are positive. This is what I learned recently in Loucks (2005). The book skips some details of the derivation, so I worked them out. Below is a detailed proof.

\begin{aligned} \mathbb{V}(\overline{X}) &= \mathbb{E}\left((\overline{X}-\mu)^2\right) \\ &= \mathbb{E}(\overline{X}^2 - 2\mu\overline{X} + \mu^2)\\ &= \frac{1}{n^2} \mathbb{E} \left( \left(\sum_{i=1}^{n} X_i\right)^2 - 2n\mu\sum_{i=1}^{n}X_i + n^2\mu^2\right)\\ &= \frac{1}{n^2} \mathbb{E} \left( \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{j=1}^{n} X_j\right) - 2\left(\sum_{i=1}^{n}X_i\right)\left(\sum_{j=1}^{n}\mu\right) + \sum_{i=1}^{n}\sum_{j=1}^{n}\mu^2\right)\\ &= \frac{1}{n^2}\mathbb{E}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}\left(X_iX_j - \mu X_i - \mu X_j + \mu^2\right) \right)\\ &= \frac{1}{n^2}\mathbb{E}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}(X_i - \mu)(X_j - \mu)\right) \end{aligned}

Now, the double sum splits into two groups. On one hand, the n summands where i = j can be grouped; on the other hand, note that when i \neq j, each product (X_i - \mu)(X_j - \mu) appears twice, once with j > i and once with j < i. Therefore,

\displaystyle \mathbb{V}(\overline{X}) = \frac{1}{n^2}\mathbb{E}\left(\sum_{i=1}^{n}(X_i - \mu)^2 + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(X_i - \mu)(X_j - \mu)\right)           (1)

Let k = j - i; in other words, k denotes the lag between the j^{\text{th}} and i^{\text{th}} timesteps. For a given lag k, the index i runs from 1 to n - k. Since the process is stationary, \text{Cov}(X_i, X_{i+k}) depends only on k, and (1) becomes

\begin{aligned} \mathbb{V}(\overline{X}) &= \frac{1}{n^2}\mathbb{E}\left(\sum_{i=1}^{n}(X_i - \mu)^2\right) + \frac{2}{n^2}\mathbb{E}\left(\sum_{k=1}^{n-1}\sum_{i=1}^{n-k}(X_i - \mu)(X_{i+k} - \mu) \right) \\ &= \frac{\mathbb{V}(X)}{n} + \frac{2}{n^2}\sum_{k=1}^{n-1}\sum_{i=1}^{n-k}\text{Cov}(X_i,X_{i+k})\\ &= \frac{\sigma^2}{n} + \frac{2}{n^2}\sum_{k=1}^{n-1}(n-k)\rho(k)\sigma^2\\ &= \frac{\sigma^2}{n}\left(1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho(k)\right) \end{aligned}

where \rho(k) is the lag-k autocorrelation and is defined as

\displaystyle \rho(k) = \frac{\text{Cov}(X_i, X_{i+k})}{\sigma^2}
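
As a quick illustration of the size of the bracketed factor, suppose (my own assumption for this example, not something taken from the book) that X is an AR(1) process with lag-1 autocorrelation \phi, where 0 < \phi < 1, so that \rho(k) = \phi^k. Since \frac{1}{n}\sum_{k=1}^{n-1} k\phi^k \to 0 as n grows, the factor approaches a simple limit:

\displaystyle 1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\phi^k \;\longrightarrow\; 1 + \frac{2\phi}{1-\phi} = \frac{1+\phi}{1-\phi}

For \phi = 0.5 the limit is 3. In that case the variance of the sample mean is roughly 3\sigma^2/n for large n, so n correlated observations tell us about as much about \mu as n/3 independent ones.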

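Observe that, compared to the IID case, the variance of the sample mean is multiplied by the factor in brackets. When the autocorrelations \rho(k) are positive, this factor is bigger than 1. Furthermore, if all \rho(k) are non-negative, the factor does not decrease as n increases, because each weight 1 - \frac{k}{n} grows with n and new non-negative terms are added. We conclude that the sample mean of a positively autocorrelated time series has a bigger standard error than that of an IID time series with the same variance. (With negative autocorrelations the factor can be smaller than 1.)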
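
The final formula can be checked numerically. Below is a minimal sketch under my own assumptions: the series is a stationary Gaussian AR(1) process with \sigma^2 = 1 and \rho(k) = \phi^k, and the names `inflation_factor` and `simulate_ar1_means` are made up for this post. It compares the variance predicted by the formula with the empirical variance of the sample mean over many simulated series.

```python
import numpy as np


def inflation_factor(rho, n):
    """Bracketed factor 1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho(k).

    `rho` is any function mapping an array of lags to autocorrelations.
    """
    k = np.arange(1, n)
    return 1.0 + 2.0 * np.sum((1.0 - k / n) * rho(k))


def simulate_ar1_means(phi, n, n_series, rng):
    """Sample means of `n_series` stationary AR(1) series of length n.

    Assumption: Gaussian innovations, scaled so the marginal variance is 1.
    """
    eps = rng.normal(scale=np.sqrt(1.0 - phi**2), size=(n_series, n))
    x = np.empty((n_series, n))
    # Draw the first value from the stationary distribution N(0, 1).
    x[:, 0] = rng.normal(size=n_series)
    for t in range(1, n):
        x[:, t] = phi * x[:, t - 1] + eps[:, t]
    return x.mean(axis=1)


phi, n = 0.5, 100
rng = np.random.default_rng(0)

means = simulate_ar1_means(phi, n, n_series=20000, rng=rng)

# Variance of the sample mean predicted by the formula (sigma^2 = 1).
theory = inflation_factor(lambda k: phi**k, n) / n
# Variance of the sample mean observed across the simulated series.
empirical = means.var()

print(f"theory:    {theory:.5f}")
print(f"empirical: {empirical:.5f}")
print(f"IID value: {1.0 / n:.5f}")
```

The two numbers should agree to within Monte Carlo noise, and both should be clearly larger than the IID value \sigma^2/n. Passing a different `rho` function to `inflation_factor` gives the factor for other autocorrelation structures.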


Loucks, D.P. et al. (2005). Water Resources Systems Planning and Management (Chapter 7, pp. 198-201). UNESCO Publication. (The book is publicly available on the UNESCO website.)