The Box-Cox transformation

In a linear regression model y = r(x) + \varepsilon, it is often assumed that the noise term \varepsilon has constant variance (homoscedasticity property). Furthermore, it is often advantageous when \varepsilon, and hence y given x, has a normal distribution (normality property). However, these “nice” properties do not always hold. In such cases, the Box-Cox transformation (Box and Cox, 1964) can be useful. It transforms the (positive) dependent variable of the regression model so that these properties (homoscedasticity and normality) are “approximately satisfied” (Box and Cox, 1964). In the univariate case, the transformation is formulated as

z = \begin{cases} \dfrac{y^\lambda-1}{\lambda} \qquad (\lambda \neq 0),\\ \log y \qquad (\lambda = 0), \end{cases}
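To make the two cases concrete, here is a minimal base-R sketch of the univariate formula. The function name box_cox is hypothetical (it is not the car API), and it assumes every value of y is strictly positive, since y^\lambda and \log y are only well behaved there.

```r
# Univariate Box-Cox transformation (a sketch; box_cox is a hypothetical name).
# Assumes all values of y are strictly positive.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) {
    log(y)
  } else {
    # expm1(lambda * log(y)) equals y^lambda - 1, but stays accurate for small lambda
    expm1(lambda * log(y)) / lambda
  }
}

y <- c(0.5, 1, 2, 4)
box_cox(y, 1)    # equals y - 1: -0.5, 0, 1, 3
box_cox(y, 0)    # equals log(y)
box_cox(y, 0.5)  # equals 2 * (sqrt(y) - 1)
```

Using expm1 is a small design choice: it shows numerically that the \lambda \neq 0 branch approaches \log y as \lambda \to 0, which is why the two cases fit together.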

and, for a sample of n observations \mathbf{y} = (y_1, ..., y_n)^\intercal, Box and Cox use the scaled (normalized) version

z_i = \begin{cases} \dfrac{y_i^\lambda-1}{\lambda \overline{y}^{\lambda-1}} \qquad (\lambda \neq 0),\\ \overline{y}\log y_i \qquad (\lambda = 0), \end{cases}
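A minimal base-R sketch of the scaled version follows; box_cox_scaled is a hypothetical helper name, and y is again assumed to be strictly positive. The geometric mean is computed on the log scale to avoid overflow in the product y_1 y_2 \cdots y_n.

```r
# Scaled (normalized) Box-Cox transformation of a sample y_1, ..., y_n.
# A sketch; box_cox_scaled is a hypothetical name. Requires y > 0.
box_cox_scaled <- function(y, lambda) {
  stopifnot(all(y > 0))
  gm <- exp(mean(log(y)))  # geometric mean, computed via logs to avoid overflow
  if (lambda == 0) {
    gm * log(y)
  } else {
    expm1(lambda * log(y)) / (lambda * gm^(lambda - 1))
  }
}

y <- c(1, 2, 4, 8)
box_cox_scaled(y, 1)  # equals y - 1, since gm^0 = 1
box_cox_scaled(y, 0)  # geometric mean (about 2.83) times log(y)
```

The factor \overline{y}^{\lambda-1} keeps z in the units of y for every \lambda, which is what makes residual sums of squares comparable across different values of \lambda.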

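where \overline{y} = (y_1 y_2 \cdots y_n)^{1/n} is the geometric mean of \mathbf{y}. The parameter \lambda is estimated by maximum likelihood: it is the value that maximizes the likelihood of the observed \mathbf{y} under the assumption that z, given x, follows a linear model with homoscedastic normal errors. Because the factor \overline{y}^{\lambda-1} absorbs the Jacobian of the transformation, the maximized log-likelihood is, up to an additive constant, -\frac{n}{2}\log\big(S(\lambda)/n\big), where S(\lambda) is the residual sum of squares of the linear regression of z on x. Maximizing the likelihood is therefore equivalent to minimizing S(\lambda), which is easier.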
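The "minimize the residual sum of squares" route can be sketched in base R with a simple grid search. The helper names (box_cox_scaled, rss_for_lambda, estimate_lambda), the grid from -2 to 2, and the simulated data are all assumptions made for illustration; the data are built so that \sqrt{y} is linear in x with constant-variance normal noise, so the estimate should land near 0.5.

```r
# Scaled Box-Cox transform of a positive sample y (hypothetical helper name).
box_cox_scaled <- function(y, lambda) {
  stopifnot(all(y > 0))
  gm <- exp(mean(log(y)))
  if (lambda == 0) gm * log(y) else expm1(lambda * log(y)) / (lambda * gm^(lambda - 1))
}

# Residual sum of squares S(lambda) of the linear regression of z on x.
rss_for_lambda <- function(lambda, y, x) {
  z <- box_cox_scaled(y, lambda)
  sum(residuals(lm(z ~ x))^2)
}

# Grid value of lambda with the smallest S(lambda),
# i.e. the largest profile log-likelihood on that grid.
estimate_lambda <- function(y, x, grid = seq(-2, 2, by = 0.01)) {
  rss <- vapply(grid, rss_for_lambda, numeric(1), y = y, x = x)
  grid[which.min(rss)]
}

# Simulated data (an assumption): sqrt(y) = 1 + 0.5 x + normal noise.
set.seed(1)
x <- runif(200, 1, 10)
y <- (1 + 0.5 * x + rnorm(200, sd = 0.3))^2
lambda_hat <- estimate_lambda(y, x)
lambda_hat
```

As a cross-check, MASS::boxcox(lm(y ~ x)) plots the profile log-likelihood over a grid of \lambda values; its peak should sit close to the grid minimum found here.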

Here’s a caveat. The procedure above estimates \lambda under the assumption that homoscedasticity and normality hold for a linear model in z. It does not guarantee that the transformed data actually satisfy these assumptions, so you still need to check them, for example on the residuals of the fitted model. Sometimes the transformation simply doesn’t work, in which case you need to find another (more sophisticated) model.
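One way to run such a check is sketched below, using base R only. check_assumptions is a hypothetical helper, and the two tests are common choices rather than the only ones: Shapiro-Wilk on the residuals for normality, and a Breusch-Pagan-type check (Koenker's studentized form, computed by hand) for constant variance. Small p-values are evidence against the corresponding assumption. The simulated data, with noise that is multiplicative on the original scale, are an assumption for illustration.

```r
# Rough base-R check of normality and constant variance for lm(z ~ x).
# check_assumptions is a hypothetical helper name.
check_assumptions <- function(z, x) {
  r <- residuals(lm(z ~ x))
  # Normality: Shapiro-Wilk test (base R accepts 3 to 5000 observations).
  normality_p <- unname(shapiro.test(r)$p.value)
  # Constant variance: regress squared residuals on x; n * R^2 is
  # approximately chi-squared with 1 degree of freedom (one regressor).
  aux <- lm(I(r^2) ~ x)
  bp_stat <- length(r) * summary(aux)$r.squared
  variance_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
  c(normality_p = normality_p, constant_variance_p = variance_p)
}

# Simulated data (an assumption): log(y) is linear in x with normal noise,
# so we would expect the log scale to pass and the original scale to fail.
set.seed(2)
x <- runif(300, 1, 10)
y <- exp(0.3 * x + rnorm(300, sd = 0.4))
check_assumptions(y, x)       # residuals on the original scale
check_assumptions(log(y), x)  # residuals after the lambda = 0 transformation
```

Formal tests complement, rather than replace, looking at a residuals-versus-fitted plot and a normal Q-Q plot of the residuals.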

I came across this transformation a while ago. Back then, I was looking for a transformation that converts a random variable y into one that is (approximately) normally distributed. I did not know that the Box-Cox transformation is defined in terms of a linear model, so I just transformed y directly. In the R package \verb|car|, the command to do this is

library(car)
z <- bcPower(y, powerTransform(y)$lambda)

I gotta admit I didn’t fully understand the function’s documentation in R. Once I read the original paper, it all became clear to me what the documentation meant. When the input to \verb|powerTransform| is a vector \mathbf{y} instead of an \verb|lm| object, the function treats it as the intercept-only linear model \mathbf{y} \sim \mathbf{1}, so \lambda is chosen to make \mathbf{y} itself as close to normal as possible.
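The intercept-only case has a simple interpretation: the fitted value of \mathbf{y} \sim \mathbf{1} is the mean of z, so the residual sum of squares is (n-1) times the sample variance of z, and estimating \lambda amounts to minimizing that variance. The base-R sketch below illustrates this; box_cox_scaled and lambda_for_vector are hypothetical names, the search interval [-2, 2] is an assumption, and it uses stats::optimize instead of whatever optimizer car uses internally.

```r
# Scaled Box-Cox transform of a positive sample y (hypothetical helper name).
box_cox_scaled <- function(y, lambda) {
  stopifnot(all(y > 0))
  gm <- exp(mean(log(y)))
  if (lambda == 0) gm * log(y) else expm1(lambda * log(y)) / (lambda * gm^(lambda - 1))
}

# For y ~ 1, the residual sum of squares equals (n - 1) * var(z),
# so minimizing var(z) over lambda maximizes the profile likelihood.
lambda_for_vector <- function(y, interval = c(-2, 2)) {
  objective <- function(lambda) var(box_cox_scaled(y, lambda))
  optimize(objective, interval)$minimum
}

# Simulated log-normal data (an assumption): log(y) is exactly normal,
# so the estimate should be close to 0.
set.seed(42)
y <- exp(rnorm(500))
lambda_for_vector(y)
```

If car is installed, powerTransform(y)$lambda should give a similar estimate on the same data; small differences can come from the different optimizers.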

This is just one example of my knowledge journey. It’s not a straight path but a convoluted one, and that’s what makes it worth taking.

I encourage you to read the original paper (public domain). Besides its content, it is interesting to see what scientific writing looked like in the 1960s. The other references listed below provide more information.


Box, G. E. P. and Cox, D. R. (1964). An Analysis of Transformations. Journal of the Royal Statistical Society, Series B (Methodological), 26(2):211-252.

Buthman, A. Making Data Normal Using Box-Cox Power Transformation. Retrieved 31 Aug 2016.

Wikipedia. Power transform. Retrieved 31 Aug 2016.

An interesting discussion on StackExchange. Retrieved 31 Aug 2016.