In a linear regression model $y = X\beta + \varepsilon$, it is often assumed that the noise term $\varepsilon$ has constant variance (the *homoscedasticity* property). Furthermore, it is often advantageous when the dependent variable has a normal distribution (the *normality* property). However, these “nice” properties do not always hold. In such cases, the Box-Cox transformation (Box and Cox, 1964) can be useful. It transforms the dependent variable $y$ in the regression model so that these properties (homoscedasticity and normality) are *“approximately satisfied”* (Box and Cox, 1964). In the univariate case, the transformation is formulated as

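$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\
\log y & \text{if } \lambda = 0,
\end{cases}
$$

and in the multivariate case (with $y = (y_1, \dots, y_n)$),

$$
y_i^{(\lambda)} =
\begin{cases}
\dfrac{y_i^{\lambda} - 1}{\lambda \, \dot{y}^{\lambda - 1}} & \text{if } \lambda \neq 0, \\
\dot{y} \log y_i & \text{if } \lambda = 0,
\end{cases}
$$

where $\dot{y} = (y_1 \cdots y_n)^{1/n}$ is the geometric mean of $y_1, \dots, y_n$. Both forms require $y > 0$. The parameter $\lambda$ is the one that maximizes the likelihood of $y$ given the regressors $X$, and thus can be estimated using the maximum likelihood method. Since we are assuming that $y^{(\lambda)}$ is normal, and that the regression model is linear, $\lambda$ can also be estimated by minimizing the residual sum of squares of the regression on the geometric-mean-normalized $y^{(\lambda)}$. The latter method is easier.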
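The residual-sum-of-squares route can be sketched in base R. This is only an illustration under stated assumptions: the function names `box_cox_normalized` and `rss_for_lambda`, the search interval, and the simulated data are all made up for this example and do not come from the paper or from any package.

```r
# Normalized Box-Cox transform. Dividing by gm^(lambda - 1) puts every
# lambda on the same scale, so residual sums of squares are comparable.
box_cox_normalized <- function(y, lambda) {
  stopifnot(all(y > 0))
  gm <- exp(mean(log(y)))              # geometric mean of y
  if (abs(lambda) < 1e-8) {
    gm * log(y)
  } else {
    (y^lambda - 1) / (lambda * gm^(lambda - 1))
  }
}

# Residual sum of squares of the linear fit on the transformed response.
rss_for_lambda <- function(lambda, y, x) {
  fit <- lm(box_cox_normalized(y, lambda) ~ x)
  sum(residuals(fit)^2)
}

# Simulated data (an assumption for illustration): log(y) is linear in x,
# so the estimate should land close to lambda = 0.
set.seed(1)
x <- runif(200, 1, 10)
y <- exp(0.5 + 0.3 * x + rnorm(200, sd = 0.2))

# One-dimensional search for the lambda with the smallest RSS.
lambda_hat <- optimize(rss_for_lambda, interval = c(-2, 2), y = y, x = x)$minimum
```

A grid search over a handful of "round" values (such as -1, 0, 0.5, 1) would work just as well here and is often easier to interpret.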

Here’s a caveat. The procedure above means that $\lambda$ is estimated **assuming** that homoscedasticity and normality hold for a linear model in the transformed variable. It doesn’t mean that the transformation makes the assumptions valid. Therefore, you need to check them after transforming. Sometimes the transformation just doesn’t work, in which case you need to find another (more sophisticated) model.
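One rough way to run such a check in base R is sketched below. The data are simulated and $\lambda = 0$ is simply assumed for the example. The auxiliary regression of squared residuals on fitted values is a crude stand-in for a proper heteroscedasticity test, not a replacement for one.

```r
# Simulated data (an assumption for illustration).
set.seed(2)
x <- runif(200, 1, 10)
y <- exp(0.5 + 0.3 * x + rnorm(200, sd = 0.2))

# Fit the linear model on the transformed response (lambda = 0, i.e. log).
fit <- lm(log(y) ~ x)
res <- residuals(fit)

# Normality check: Shapiro-Wilk test on the residuals.
normality <- shapiro.test(res)

# Crude homoscedasticity check: do squared residuals trend with the
# fitted values? A small p-value for the slope hints at non-constant variance.
aux <- lm(res^2 ~ fitted(fit))
spread_p <- summary(aux)$coefficients[2, 4]

normality$p.value
spread_p
```

Plots such as `qqnorm(res)` and `plot(fitted(fit), res)` are usually worth a look alongside the p-values.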

I came across this transformation a while ago. Back then, I was looking for some kind of transformation that converts a random variable into one that has a normal distribution. I actually did not know that the Box-Cox transformation works with a linear model, so I just converted $y$ directly. In the R package `car`, the command to do this is

    library(car)
    z <- bcPower(y, powerTransform(y)$lambda)

I gotta admit I didn’t fully understand the function’s documentation in R. Once I read the original paper, it became clear to me what the documentation meant. For the function `powerTransform`, when we provide the input as a vector instead of an `lm` object, the function treats it as if we were providing the linear model `y ~ 1`.
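To make the intercept-only reading concrete, here is a base-R sketch of maximum likelihood estimation of the Box-Cox parameter when the only "regressor" is a constant. It is not how `car` implements things internally. The function name `profile_loglik`, the search interval, and the simulated data are assumptions for illustration.

```r
# Profile log-likelihood of lambda for the intercept-only model
# (up to an additive constant). The (lambda - 1) * sum(log(y)) term is the
# log-Jacobian of the transformation.
profile_loglik <- function(lambda, y) {
  n <- length(y)
  z <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  rss <- sum((z - mean(z))^2)          # residuals of the fit z ~ 1
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

# Simulated data (an assumption): log(y) is normal, so lambda near 0 is expected.
set.seed(3)
y <- rlnorm(500, meanlog = 1, sdlog = 0.5)

# Maximize the profile log-likelihood over a bounded interval.
lambda_hat <- optimize(profile_loglik, interval = c(-2, 2), y = y,
                       maximum = TRUE)$maximum

# Apply the (unnormalized) transformation with the estimated lambda.
z <- if (abs(lambda_hat) < 1e-8) log(y) else (y^lambda_hat - 1) / lambda_hat
```

With no covariates in the model, the "residuals" are just deviations from the mean of the transformed values, which is why transforming a bare vector still fits the regression framing.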

This is just one example of my knowledge journey. It’s not straight, but convoluted. And that’s what makes it worth taking.

I encourage you to read the original paper (public domain). Besides the content, it is also interesting to see what scientific writing looked like in the 1960s. You can also read the other references listed below for more information.

### References

Box, G. E. P. and Cox, D. R. (1964). An Analysis of Transformations. *Journal of the Royal Statistical Society. Series B (Methodological)*, 26(2):211-252.

Buthman, A. Making Data Normal Using Box-Cox Power Transformation. Retrieved 31 Aug 2016.

Wikipedia. Power transform. Retrieved 31 Aug 2016.

An interesting discussion on StackExchange. Retrieved 31 Aug 2016.