In statistics, the Breusch–Godfrey test, named after Trevor S. Breusch and Leslie G. Godfrey, is used to assess the validity of some of the modelling assumptions inherent in applying regression-like models to observed data series. In particular, it tests for the presence of serial correlation that has not been included in a proposed model structure and which, if present, would mean that incorrect conclusions would be drawn from other tests, or that sub-optimal estimates of model parameters are obtained if it is not taken into account. The regression models to which the test can be applied include cases where lagged values of the dependent variables are used as independent variables in the model's representation for later observations. This type of structure is common in econometric models.
The Breusch–Godfrey serial correlation LM test is a test for autocorrelation in the errors in a regression model. It makes use of the residuals from the model being considered in a regression analysis, and a test statistic is derived from these. The null hypothesis is that there is no serial correlation of any order up to p.
The test is more general than the Durbin–Watson statistic (or Durbin's h statistic), which is only valid for nonstochastic regressors and for testing the possibility of a first-order autoregressive model (e.g. AR(1)) for the regression errors. The BG test has none of these restrictions, and is statistically more powerful than Durbin's h statistic.
Consider a linear regression of any form, for example
where the errors might follow an AR(p) autoregressive scheme, as follows:
The simple regression model is first fitted by ordinary least squares to obtain a set of sample residuals .
Breusch and Godfrey proved that, if the following auxiliary regression model is fitted
and if the usual statistic is calculated for this model, then the following asymptotic approximation can be used for the distribution of the test statistic
when the null hypothesis holds (that is, there is no serial correlation of any order up to p). Here n is the number of data-points available for the second regression, that for ,
where T is the number of observations in the basic series. Note that the value of n depends on the number of lags of the error term (p).