A mixed model (or more precisely mixed error-component model) is a statistical model containing both fixed effects and random effects. These models are useful in a wide variety of disciplines in the physical, biological and social sciences.
They are particularly useful in settings where repeated measurements are made on the same statistical units (longitudinal study), or where measurements are made on clusters of related statistical units. Because of their advantage in dealing with missing values, mixed effects models are often preferred over more traditional approaches such as repeated measures ANOVA.
Ronald Fisher introduced random effects models to study the correlations of trait values between relatives. In the 1950s, Charles Roy Henderson
provided best linear unbiased estimates (BLUE) of fixed effects and best linear unbiased predictions (BLUP) of random effects. Subsequently, mixed modeling has become a major area of statistical research, including work on computation of maximum likelihood estimates, non-linear mixed effects models, missing data in mixed effects models, and Bayesian estimation of mixed effects models. Mixed models are applied in many disciplines where multiple correlated measurements are made on each unit of interest. They are prominently used in research involving human and animal subjects in fields ranging from genetics to marketing, and have also been used in baseball and industrial statistics.
is an unknown vector of random errors, with mean and variance ;
and are known design matrices relating the observations to and , respectively.
The joint density of and can be written as: .
Assuming normality, , and , and maximizing the joint density over and , gives Henderson's "mixed model equations" (MME):
The solutions to the MME, and are best linear unbiased estimates (BLUE) and predictors (BLUP) for and , respectively. This is a consequence of the Gauss–Markov theorem when the conditional variance of the outcome is not scalable to the identity matrix. When the conditional variance is known, then the inverse variance weighted least squares estimate is BLUE. However, the conditional variance is rarely, if ever, known. So it is desirable to jointly estimate the variance and weighted parameter estimates when solving MMEs.
One method used to fit such mixed models is that of the EM algorithm where the variance components are treated as unobserved nuisance parameters in the joint likelihood. Currently, this is the implemented method for the major statistical software packages R (lme in the nlme package, or lmer in the lme4 package), Python (statsmodels package), Julia (MixedModels.jl package), and SAS (proc mixed). The solution to the mixed model equations is a maximum likelihood estimate when the distribution of the errors is normal.
^C. R. Henderson; Oscar Kempthorne; S. R. Searle; C. M. von Krosigk (1959). "The Estimation of Environmental and Genetic Trends from Records Subject to Culling". Biometrics. International Biometric Society. 15 (2): 192–218. doi:10.2307/2527669. JSTOR2527669.
^McLean, Robert A.; Sanders, William L.; Stroup, Walter W. (1991). "A Unified Approach to Mixed Linear Models". The American Statistician. American Statistical Association. 45 (1): 54–64. doi:10.2307/2685241. JSTOR2685241.