Section 2: An Introduction to the Features of the MIXED Procedure

Most of the pages in this document focus on using SAS to analyze data with linear mixed effects models, with occasional comments on how SPSS works with the same models. Other programs, such as Stata and S-Plus, also fit these models well but are not covered here. SAS introduced PROC MIXED in the early 1990s; "Mixed Models" were added to Version 12 of SPSS in the Advanced Regression module.

Whereas the primary task of the general linear model (e.g., PROC GLM in SAS) is to fit a linear ANOVA model with fixed effects, a mixed model can do the same and also broadens the choice of models whenever you have two or more random effects (i.e., variance components). The procedure's name, MIXED, reflects its ability to handle designs that contain both fixed effects and random effects.

PROC MIXED fits linear models in which the response variable is modeled directly as a linear combination of fixed and random effects plus residual error:

y = X*B + Z*u + e

where
  y is a vector of observations
  B is a vector of unknown fixed effects
  X is a known design matrix that relates the observations to the fixed effects
  u is a vector of unknown random effects
  Z is a known design matrix that relates the observations to the random effects
  e is a vector of random residuals

With this model, the expected value of the response y is:

E(y) = XB

Let
  G = the variance/covariance matrix of the random effects u
  R = the variance/covariance matrix of the random residuals e

so that u ~ N(0, G) and e ~ N(0, R). The matrix G can be thought of as a between-subject variance/covariance matrix, and R as a within-subject variance/covariance matrix.
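To make the notation concrete, here is a minimal sketch in plain Python (standard library only) that simulates data from y = X*B + Z*u + e. It assumes a hypothetical random-intercept design with 3 subjects and 2 observations each, G = 9*I and R = 1*I; the design, parameter values, and helper names are illustrative assumptions, not anything PROC MIXED uses internally. Because u and e both have mean zero, averaging many simulated y vectors should recover E(y) = XB.

```python
import random

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Hypothetical design: 3 subjects, 2 observations each.
# X holds the fixed effects (intercept, time); Z holds subject indicators (random intercepts).
X = [[1, 0], [1, 1],
     [1, 0], [1, 1],
     [1, 0], [1, 1]]
Z = [[1, 0, 0], [1, 0, 0],
     [0, 1, 0], [0, 1, 0],
     [0, 0, 1], [0, 0, 1]]
B = [10.0, 2.0]   # assumed fixed effects
SIGMA_U = 3.0     # assumed sd of random intercepts, so G = 9 * I
SIGMA_E = 1.0     # assumed sd of residuals, so R = 1 * I

def simulate_y(rng):
    """Draw one response vector y = X*B + Z*u + e."""
    u = [rng.gauss(0.0, SIGMA_U) for _ in range(len(Z[0]))]  # u ~ N(0, G)
    e = [rng.gauss(0.0, SIGMA_E) for _ in range(len(X))]     # e ~ N(0, R)
    return [xb + zu + ei for xb, zu, ei in zip(matvec(X, B), matvec(Z, u), e)]

rng = random.Random(2024)
n_reps = 20000
totals = [0.0] * len(X)
for _ in range(n_reps):
    for i, yi in enumerate(simulate_y(rng)):
        totals[i] += yi
mean_y = [t / n_reps for t in totals]

print("XB      :", matvec(X, B))
print("mean(y) :", [round(m, 2) for m in mean_y])
```

The simulated means land close to XB even though any single y vector is pulled up or down as a block by its subject's random intercept; that shared shift is exactly the within-subject correlation the next paragraph expresses through V.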
Because u and e are also assumed to be independent, the variance/covariance matrix of the vector of observations y has the form:

VAR(y) = V = ZGZ' + R

Understanding the V matrix is central to working with mixed models: it contains both sources of random variation, and it is what distinguishes these models from Ordinary Least Squares (OLS), which assumes V is a constant multiple of the identity matrix. If your random effects are model terms such as the blocks of a randomized block design, the G matrix is the primary focus. For within-subject effects, such as repeated measures collected over several within-subject treatment conditions, the R matrix is the relevant one. Both matrices are discussed throughout these pages; many situations need only one of them, but some analyses require both.

If the G and R matrices are known, generalized least squares (GLS) can estimate any estimable linear combination of the fixed effects B. In practice, however, these matrices are usually unknown, so their parameters must be estimated with an iterative algorithm before the fixed effects can be estimated.

Benefits of Mixed Linear Models

One important benefit of this approach is that repeated measures data can be analyzed with a wider variety of covariance structures than PROC GLM offers. PROC MIXED also correctly computes several significance tests that PROC GLM produces only when you specify the proper error terms with TEST statements, such as the tests required for split-plot designs. In a few situations, MIXED and GLM give identical results (e.g., a fixed effects ANOVA whose only random component is the residual variance), and these designs are good starting points for demonstrating the similarities. However, it is in the OTHER situations that the features of PROC MIXED add flexibility to data analysis.
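The covariance formula V = ZGZ' + R and the GLS remark can be sketched in plain Python (standard library only). The sketch assumes a hypothetical unbalanced random-intercept design (subject 1 has three observations, subject 2 has one) with sigma_u^2 = 4 and sigma_e^2 = 1, builds V, and computes the GLS estimate B_hat = (X'V^-1 X)^-1 X'V^-1 y while treating G and R as known. All function names are illustrative helpers, not a SAS or library API. Setting G = 0 leaves V = R = I, and GLS collapses to OLS, which is the "only one random component" case in which MIXED and GLM agree.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product A @ B for lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def matadd(A, B):
    """Elementwise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n, scale=1.0):
    """n x n identity matrix multiplied by `scale`."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (adequate for small, well-conditioned matrices)."""
    n = len(A)
    eye = identity(n)
    M = [[float(x) for x in row] + eye[i] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # choose the pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def covariance_of_y(Z, G, R):
    """V = Z G Z' + R."""
    return matadd(matmul(matmul(Z, G), transpose(Z)), R)

def gls(X, V, y):
    """B_hat = (X' V^-1 X)^-1 X' V^-1 y, treating V as known."""
    XtVi = matmul(transpose(X), inverse(V))
    y_col = [[v] for v in y]
    return [b[0] for b in matmul(inverse(matmul(XtVi, X)), matmul(XtVi, y_col))]

# Hypothetical unbalanced data: subject 1 has 3 observations, subject 2 has 1.
y = [1.0, 2.0, 3.0, 10.0]
X = [[1.0] for _ in range(4)]                         # intercept only
Z = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # subject indicators
G = identity(2, scale=4.0)                            # assumed sigma_u^2 = 4
R = identity(4, scale=1.0)                            # assumed sigma_e^2 = 1

V = covariance_of_y(Z, G, R)
b_mixed = gls(X, V, y)[0]  # random intercepts: correlated observations get less weight
b_ols = gls(X, R, y)[0]    # G = 0 leaves V = R = I, so GLS reduces to OLS (the mean)
print("V =", V)
print("GLS estimate:", round(b_mixed, 4), " OLS estimate:", round(b_ols, 4))
```

Within a subject, V has sigma_u^2 + sigma_e^2 = 5 on the diagonal and sigma_u^2 = 4 off the diagonal; across subjects it is 0. The GLS intercept works out to 40/7 (about 5.71) versus the OLS mean of 4, because each of subject 1's three correlated observations gets weight 1/13 while subject 2's single observation gets weight 1/5. PROC MIXED does not know G and R, so conceptually it replaces them with estimates before solving these equations.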
A mixed model provides more appropriate ways to work with clustered data, i.e., data that violate the assumptions of fixed effects ANOVA, most notably the assumption that observations are independent. Because mixed models handle multiple sources of variation better than GLM, they are more flexible and support broader, more appropriate inferences about data. If you have a randomized block design or any design that includes two or more random effects, including repeated measures designs, you should consider learning about the versatility of mixed models.

Linear mixed models are an extremely powerful tool for many statistical analyses of continuous data, because PROC MIXED has features that were not previously available in other procedures. PROCs TTEST, ANOVA, and GLM in SAS (and the analogous procedures in other programs) were written in the early years of statistical computing, when software and hardware limited practical analysis largely to testing hypotheses in linear models with fixed effects. Computations with random effects are far more feasible today with fast computers (even desktop and laptop models) and sophisticated software. Besides handling basic fixed effects analysis of variance problems, mixed modeling is now the preferred approach for virtually all clustered or repeated measures data.

With clustered data (such as multiple measurements collected from each subject), GLM and MIXED will not necessarily give the same results. By default, PROC MIXED estimates variance components by Restricted Maximum Likelihood (REML), whereas PROC GLM uses method-of-moments estimates. Because REML is iterative, PROC MIXED is more computationally intensive than GLM, and some models may take longer to fit. PROC MIXED thus takes a different route to obtaining estimates, standard errors, and confidence intervals.
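To illustrate the moment (expected mean squares) approach mentioned above, here is a minimal sketch that computes ANOVA estimates of the two variance components in a balanced one-way random effects design, using sigma_e^2 = MSW and sigma_u^2 = (MSB - MSW)/n, where n is the number of observations per group. The data and the function name are hypothetical. These estimates come from closed-form equations with no iteration, whereas REML maximizes a restricted likelihood iteratively. For balanced data such as these, REML gives the same estimates as long as the ANOVA estimates are nonnegative; the two approaches diverge for unbalanced or more complex designs.

```python
def anova_variance_components(groups):
    """Method-of-moments (ANOVA) estimates for a balanced one-way random effects model.

    groups: list of equal-length lists, one per level of the random factor.
    Returns (sigma_u2, sigma_e2). Note that sigma_u2 can come out negative with this method.
    """
    a = len(groups)     # number of levels of the random factor
    n = len(groups[0])  # observations per level
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / a
    ssb = n * sum((m - grand) ** 2 for m in means)
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (a - 1)
    msw = ssw / (a * (n - 1))
    # E(MSB) = sigma_e2 + n * sigma_u2 and E(MSW) = sigma_e2; solve for the components.
    sigma_e2 = msw
    sigma_u2 = (msb - msw) / n
    return sigma_u2, sigma_e2

# Hypothetical balanced data: 3 groups, 2 observations each.
data = [[1.0, 3.0], [4.0, 6.0], [7.0, 9.0]]
su2, se2 = anova_variance_components(data)
print(f"sigma_u^2 = {su2:.2f}, sigma_e^2 = {se2:.2f}")  # prints: sigma_u^2 = 8.00, sigma_e^2 = 2.00
```

Here MSB = 18 and MSW = 2, so sigma_u^2 = (18 - 2)/2 = 8. The balanced-design check is deliberate: with unequal group sizes these simple formulas no longer apply, which is one of the situations where the likelihood-based estimates of PROC MIXED become more attractive.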
Despite the limitations PROC GLM shows when compared with PROC MIXED, it still has a place in basic applications of linear models. For example, PROC GLM remains the choice for computing multivariate tests such as Wilks' lambda, for evaluating the estimability of linear combinations of effects (the E options), and for its wide range of post hoc multiple comparison tests for independent groups ANOVA. However, you will quickly discover in the following pages (and in the many resources already based on PROC MIXED) that for nearly anything PROC GLM can do, the mixed model approach will most likely work better and offer more choices for fitting linear models.

Because output from the SPSS general linear model procedure closely resembles PROC GLM in SAS, the comparisons in this article are primarily between the GLM and MIXED procedures of SAS. However, many of the same examples apply to the GLM and MIXED procedures in SPSS and other programs.

In closing this section, consider applying PROC MIXED whenever you have:

* random effects
* any type of repeated measures
* covariance matrices that are not simple diagonal matrices
* unbalanced designs
* split-plot designs (or any other design in which the experimental units for some factors differ from the experimental units for other factors)
* standard errors that are complex functions of model errors
* random coefficient models

Even if you are unsure whether any of these situations describes your data analysis problem, it is still worth running PROC MIXED. PROC MIXED is also a helpful study planning tool for complex designs (i.e., power analysis for a given effect size and a fixed number of subjects), as will be demonstrated in a subsequent section.

Where to Learn More?

Books and papers on running linear mixed models abound. Perhaps the best of them all is "SAS for Mixed Models", 2nd Ed., by Littell et al. (2006), which provides many examples that an analyst is likely to encounter in real life.
Commands that fit appropriate models are presented and explained in detail. Everything is well organized, clear, and concise. For the mix of theory, examples, and code, you will not find a better treatment anywhere. At around $90 for 800+ pages, it is well worth the price for a volume that covers much of what you will ever need to know about linear mixed models, GLIMMIX, and NLMIXED in one easily accessible source.

SUGI papers are a great source of specific information; however, they do not provide a coherent, unified treatment of any subject, mixed models included. The same goes for the SAS-L archives: you can easily search them and find a great deal of information on many topics, but the posts are not organized in a way that supports an effective, comprehensive learning experience.