Section 7b: Repeated Measures ANOVA

I. Repeated Measures with One Within-Subjects Factor
II. One Between-Subjects Factor and One Within-Subjects Factor

The examples in this section demonstrate how PROC MIXED fits models for correlated data, in particular repeated measures data. These are data in which the same response (such as weight) is measured on each subject over time, or in which each subject receives multiple treatment conditions in a randomized order and the same response is measured under each condition. [Note: crossover designs fall under this category and will eventually be examined in a separate section.]

The essential point of reference is this: in any study where you have collected two or more responses from each experimental unit or subject and you want to compare means, the concepts presented in this section will likely apply. In some situations you may still prefer to model the data with random factors listed on a RANDOM statement; experience and examining the covariance matrices will help you decide when each approach is appropriate.

PROC MIXED allows the interdependence of observations to be modeled directly. For example, if several measurements are made on a subject over time, fitting a mixed model allows us to specify a pattern for the correlation between these measurements. The first examples demonstrate this aspect of analyzing repeated measurements.

The models described first assume one observation is collected at each time point or under each treatment condition. With one exception (described later along with the RANDOM statement), the REPEATED statement works under the assumption of a relatively small number of repeated measurements. If multiple measurements (i.e., several to many) are collected at each time point or treatment condition (as demonstrated in Section 7a), the RANDOM statement may be a better choice for evaluating random effects, as subsequent examples will demonstrate.
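As a rough sketch of what "modeling the interdependence directly" means (the notation is ours, not PROC MIXED output): suppose subject i contributes a vector y_i of repeated measurements. The model fitted with a REPEATED statement can then be written as

```latex
\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \boldsymbol{\varepsilon}_i,
\qquad \boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \mathbf{R}_i), \qquad i = 1, \dots, n,
\qquad
\operatorname{Var}(\mathbf{y}) = \operatorname{diag}(\mathbf{R}_1, \mathbf{R}_2, \dots, \mathbf{R}_n)
```

The TYPE= option chooses the pattern inside each within-subject block R_i. The zero blocks off the diagonal express the assumption that different subjects are independent.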
Please note that the REPEATED statement assumes that each combination of the SUBJECT= variable(s) and the repeated effect is unique, that is, each subject has at most one observation per time point or condition. If this is not true, you will find a message like this in the log:

  Iteration History
  Iteration   Evaluations   -2 Res Log Like
          0             1   169211.5928770
  WARNING: Stopped because of infinite likelihood.

I. Repeated Measures with One Within-Subjects Factor

In this example there is no between-subjects factor. There is only one within-subjects factor, with two or more levels measured on each subject, and here it is assumed to be related to the response linearly, as in a linear growth model with data collected over time. Assume you have collected three or more measurements from each of several subjects over time, at intervals that may or may not be equal. As a simple example, the following statements test for a linear change in the response variable y over time, averaged over subjects. Because the repeated effect on the REPEATED statement must be a CLASS variable while time enters the MODEL statement as a continuous variable, a classification copy of time is created first (the names timec and indat2 are chosen for this example):

  DATA indat2;
    SET indat;
    timec = time;   /* classification copy of time for the REPEATED statement */
  RUN;

  PROC MIXED DATA=indat2;
    CLASS id timec;
    MODEL y = time / solution;
    REPEATED timec / SUBJECT=id TYPE=cs rcorr;
  RUN;

SUBJECT=id indicates the clustering of the data: id stays constant within each subject while the values of time vary (each of them unique within subject). The variable id must appear on the CLASS statement. The TYPE=cs structure for the within-subjects covariance matrix assumes equal correlations between all possible pairs of time points, analogous to the compound symmetry assumption underlying the univariate repeated measures approach in PROC GLM.

A covariance structure that may model correlations more realistically when they decrease over time is TYPE=ar(1), the first-order autoregressive structure. This structure implies that data collected at time points close to each other are more highly correlated than data collected at time points farther apart. Essentially, corr = rho^lag, where lag is the number of intervals separating two time points (e.g., lag=1 indicates adjacent values; lag=2 indicates times 1 and 3, or 2 and 4, etc.).
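Written out for four equally spaced occasions, the CS and AR(1) structures compare as shown below. In our own notation, sigma^2 is the common variance and rho the correlation parameter; PROC MIXED labels its covariance parameters differently. The spatial power structure TYPE=sp(pow)(c) keeps the AR(1) pattern but replaces the integer lag with the distance between values of a numeric coordinate variable c. This is what makes it suitable for unequally spaced occasions.

```latex
\mathbf{R}_{\mathrm{CS}} = \sigma^2
\begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix},
\qquad
\mathbf{R}_{\mathrm{AR(1)}} = \sigma^2
\begin{pmatrix}
1 & \rho & \rho^{2} & \rho^{3} \\
\rho & 1 & \rho & \rho^{2} \\
\rho^{2} & \rho & 1 & \rho \\
\rho^{3} & \rho^{2} & \rho & 1
\end{pmatrix},
\qquad
\bigl(\mathbf{R}_{\mathrm{SP(POW)}}\bigr)_{jk} = \sigma^2 \rho^{\lvert c_j - c_k \rvert}
```

For instance, with rho = 0.6 the AR(1) correlations at lags 1, 2, and 3 are 0.6, 0.36, and 0.216, whereas CS holds every correlation at 0.6. Under SP(POW), take hypothetical measurement times c = 0, 1, and 6 weeks and rho = 0.9. The correlations are then 0.9 (weeks 0 and 1), 0.9^5 ≈ 0.59 (weeks 1 and 6), and 0.9^6 ≈ 0.53 (weeks 0 and 6). Each of these structures uses two covariance parameters.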
The rcorr option prints the estimated correlation matrix to the output.

Another common type of repeated measures problem involves observations collected over time (e.g., pre-test, post-test, follow-up) at times that are not equally spaced. That is, the post-test may occur a short time after the pre-test and the follow-up after a longer interval. In this example, a researcher would also not expect a linear time pattern to hold. A step function, with a shift in the mean at each time point, may be more realistic. In this situation time is treated as a classification factor, and the highest coded value serves as the reference category:

  PROC MIXED;
    CLASS time id;
    MODEL y = time;
    REPEATED time / SUBJECT=id TYPE=sp(pow)(time1) rcorr;
  RUN;

In this particular dataset time is represented in two ways. First, the variable time is coded 1, 2, or 3 to represent the three levels of time. Second, time1 is a numeric variable holding the actual time of measurement (in hours, days, weeks, etc.), so the distances between values of time1 supply the unequal intervals used by TYPE=sp(pow).

When are REPEATED Measures and RANDOM Effects equivalent?

For future reference, assume compound symmetry (type=cs on the REPEATED statement, as given above) for one within-subjects factor. Under that assumption, the following three sets of PROC MIXED statements fit the same repeated measures model. The first uses a REPEATED statement and the other two use variations of the RANDOM statement:

  PROC MIXED;
    CLASS time id;
    MODEL y = time;
    REPEATED / SUBJECT=id type=cs;
  RUN;

  PROC MIXED;
    CLASS time id;
    MODEL y = time;
    RANDOM id;
  RUN;

  PROC MIXED;
    CLASS time id;
    MODEL y = time;
    RANDOM int / subject=id;
  RUN;

Both RANDOM specifications fit the same model as the REPEATED statement with type=cs for compound symmetry.
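A short derivation shows why the RANDOM versions reproduce compound symmetry. In our notation, RANDOM int / subject=id adds a random intercept b_i for subject i to independent residuals e_ij:

```latex
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i + e_{ij},
\qquad b_i \sim N(0, \sigma_b^2),\ e_{ij} \sim N(0, \sigma^2)\ \text{independent},\\
\operatorname{Var}(y_{ij}) &= \sigma_b^2 + \sigma^2,
\qquad \operatorname{Cov}(y_{ij}, y_{ik}) = \sigma_b^2 \quad (j \neq k),\\
\operatorname{Corr}(y_{ij}, y_{ik}) &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}.
\end{aligned}
```

Every pair of occasions within a subject shares the same covariance sigma_b^2, which is exactly the compound symmetry pattern. Because PROC MIXED by default bounds the variance component sigma_b^2 at zero, this correlation cannot be negative.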
One potential difference is that the RANDOM statement specifications constrain the correlation to be nonnegative (which is usually the case in practice), whereas the REPEATED specification leaves the correlation unconstrained (that is, in some situations it could be negative). Understanding how the RANDOM and REPEATED statements work individually and jointly is crucial for successful applications of repeated measures or random effects ANOVA models with PROC MIXED.

II. One Between-Subjects Factor and One Within-Subjects Factor

The statements required for a repeated measures ANOVA with one between-subjects and one within-subjects factor, comparable to the PROC GLM statements presented in Section 4 (Why choose PROC MIXED?), are:

  PROC MIXED;
    CLASS group time id;
    MODEL y = group | time / solution ddfm=bw;
    REPEATED time / SUBJECT=id TYPE=cs rcorr;
  RUN;

Note that the REPEATED statement in this example functions differently from the REPEATED statement in PROC GLM. In PROC GLM it names the within-subjects factor and its levels. In PROC MIXED it supplies the essential information about how observations within a subject are correlated. With PROC MIXED you do not need to specify the "between" and "within" factors; the within factors are automatically accounted for by the subject identification variable given with SUBJECT=id.

The definition of a repeated measures design implies that observations within a subject (defined by a unique value of id) have a given correlation structure, whereas observations from different subjects are assumed to be uncorrelated. The REPEATED statement specifies the basic elements of this structure. First, the SUBJECT= option requires the dataset to contain a variable that uniquely identifies each subject, in this case called id (which should be placed on the CLASS statement). SUBJECT=id indicates that observations with the same value of id are correlated and that observations with different values are independent. The variable time on the REPEATED statement indicates the order of the repeated values within subject.
It is not necessary to include it if the observations are sorted in ascending order of time within subject and every subject has a row for each time point (missing responses entered as missing values rather than omitted). The rcorr option prints the within-subject correlation matrix to the output.

The corresponding commands in SPSS are:

  MIXED y BY group time
    /FIXED = group time group*time | SSTYPE(3)
    /METHOD = REML
    /PRINT = R Rcorr SOLUTION
    /REPEATED = time | SUBJECT(id) COVTYPE(cs) .
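For reference, the model fitted by the statements in Part II can be sketched as follows. The notation is ours: i indexes groups, j subjects within group i, and k = 1, ..., T time points. The bar operator in group | time expands to group time group*time, which gives the three fixed-effect terms:

```latex
\begin{aligned}
y_{ijk} &= \mu + \alpha_i + \tau_k + (\alpha\tau)_{ik} + \varepsilon_{ijk},\\
\boldsymbol{\varepsilon}_{ij} &= (\varepsilon_{ij1}, \dots, \varepsilon_{ijT})^{\top} \sim N(\mathbf{0}, \mathbf{R}),
\qquad \mathbf{R} = \sigma^2\bigl[(1-\rho)\,\mathbf{I}_T + \rho\,\mathbf{J}_T\bigr].
\end{aligned}
```

The vectors epsilon_ij are independent across subjects. Here alpha_i is the group (between-subjects) effect, tau_k the time (within-subjects) effect, and (alpha tau)_ik their interaction. I_T is the T x T identity and J_T the T x T matrix of ones, so R has 1 on the diagonal and rho off the diagonal (times sigma^2). This is the compound symmetry matrix requested by TYPE=cs.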