Understanding Mixed Model Repeated Measures (MMRM) in SAS and R

Mixed models for repeated measures (MMRM) is widely used for analyzing longitdinal continuous outcomes in randomized clinical trials. Repeated measures refer to multiple measures taken from the same experimental unit, such as a couple of tests over time on the same subject. And the advantage of this model is that it can avoid model misspcification and provide unbiased estimation for data that is missing completely at random (MCAR) or missing at random (MAR).

As we know, the common primary outcome in randomized trials is often the difference in average (LS mean) at a given timepoint (visit). One way to analyze these data is to ignore the measurements at intermediate timepoints and focus on estimating the outcome at the specific timepoint by ANCOVA, but the data should be complete. If not, sometimes the multiple imputation method is suggested. However in the MMRM model, it's generally thought that utilizing the information from all timepoints implicitly handles missing data. In SAS, it's more efficient to use proc mixed than proc glm to handle missing values, which allows the inclusion of subjects with missing data. And in R, I feel like the 'mmrm' package is more powerful and runs more smoothly than others.

Example data

Here, I take the example data from mmrm package and implement the MMRM using SAS and R, respectively. In this randomized trial, subjects are treated with a treatment drug or placebo, and the FEV1 (forced expired volume in one second) is a measure of how quickly the lungs can be emptied. This measure is repeated from Visit 1 to Visit 4. Low levels of FEV1 may indicate chronic obstructive pulmonary disease (COPD). To evaluate the effect of treatment on FEV1, the MMRM will be used to analyze the outcome with an unstructured covariance matrix reflecting the correlation between visits within the subjects, treatment (treatment drug or placebo), visit and treatment-by-visit as the fixed effects, subject as a randon effect, visit as a repeated measure, and baseline as the covariates.

Implementation

library(mmrm)
data("fev_data")
write.csv(fev_data, file = "./fev_data.csv", na = "", row.names = F)

To evaluate the effect of treatment on FEV1, this endpoint measurements can be analyzed using MMRM with an unstructured covariance matrix reflecting the correlation between visits within the subjects, treatment (treatment drug or placebo), visit and treatment-by-visit as the fixed effects, subject as a randon effect, visit as a repeated measure, and race as the covariate.

So the SAS code as shown below.

proc import datafile="./fev_data.csv" 
    out=fev_data
    dbms=csv replace;
    getnames=yes;
run;

proc mixed data=fev_data method=reml;
    class ARMCD(ref='PBO') AVISIT RACE USUBJID;
    model FEV1 = RACE ARMCD AVISIT ARMCD*AVISIT / ddfm=KR;
    repeated AVISIT / subject=USUBJID type=UN r rcorr;
    lsmeans ARMCD*AVISIT / cl alpha=0.05 diff slice=AVISIT;
    lsmeans ARMCD / cl alpha=0.05 diff;
    ods output lsmeans=lsm diffs=diff;
run;

From above SAS code, we can see that the method option specifies the estimation method as REML. The repeated statement is used to specify the repeated measures factor and control the covariance structure. In the repeated measures models, the subject optional is used to define which observations belong to the same subject, and which belong to the different subjects who are assumed to be independent. The type optional statement specifies the model for the covariance structure of the error within subjects. We also add ddfm=KR in model statement to specify a method for the denominator degrees of freedom (such as Kenward-Rogers here). At least, the LS mean calculated from the lsmeans statement with ci and diff options is also very commonly used. These two options can help us obtain the confidence interval and difference of the LS mean, and the p value if the hypothesis margin is 0.

As for ARMCD*AVISIT in the lsmeans statement that means you would like to get the test of LS means in all combinations of visits. If you try the lsmeans ARMCD, which is identical to the mean of pair-wise visits from the LS means of lsmeans ARMCD*AVISIT.

And the same arguments in R, as shown below.

library(mmrm)
library(emmeans)

data("fev_data")

fit <- mmrm(
  formula = FEV1 ~ RACE + ARMCD + AVISIT + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data
)
# summary(fit)

If you would like to obtain the LS mean of each visit for each group, like the lsm dataset in SAS, you can use the emmeans function from the emmeans package as the mmrm object can be analyzed by the external package.

# emmeans(fit, "ARMCD", by = "AVISIT")
emmeans(fit, ~ ARMCD | AVISIT)

## AVISIT = VIS1:
##  ARMCD emmean    SE  df lower.CL upper.CL
##  PBO     33.3 0.757 149     31.8     34.8
##  TRT     37.1 0.764 144     35.6     38.6
## 
## AVISIT = VIS2:
##  ARMCD emmean    SE  df lower.CL upper.CL
##  PBO     38.2 0.608 150     37.0     39.4
##  TRT     41.9 0.598 146     40.7     43.1
## 
## AVISIT = VIS3:
##  ARMCD emmean    SE  df lower.CL upper.CL
##  PBO     43.7 0.462 131     42.8     44.6
##  TRT     46.8 0.507 130     45.8     47.8
## 
## AVISIT = VIS4:
##  ARMCD emmean    SE  df lower.CL upper.CL
##  PBO     48.4 1.189 134     46.0     50.7
##  TRT     52.8 1.188 133     50.4     55.1
## 
## Results are averaged over the levels of: RACE 
## Confidence level used: 0.95

As for the diff dataset from SAS, you can use the pairs function to get identical outputs.

pairs(emmeans(fit, ~ ARMCD | AVISIT), reverse = TRUE, adjust="tukey")

## AVISIT = VIS1:
##  contrast  estimate    SE  df t.ratio p.value
##  TRT - PBO     3.78 1.076 146   3.508  0.0006
## 
## AVISIT = VIS2:
##  contrast  estimate    SE  df t.ratio p.value
##  TRT - PBO     3.76 0.853 148   4.405  <.0001
## 
## AVISIT = VIS3:
##  contrast  estimate    SE  df t.ratio p.value
##  TRT - PBO     3.11 0.689 132   4.509  <.0001
## 
## AVISIT = VIS4:
##  contrast  estimate    SE  df t.ratio p.value
##  TRT - PBO     4.41 1.681 133   2.622  0.0098
## 
## Results are averaged over the levels of: RACE

Questions

Why we must include the interaction effect in the model?

I feel like if we use the ANCOVA model and focus on the specific timepoint before the end of the trial, in that case, we can say the treatment effect is the main difference between the treatment and control groups. But in MMRM, we include all timepoints's information. Despite the collection of these intermediate outcomes, the primary outcome is often still the difference at that specific or final timepoint. Thus, it will have a couple of advantages, like improving the power and avoiding the bias of dropout because although the subjects withdraw from the study before the final timepoint, they may still contribute information in the interim. Once all the timepoints are included, the treatment-by-visit also should be added to the model as a consideration when the effect is different in the slopes of outcomes over time.

How to select the covariance structure?

Initially, the unstructured (type=UN) covariance structure allows SAS to estimate the covariance matrix, as the unstructured approach makes no assumption at all about the relationship in the correlations among study visits. As for how to select an appropriate covariance structure, it depends on your understanding of the study and the data you have. Here are also a couple of documents for your reference if you would like to know which structure can be used and how to try and select a more suitable structure. For instance, the lower AIC values suggest a better fit.

Here are two documents for your reference: - Selecting an Appropriate Covariance Structure - Guidelines for Selecting the Covariance Structure in Mixed Model Analysis

KeepNotes blog