Mixed models for repeated measures (MMRM) is widely used for analyzing longitdinal continuous outcomes in randomized clinical trials. Repeated measures refer to multiple measures taken from the same experimental unit, such as a couple of tests over time on the same subject. And the advantage of this model is that it can avoid model misspcification and provide unbiased estimation for data that is missing completely at random (MCAR) or missing at random (MAR).
As we know, the common primary outcome in randomized trials is often the difference in average (LS mean) at a given timepoint (visit). One way to analyze these data is to ignore the measurements at intermediate timepoints and focus on estimating the outcome at the specific timepoint by ANCOVA, but the data should be complete. If not, sometimes the multiple imputation method is suggested. However in the MMRM model, it's generally thought that utilizing the information from all timepoints implicitly handles missing data. In SAS, it's more efficient to use proc mixed
than proc glm
to handle missing values, which allows the inclusion of subjects with missing data. And in R, I feel like the 'mmrm' package is more powerful and runs more smoothly than others.
Example data
Here, I take the example data from mmrm
package and implement the MMRM using SAS and R, respectively. In this randomized trial, subjects are treated with a treatment drug or placebo, and the FEV1 (forced expired volume in one second) is a measure of how quickly the lungs can be emptied. This measure is repeated from Visit 1 to Visit 4. Low levels of FEV1 may indicate chronic obstructive pulmonary disease (COPD). To evaluate the effect of treatment on FEV1, the MMRM will be used to analyze the outcome with an unstructured covariance matrix reflecting the correlation between visits within the subjects, treatment (treatment drug or placebo), visit and treatment-by-visit as the fixed effects, subject as a randon effect, visit as a repeated measure, and baseline as the covariates.
Implementation
Here, I take the example data from mmrm
package and implement the MMRM using SAS and R, respectively. In this randomized trial, subjects are treated with a treatment drug or placebo, and the FEV1 (forced expired volume in one second) is a measure of how quickly the lungs can be emptied. This measure is repeated from Visit 1 to Visit 4. Low levels of FEV1 may indicate chronic obstructive pulmonary disease (COPD).
library(mmrm)
data("fev_data")
write.csv(fev_data, file = "./fev_data.csv", na = "", row.names = F)
To evaluate the effect of treatment on FEV1, this endpoint measurements can be analyzed using MMRM with an unstructured covariance matrix reflecting the correlation between visits within the subjects, treatment (treatment drug or placebo), visit and treatment-by-visit as the fixed effects, subject as a randon effect, visit as a repeated measure, and race as the covariate.
So the SAS code as shown below.
proc import datafile="./fev_data.csv"
out=fev_data
dbms=csv replace;
getnames=yes;
run;
proc mixed data=fev_data method=reml;
class ARMCD(ref='PBO') AVISIT RACE USUBJID;
model FEV1 = RACE ARMCD AVISIT ARMCD*AVISIT / ddfm=KR;
repeated AVISIT / subject=USUBJID type=UN r rcorr;
lsmeans ARMCD*AVISIT / cl alpha=0.05 diff slice=AVISIT;
lsmeans ARMCD / cl alpha=0.05 diff;
ods output lsmeans=lsm diffs=diff;
run;
From above SAS code, we can see that the method
option specifies the estimation method as REML
. The repeated
statement is used to specify the repeated measures factor and control the covariance structure. In the repeated measures models, the subject
optional is used to define which observations belong to the same subject, and which belong to the different subjects who are assumed to be independent. The type
optional statement specifies the model for the covariance structure of the error within subjects. We also add ddfm=KR
in model
statement to specify a method for the denominator degrees of freedom (such as Kenward-Rogers here). At least, the LS mean calculated from the lsmeans
statement with ci
and diff
options is also very commonly used. These two options can help us obtain the confidence interval and difference of the LS mean, and the p value if the hypothesis margin is 0
.
As for ARMCD*AVISIT
in the lsmeans
statement that means you would like to get the test of LS means in all combinations of visits. If you try the lsmeans ARMCD
, which is identical to the mean of pair-wise visits from the LS means of lsmeans ARMCD*AVISIT
.
And the same arguments in R, as shown below.
library(mmrm)
library(emmeans)
data("fev_data")
fit <- mmrm(
formula = FEV1 ~ RACE + ARMCD + AVISIT + ARMCD * AVISIT + us(AVISIT | USUBJID),
data = fev_data
)
# summary(fit)
If you would like to obtain the LS mean of each visit for each group, like the lsm
dataset in SAS, you can use the emmeans
function from the emmeans
package as the mmrm object can be analyzed by the external package.
# emmeans(fit, "ARMCD", by = "AVISIT")
emmeans(fit, ~ ARMCD | AVISIT)
## AVISIT = VIS1:
## ARMCD emmean SE df lower.CL upper.CL
## PBO 33.3 0.757 149 31.8 34.8
## TRT 37.1 0.764 144 35.6 38.6
##
## AVISIT = VIS2:
## ARMCD emmean SE df lower.CL upper.CL
## PBO 38.2 0.608 150 37.0 39.4
## TRT 41.9 0.598 146 40.7 43.1
##
## AVISIT = VIS3:
## ARMCD emmean SE df lower.CL upper.CL
## PBO 43.7 0.462 131 42.8 44.6
## TRT 46.8 0.507 130 45.8 47.8
##
## AVISIT = VIS4:
## ARMCD emmean SE df lower.CL upper.CL
## PBO 48.4 1.189 134 46.0 50.7
## TRT 52.8 1.188 133 50.4 55.1
##
## Results are averaged over the levels of: RACE
## Confidence level used: 0.95
As for the diff
dataset from SAS, you can use the pairs
function to get identical outputs.
pairs(emmeans(fit, ~ ARMCD | AVISIT), reverse = TRUE, adjust="tukey")
## AVISIT = VIS1:
## contrast estimate SE df t.ratio p.value
## TRT - PBO 3.78 1.076 146 3.508 0.0006
##
## AVISIT = VIS2:
## contrast estimate SE df t.ratio p.value
## TRT - PBO 3.76 0.853 148 4.405 <.0001
##
## AVISIT = VIS3:
## contrast estimate SE df t.ratio p.value
## TRT - PBO 3.11 0.689 132 4.509 <.0001
##
## AVISIT = VIS4:
## contrast estimate SE df t.ratio p.value
## TRT - PBO 4.41 1.681 133 2.622 0.0098
##
## Results are averaged over the levels of: RACE
Questions
Why we must include the interaction effect in the model?
I feel like if we use the ANCOVA model and focus on the specific timepoint before the end of the trial, in that case, we can say the treatment effect is the main difference between the treatment and control groups. But in MMRM, we include all timepoints's information. Despite the collection of these intermediate outcomes, the primary outcome is often still the difference at that specific or final timepoint. Thus, it will have a couple of advantages, like improving the power and avoiding the bias of dropout because although the subjects withdraw from the study before the final timepoint, they may still contribute information in the interim. Once all the timepoints are included, the treatment-by-visit also should be added to the model as a consideration when the effect is different in the slopes of outcomes over time.
How to select the covariance structure?
Initially, the unstructured (type=UN
) covariance structure allows SAS to estimate the covariance matrix, as the unstructured approach makes no assumption at all about the relationship in the correlations among study visits. As for how to select an appropriate covariance structure, it depends on your understanding of the study and the data you have. Here are also a couple of documents for your reference if you would like to know which structure can be used and how to try and select a more suitable structure. For instance, the lower AIC values suggest a better fit.
Here are two documents for your reference: - Selecting an Appropriate Covariance Structure - Guidelines for Selecting the Covariance Structure in Mixed Model Analysis
Reference
- MMRM Package Introduction
- MIXED MODEL REPEATED MEASURES (MMRM)
- Proc mixed
- Understanding Interaction Effects in Statistics
- Mixed Model Repeated Measures (MMRM)
- Repeated Measures Modeling With PROC MIXED
- Mixed Models for Repeated Measures Should Include Time-by-Covariate Interactions to Assure Power Gains and Robustness Against Dropout Bias Relative to Complete-Case ANCOVA