Continue with the question in the previous article (Multiple Imputaton - Linear Regression in R), where we just discussed how to compute the pooled coefficients of ANCOVA using `mice`

package but left out the Ls-means and hypothesis test. Luckly I find out that `emmeans`

package have wrapped this process inside so we can use it to obtain the pooled Ls-means estimation and p-value straightforward wihout `pool`

function of `mice`

.

Supposed that we have fitted the ANCOVA for imputed datasets and get the fitted models `mods`

for each imputation here. Then I will use the `emmeans::emmeans()`

function to estimate the ls-means, which is not the indivival estimate for each imputation but rather the pooled one. The pool process remains to use the Rubin's Rules.

```
ems <- emmeans::emmeans(mods, ~trt)
data.frame(ems)
## trt emmean SE df lower.CL upper.CL
## 1 1 -10.55330 0.4696385 194.9729 -11.47953 -9.627078
## 2 2 -12.39073 0.4698075 194.8239 -13.31729 -11.464169
```

Afterwards using the `emmeans::contrast()`

function to do the contrasts analysis and get the CI and p-value for the difference (`trt2 - trt1`

).

```
conr <- emmeans::contrast(ems, method = list(c(-1, 1)), adjust = "none")
conr_test <- emmeans::test(conr)
data.frame(conr_test)
## contrast estimate SE df t.ratio p.value
## 1 c(-1, 1) -1.837429 0.6657861 194.8238 -2.759788 0.006335778
```

If we want to validate the above result using SAS procedure, we must first export the csv from the `low_imp_res`

dataframe that we created in the last article.

`write.csv(low_imp_res, file = "./low_imputed.csv", row.names = FALSE, na = "")`

If we want to validate the above result using SAS procedure, we must first export the csv from the `low_imp_res`

dataframe that we created in the last article. And than fit the ANCOVA model with `proc mixed`

to obtain the ls-means estimate(`lsm`

and `diff`

) for each imputation. Finally use `proc mianalyze`

to pool the results of all imputation for ls-means (`comb_lsm`

) and difference (`comb_diff`

) within two groups.

```
proc import datafile="&_projpth.\02_Raw Data\low_imputed.csv"
out=low_imp(where=(imp ne 0))
dbms=csv replace;
getnames = yes;
run;
ods output lsmeans=lsm diffs=diff;
proc mixed data=low_imp;
by imp;
class trt / ref=first;
model week8=basval trt /ddfm=kr;
lsmeans trt / cl pdiff diff;
run;
proc sort data=lsm; by trt; run;
ods output ParameterEstimates=comb_lsm;
proc mianalyze data=lsm;
by trt;
modeleffects estimate;
stderr stderr;
run;
ods output ParameterEstimates=comb_diff;
proc mianalyze data=diff;
by trt _trt;
modeleffects estimate;
stderr stderr;
run;
```

The SAS output can be seen below.

We can observe that the estimate and SE from SAS are consistent with R, but there is a significant discrepancy in DF (degress of freedom). In R df is `197`

while in SAS it is `3.94E6`

. That's because there are different methods to calculate the the df, an older one is used in SAS and adjusted version is used in the `mice`

package. That will also lead to different p-values. For more details can be found in Degrees of Freedom and P-values of Rubins-Rules.

Someone may be curious how to calculate the df in `mice`

and SAS. Let’s repeat the process of calculation using the formulas in above link. The specific formulas are not shown here, which is very easy to understand. So I just convert them as the R code.

The older method used in SAS to calculate the df for the t-distribution is defined as (Rubin (1987), Van Buuren (2018)). The specific formulas are not shown here, which is very easy to understand. So I just convert them as the R code. From below, the `lambda`

can be derived from the between (`Vb`

) and total (`Vt`

) missing data variance, and the `m`

represents the number of imputed datasets. The rounded value is `3.94E6`

that is equal to the results in SAS.

```
m <- 5
Vb <- 0.000372
Vt <- 0.443271
lambda <- (Vb + Vb / m) / Vt
df_old <- (m - 1) / lambda^2
df_old
## [1] 3944121
```

As the above `df`

is too larger for the pooled result, compared to the dfs in each imputed dataset, which is inappropriate. Barnard and Rubin (1999) adjusted this df by using a new formula (See formula 9.9 in that article.). We should compute the Observed df (`df_obs`

) and then adjusted df (`df_adj`

) where `n`

represents the sample size in the imputed datasets, and `k`

the number of parameters in the fitted model (in my case, there are 3 parameters).

```
n <- 200
k <- 3
m <- 5
df_obs <- (((n - k) + 1) / ((n - k) + 3)) * (n - k) * (1 - lambda)
df_adj <- (df_old * df_obs) / (df_old + df_obs)
df_adj
## [1] 194.824
```

Finally we can look at the df for the pooled estimates from `pool_res`

using `pool()`

function. The `df`

for `trt2`

term is about equal to our computation, as seen below.

```
pool_res <- pool(mods)
pool_res
## Class: mipo m = 5
## term m estimate ubar b t dfcom df riv
## 1 (Intercept) 5 0.6767737 2.982354968 6.679090e-03 2.990369876 197 194.4394 0.002687443
## 2 basval 5 -0.5354029 0.006510448 1.426767e-05 0.006527569 197 194.4534 0.002629804
## 3 trt2 5 -1.8374286 0.442824415 3.722857e-04 0.443271157 197 194.8238 0.001008849
## lambda fmi
## 1 0.002680240 0.01278278
## 2 0.002622906 0.01272531
## 3 0.001007832 0.01110765
```

##### updated from 2024-05-28

The `df`

calculation for pooling process in `emmeans`

package for `mina`

class has kept the consistency with `mice`

package, using the the Barnard-Rubin adjustment for small samples (Barnard and Rubin, 1999) that mentioned in the `pool()`

documents, see https://github.com/rvlenth/emmeans/issues/494. Thus we can get the same `df`

in either the `mice`

or `emmeans`

packages. All above results have been updated.