# KeepNotes blog

Stay hungry, Stay Foolish.

0%

Continue with the question in the previous article (Multiple Imputaton - Linear Regression in R), where we just discussed how to compute the pooled coefficients of ANCOVA using `mice` package but left out the Ls-means and hypothesis test. Luckly I find out that `emmeans` package have wrapped this process inside so we can use it to obtain the pooled Ls-means estimation and p-value straightforward wihout `pool` function of `mice`.

Supposed that we have fitted the ANCOVA for imputed datasets and get the fitted models `mods` for each imputation here. Then I will use the `emmeans::emmeans()` function to estimate the ls-means, which is not the indivival estimate for each imputation but rather the pooled one. The pool process remains to use the Rubin's Rules.

``````ems <- emmeans::emmeans(mods, ~trt)
data.frame(ems)

##   trt    emmean        SE       df  lower.CL   upper.CL
## 1   1 -10.55330 0.4696385 194.9729 -11.47953  -9.627078
## 2   2 -12.39073 0.4698075 194.8239 -13.31729 -11.464169``````

Afterwards using the `emmeans::contrast()` function to do the contrasts analysis and get the CI and p-value for the difference (`trt2 - trt1`).

``````conr <- emmeans::contrast(ems, method = list(c(-1, 1)), adjust = "none")
conr_test <- emmeans::test(conr)
data.frame(conr_test)

##   contrast  estimate        SE       df   t.ratio     p.value
## 1 c(-1, 1) -1.837429 0.6657861 194.8238 -2.759788 0.006335778``````

If we want to validate the above result using SAS procedure, we must first export the csv from the `low_imp_res` dataframe that we created in the last article.

``write.csv(low_imp_res, file = "./low_imputed.csv", row.names = FALSE, na = "")``

If we want to validate the above result using SAS procedure, we must first export the csv from the `low_imp_res` dataframe that we created in the last article. And than fit the ANCOVA model with `proc mixed` to obtain the ls-means estimate(`lsm` and `diff`) for each imputation. Finally use `proc mianalyze` to pool the results of all imputation for ls-means (`comb_lsm`) and difference (`comb_diff`) within two groups.

``````proc import datafile="&_projpth.\02_Raw Data\low_imputed.csv"
out=low_imp(where=(imp ne 0))
dbms=csv replace;
getnames = yes;
run;

ods output lsmeans=lsm diffs=diff;
proc mixed data=low_imp;
by imp;
class trt / ref=first;
model week8=basval trt /ddfm=kr;
lsmeans trt / cl pdiff diff;
run;

proc sort data=lsm; by trt; run;
ods output ParameterEstimates=comb_lsm;
proc mianalyze data=lsm;
by trt;
modeleffects estimate;
stderr stderr;
run;

ods output ParameterEstimates=comb_diff;
proc mianalyze data=diff;
by trt _trt;
modeleffects estimate;
stderr stderr;
run;``````

The SAS output can be seen below.

We can observe that the estimate and SE from SAS are consistent with R, but there is a significant discrepancy in DF (degress of freedom). In R df is `197` while in SAS it is `3.94E6`. That's because there are different methods to calculate the the df, an older one is used in SAS and adjusted version is used in the `mice` package. That will also lead to different p-values. For more details can be found in Degrees of Freedom and P-values of Rubins-Rules.

Someone may be curious how to calculate the df in `mice` and SAS. Let’s repeat the process of calculation using the formulas in above link. The specific formulas are not shown here, which is very easy to understand. So I just convert them as the R code.

The older method used in SAS to calculate the df for the t-distribution is defined as (Rubin (1987), Van Buuren (2018)). The specific formulas are not shown here, which is very easy to understand. So I just convert them as the R code. From below, the `lambda` can be derived from the between (`Vb`) and total (`Vt`) missing data variance, and the `m` represents the number of imputed datasets. The rounded value is `3.94E6` that is equal to the results in SAS.

``````m <- 5
Vb <- 0.000372
Vt <- 0.443271

lambda <- (Vb + Vb / m) / Vt
df_old <- (m - 1) / lambda^2
df_old
## [1] 3944121``````

As the above `df` is too larger for the pooled result, compared to the dfs in each imputed dataset, which is inappropriate. Barnard and Rubin (1999) adjusted this df by using a new formula (See formula 9.9 in that article.). We should compute the Observed df (`df_obs`) and then adjusted df (`df_adj`) where `n` represents the sample size in the imputed datasets, and `k` the number of parameters in the fitted model (in my case, there are 3 parameters).

``````n <- 200
k <- 3
m <- 5
df_obs <- (((n - k) + 1) / ((n - k) + 3)) * (n - k) * (1 - lambda)
df_adj <- (df_old * df_obs) / (df_old + df_obs)
## [1] 194.824``````

Finally we can look at the df for the pooled estimates from `pool_res` using `pool()` function. The `df` for `trt2` term is about equal to our computation, as seen below.

``````pool_res <- pool(mods)
pool_res
## Class: mipo    m = 5
##          term m   estimate        ubar            b           t dfcom       df         riv
## 1 (Intercept) 5  0.6767737 2.982354968 6.679090e-03 2.990369876   197 194.4394 0.002687443
## 2      basval 5 -0.5354029 0.006510448 1.426767e-05 0.006527569   197 194.4534 0.002629804
## 3        trt2 5 -1.8374286 0.442824415 3.722857e-04 0.443271157   197 194.8238 0.001008849
##        lambda        fmi
## 1 0.002680240 0.01278278
## 2 0.002622906 0.01272531
## 3 0.001007832 0.01110765``````
##### updated from 2024-05-28

The `df` calculation for pooling process in `emmeans` package for `mina` class has kept the consistency with `mice` package, using the the Barnard-Rubin adjustment for small samples (Barnard and Rubin, 1999) that mentioned in the `pool()` documents, see https://github.com/rvlenth/emmeans/issues/494. Thus we can get the same `df` in either the `mice` or `emmeans` packages. All above results have been updated.