Bootstrap Confidence Interval

一般在诊断试剂注册的临床试验中，对于acceptance criteria，一般是看其CI（confidence interval）的下限比较多（一般在sample size的时候也多设计下限条件）；不同acceptance criteria对应的CI计算方法也各有不同，但是也都是比较常见的几种：

标准正态分布的CI
针对小样本正态分布的Wilson score interval（使用的比较多。。。）
Wilson的变形Transformed Wilson
Jackknife（刀切法）
Bootstrap（自助法，使用的也比较多。。。）

在Precison assessment in tumor diagnosis (Immunological method)文章中提到APA/ANA等acceptance criteria会使用bootstrap来计算CI（还有如：Passing-block回归系数、样本量过少的时候、reference interval建立等）

常见的Bootstrap计算渐近置信区间的方法有：

标准正态Bootstrap置信区间
基本Bootstrap置信区间
Bootstrap百分位数（percentile）置信区间
Bootstrap t置信区间

以下以percentile bootstrap（有效简单，也便于解释）为例：

若用R的boot包来计算bootstrap CI的话，需要先构建一个计算statistic的函数，注意这里的function必须要有两个函数，其中一个是data,另外一个则是indices，如若计算APA/ANA，将计算过程写入function中：

apa_cal <- function(df, ind){
  df <- as.data.frame(df)
  a <- sum(df[ind, "status"] == "high" &  df[ind, "STATUS2"] == "high")
  b <- sum(df[ind, "status"] == "high" &  df[ind, "STATUS2"] == "low")
  c <- sum(df[ind, "status"] == "low" &  df[ind, "STATUS2"] == "high")
  d <- sum(df[ind, "status"] == "low" &  df[ind, "STATUS2"] == "low")
  
  tb <- matrix(c(a, c, b, d), nrow = 2)
  opa <- (tb[1,1] + tb[2,2]) / (tb[1,1] + tb[1,2] + tb[2,1] + tb[2,2]) * 100
  apa <- 2 * tb[1,1] / (tb[1,1] + tb[2,1] + tb[1,1] + tb[1,2]) * 100
  ana <- 2 * tb[2,2] / (tb[2,2] + tb[1,2] + tb[2,2] + tb[2,1]) * 100
  return(c(opa, apa, ana))
}

接着调用boot函数运行bootstrap，记得set random number

set.seed(12345)
bot <- boot(data = dat, statistic = apa_cal, R = 2000)

然后用boot.ci函数计算CI，其中type参数是用于当function有多个statistic的情况，如type = 1即计算第一个statistic的bootstrap CI

bci1<- boot.ci(bot, conf = 0.95, type = c("perc"), index = 1)
ci1 <- c(bci1$percent[,4], bci1$percent[,5])

若statistic很多的时候，觉得一个个设定type参数比较麻烦，可以尝试下broom包的tidy()函数

参考资料：

Bootstrap Confidence Intervals for more than one statistics through boot.ci function

本文出自于http://www.bioinfo-scrounger.com转载请注明出处