This post describes how to compare two paired areas under ROC curves (AUC) for diagnostic accuracy with a non-inferiority test.
Suppose you have a new product and want to run a method comparison against an existing product. The primary endpoint is that the AUC of your product is not worse than that of the comparator product, so this is plainly a non-inferiority trial design. Assuming a non-inferiority margin of -0.15, the one-sided hypotheses are:
- Margin: -0.15
- H0: AUC1 - AUC2 ≤ -0.15
- H1: AUC1 - AUC2 > -0.15
Which method, then, can we use to assess statistical significance? Two methods are commonly used to estimate the AUC. One is the empirical (nonparametric) method of DeLong et al. (1988), which does not depend on the strong normality assumption that the binormal method makes. The other is the binormal method presented by Metz (1978) and McClish (1989). The bootstrap technique has also been suggested for both methods, as in the rocNIT R package.
In this post, I mainly record how to use the nonparametric method to compare paired AUCs, following the Liu et al. (2006) article listed in the references.
The method is worked through in R code that demonstrates each step. I don't reproduce the full derivations here; the article itself is very clear and easy to follow.
For a paired AUC comparison, the two products are measured on the same subjects. So I created a simulated dataset with 80 subjects, each carrying the two corresponding measurements from the new product and the comparator product.
The simulated data can be downloaded from Paired_Criteria_adjusted.txt.
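If the file is not to hand, here is a minimal sketch that generates a dataset of the same shape (the column names Condition, Method1 and Method2 match the code below; the distributional choices are my own assumption, so the random values will not reproduce the exact numbers shown later):

set.seed(123)
n <- 80
condition <- sample(c("Present", "Absent"), n, replace = TRUE)
# A shared per-subject signal makes the two methods' scores correlated (paired design)
signal <- ifelse(condition == "Present", 1.2, 0) + rnorm(n, sd = 0.5)
sim <- data.frame(
  Condition = condition,
  Method1 = signal + rnorm(n, sd = 0.7),  # new product
  Method2 = signal + rnorm(n, sd = 0.7)   # comparator product
)
write.table(sim, "Paired_Criteria_adjusted.txt", sep = "\t", row.names = FALSE, quote = FALSE)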
The nonparametric estimation of the ROC curve area is based on the Mann-Whitney U statistic.
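Concretely, writing $X_i$ ($i = 1, \dots, n_1$) for the measurements from subjects with the condition present and $Y_j$ ($j = 1, \dots, n_0$) for those with it absent, the estimator that the code below implements is (notation mine, paraphrasing Liu et al. (2006)):

$$\hat{A} = \frac{1}{n_1 n_0} \sum_{i=1}^{n_1} \sum_{j=1}^{n_0} \psi(X_i, Y_j), \qquad \psi(x, y) = \begin{cases} 1, & x > y \\ 1/2, & x = y \\ 0, & x < y \end{cases}$$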
Next we need to calculate the estimated variance of AUC1 - AUC2.
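The code uses the structural components $V_{10}(X_i) = \frac{1}{n_0} \sum_{j} \psi(X_i, Y_j)$ and $V_{01}(Y_j) = \frac{1}{n_1} \sum_{i} \psi(X_i, Y_j)$ (the `v10` and `v01` vectors below). The paired-design variance estimate is then

$$\widehat{\operatorname{Var}}(\hat{A}_1 - \hat{A}_2) = \frac{s^{10}_{11} + s^{10}_{22} - 2 s^{10}_{12}}{n_1} + \frac{s^{01}_{11} + s^{01}_{22} - 2 s^{01}_{12}}{n_0},$$

where the $s$ terms are the sample variances and covariance of the two methods' $V_{10}$ (respectively $V_{01}$) components; the covariance terms capture the pairing.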
Once we have the estimated variance, the difference of the two paired AUCs, and the margin, we can compute the Z statistic, from which the p value and the one-sided 95% lower confidence limit follow.
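In symbols, with margin $\Delta = -0.15$ and $\alpha = 0.05$ (again, notation mine, matching the quantities computed in the code below):

$$Z = \frac{(\hat{A}_1 - \hat{A}_2) - \Delta}{\sqrt{\widehat{\operatorname{Var}}(\hat{A}_1 - \hat{A}_2)}}, \qquad p = 1 - \Phi(Z), \qquad L_{95\%} = (\hat{A}_1 - \hat{A}_2) - z_{1-\alpha}\sqrt{\widehat{\operatorname{Var}}(\hat{A}_1 - \hat{A}_2)}.$$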
The complete code is as follows:
# Helper functions
h_get_u_statistic <- function(x, y) {
  # Kernel of the Mann-Whitney U statistic
  if (x > y) {
    1
  } else if (x == y) {
    0.5
  } else {
    0
  }
}

h_auc_v10_v01 <- function(n1, n0, v1, v0) {
  # Mann-Whitney U statistic estimate of the AUC
  auc <- sum(sapply(v1, function(x) {
    sapply(v0, function(y) h_get_u_statistic(x, y))
  })) / (n1 * n0)
  # Structural component V10: one value per condition-present subject
  v10 <- sapply(v1, function(x) {
    sum(sapply(v0, function(y) h_get_u_statistic(x, y))) / n0
  })
  # Structural component V01: one value per condition-absent subject
  v01 <- sapply(v0, function(y) {
    sum(sapply(v1, function(x) h_get_u_statistic(x, y))) / n1
  })
  list(auc = auc, v10 = v10, v01 = v01)
}
# Get the AUC and the corresponding intermediate components `v10` and `v01`
get_auc <- function(response, var) {
  # Split the measurements by condition status (1 = present, 0 = absent)
  n0 <- sum(response == 0, na.rm = TRUE)
  n1 <- sum(response == 1, na.rm = TRUE)
  var0 <- var[response == 0]
  var1 <- var[response == 1]
  c(
    list(n1 = n1, n0 = n0, var1 = var1, var0 = var0),
    h_auc_v10_v01(n1 = n1, n0 = n0, v1 = var1, v0 = var0)
  )
}
# The main program: non-inferiority Z test for two paired AUCs
auc.test <- function(mroc1, mroc2, margin, alpha = 0.05) {
  n1 <- mroc1$n1
  n0 <- mroc1$n0
  auc1 <- mroc1$auc
  auc2 <- mroc2$auc
  # Sample variances and covariances of the structural components
  s10_11 <- sum((mroc1$v10 - auc1)^2) / (n1 - 1)
  s10_22 <- sum((mroc2$v10 - auc2)^2) / (n1 - 1)
  s10_12 <- sum((mroc1$v10 - auc1) * (mroc2$v10 - auc2)) / (n1 - 1)
  s01_11 <- sum((mroc1$v01 - auc1)^2) / (n0 - 1)
  s01_22 <- sum((mroc2$v01 - auc2)^2) / (n0 - 1)
  s01_12 <- sum((mroc1$v01 - auc1) * (mroc2$v01 - auc2)) / (n0 - 1)
  # Estimated variance of AUC1 - AUC2; the covariance terms reflect the pairing
  variance <- (s10_11 + s10_22 - 2 * s10_12) / n1 +
    (s01_11 + s01_22 - 2 * s01_12) / n0
  auc_diff <- auc1 - auc2
  z <- (auc_diff - margin) / sqrt(variance)
  p <- 1 - pnorm(z)
  lower_limit <- auc_diff - qnorm(1 - alpha) * sqrt(variance)
  list(
    Difference = auc_diff,
    `Non-Inferiority Pvalue` = p,
    `One-Sided 95% Lower Limit` = lower_limit
  )
}
Now let's run the program. First, import the simulation data.
library(dplyr)

data <- data.table::fread("./Paired_Criteria_adjusted.txt") %>%
  mutate(Response = if_else(Condition == "Present", 1, 0))
Then compute the AUC results for the two methods.
mroc1 <- get_auc(response = data$Response, var = data$Method1)
mroc2 <- get_auc(response = data$Response, var = data$Method2)
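At this point you can peek at the two estimated AUCs before running the test (the values depend on the simulated data):

mroc1$auc  # AUC of Method1 (new product)
mroc2$auc  # AUC of Method2 (comparator)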
Finally, calculate the output, in particular the p value and the one-sided lower limit.
> auc.test(mroc1, mroc2, margin = -0.15, alpha = 0.05)
$Difference
[1] -0.04771115
$`Non-Inferiority Pvalue`
[1] 0.04447758
$`One-Sided 95% Lower Limit`
[1] -0.1466274
From this outcome we can reject H0, not only because the p value is less than 0.05 but also because the one-sided lower limit (-0.1466) is greater than the margin (-0.15). So we conclude that Method1 (the new product) is non-inferior to Method2 (the registered product) at a margin of -0.15.
Above is my note on a non-inferiority test in the diagnostic area. In practice I prefer to use a validated R package or NCSS software for the actual analysis, since hand-rolled code can easily be subtly wrong. The code above is meant to illustrate the principle rather than for direct use.
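For example, here is a minimal cross-check sketch using the pROC package (my own addition, not part of the original analysis; after loading, pROC masks var and cov with methods for ROC objects). Its DeLong variance and covariance for paired ROC curves should closely reproduce the hand-computed variance, and hence the same non-inferiority Z test once the margin is applied:

library(pROC)

# direction = "<" fixes higher scores to indicate "Present", matching the manual estimator
roc1 <- roc(data$Response, data$Method1, direction = "<", quiet = TRUE)
roc2 <- roc(data$Response, data$Method2, direction = "<", quiet = TRUE)
# DeLong variance of the paired difference AUC1 - AUC2
vd <- var(roc1) + var(roc2) - 2 * cov(roc1, roc2)
z <- (as.numeric(auc(roc1)) - as.numeric(auc(roc2)) - (-0.15)) / sqrt(vd)
1 - pnorm(z)  # should be close to the p value computed above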
Reference
- Liu, J.-P., et al. (2006). Tests of equivalence and non-inferiority for diagnostic accuracy based on the paired areas under ROC curves. Statistics in Medicine.
- NCSS documentation: Comparing Two ROC Curves – Paired Design.
Please indicate the source: http://www.bioinfo-scrounger.com