This post describes how to compare two paired areas under ROC curves (AUC) for diagnostic accuracy with a non-inferiority test.
Suppose you have a new product and want to run a method comparison against an existing product. The primary endpoint is that the AUC of your product is not worse than that of the comparator product, so this is plainly a non-inferiority trial design. Assuming a non-inferiority margin of -0.15, the one-sided hypotheses are:
- Margin: -0.15
- H0: AUC1 - AUC2 ≤ -0.15
- H1: AUC1 - AUC2 > -0.15
Which method, then, can we use to assess statistical significance? Two methods are commonly used to estimate the AUC. One is the empirical (nonparametric) method of DeLong et al. (1988), which does not depend on the strong normality assumption that the binormal method makes. The other is the binormal method presented by Metz (1978) and McClish (1989). The bootstrap technique has also been suggested for both methods, as in the rocNIT R package.
In this post, I mainly record how to use the nonparametric method to compare paired AUCs, following the Liu et al. (2006) article listed in the references.
The method is worked through in R code that demonstrates each step. I don't reproduce the full derivations here; the article itself is very clear and easy to follow.
For a paired AUC comparison, the two products are measured on the same subjects. So I created a simulated dataset with 80 subjects, each carrying the two corresponding measurements from the new product and the comparator product.
The simulated data can be downloaded from Paired_Criteria_adjusted.txt.
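If the file is not to hand, here is a minimal sketch that generates a dataset of the same shape (the column names Condition, Method1 and Method2 match the code below; the distributional choices are my own assumption, so the random values will not reproduce the exact numbers shown later):

set.seed(123)
n <- 80
condition <- sample(c("Present", "Absent"), n, replace = TRUE)
# A shared per-subject signal makes the two methods' scores correlated (paired design)
signal <- ifelse(condition == "Present", 1.2, 0) + rnorm(n, sd = 0.5)
sim <- data.frame(
  Condition = condition,
  Method1 = signal + rnorm(n, sd = 0.7),  # new product
  Method2 = signal + rnorm(n, sd = 0.7)   # comparator product
)
write.table(sim, "Paired_Criteria_adjusted.txt", sep = "\t", row.names = FALSE, quote = FALSE)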
The nonparametric estimation of the ROC curve area is based on the Mann-Whitney U statistic.
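Concretely, writing $X_i$ ($i = 1, \dots, n_1$) for the measurements from subjects with the condition present and $Y_j$ ($j = 1, \dots, n_0$) for those with it absent, the estimator that the code below implements is (notation mine, paraphrasing Liu et al. (2006)):

$$\hat{A} = \frac{1}{n_1 n_0} \sum_{i=1}^{n_1} \sum_{j=1}^{n_0} \psi(X_i, Y_j), \qquad \psi(x, y) = \begin{cases} 1, & x > y \\ 1/2, & x = y \\ 0, & x < y \end{cases}$$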
Next we need to calculate the estimated variance of AUC1 - AUC2.
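The code uses the structural components $V_{10}(X_i) = \frac{1}{n_0} \sum_{j} \psi(X_i, Y_j)$ and $V_{01}(Y_j) = \frac{1}{n_1} \sum_{i} \psi(X_i, Y_j)$ (the `v10` and `v01` vectors below). The paired-design variance estimate is then

$$\widehat{\operatorname{Var}}(\hat{A}_1 - \hat{A}_2) = \frac{s^{10}_{11} + s^{10}_{22} - 2 s^{10}_{12}}{n_1} + \frac{s^{01}_{11} + s^{01}_{22} - 2 s^{01}_{12}}{n_0},$$

where the $s$ terms are the sample variances and covariance of the two methods' $V_{10}$ (respectively $V_{01}$) components; the covariance terms capture the pairing.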
Once we have the estimated variance, the difference of the two paired AUCs, and the margin, we can compute the Z statistic, from which the p value and the one-sided 95% lower confidence limit follow.
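In symbols, with margin $\Delta = -0.15$ and $\alpha = 0.05$ (again, notation mine, matching the quantities computed in the code below):

$$Z = \frac{(\hat{A}_1 - \hat{A}_2) - \Delta}{\sqrt{\widehat{\operatorname{Var}}(\hat{A}_1 - \hat{A}_2)}}, \qquad p = 1 - \Phi(Z), \qquad L_{95\%} = (\hat{A}_1 - \hat{A}_2) - z_{1-\alpha}\sqrt{\widehat{\operatorname{Var}}(\hat{A}_1 - \hat{A}_2)}.$$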
The complete code is as follows:
# Helper functions
h_get_u_statistic <- function(x, y) {
  # Kernel of the Mann-Whitney U statistic
  if (x > y) {
    1
  } else if (x == y) {
    0.5
  } else {
    0
  }
}

h_auc_v10_v01 <- function(n1, n0, v1, v0) {
  # Mann-Whitney U statistic estimate of the AUC
  auc <- sum(sapply(v1, function(x) {
    sapply(v0, function(y) h_get_u_statistic(x, y))
  })) / (n1 * n0)
  # Structural component V10: one value per condition-present subject
  v10 <- sapply(v1, function(x) {
    sum(sapply(v0, function(y) h_get_u_statistic(x, y))) / n0
  })
  # Structural component V01: one value per condition-absent subject
  v01 <- sapply(v0, function(y) {
    sum(sapply(v1, function(x) h_get_u_statistic(x, y))) / n1
  })
  list(auc = auc, v10 = v10, v01 = v01)
}
# Get the AUC and the corresponding intermediate components `v10` and `v01`
get_auc <- function(response, var) {
  # Split the measurements by condition status (1 = present, 0 = absent)
  n0 <- sum(response == 0, na.rm = TRUE)
  n1 <- sum(response == 1, na.rm = TRUE)
  var0 <- var[response == 0]
  var1 <- var[response == 1]
  c(
    list(n1 = n1, n0 = n0, var1 = var1, var0 = var0),
    h_auc_v10_v01(n1 = n1, n0 = n0, v1 = var1, v0 = var0)
  )
}
# The main program: non-inferiority Z test for two paired AUCs
auc.test <- function(mroc1, mroc2, margin, alpha = 0.05) {
  n1 <- mroc1$n1
  n0 <- mroc1$n0
  auc1 <- mroc1$auc
  auc2 <- mroc2$auc
  # Sample variances and covariances of the structural components
  s10_11 <- sum((mroc1$v10 - auc1)^2) / (n1 - 1)
  s10_22 <- sum((mroc2$v10 - auc2)^2) / (n1 - 1)
  s10_12 <- sum((mroc1$v10 - auc1) * (mroc2$v10 - auc2)) / (n1 - 1)
  s01_11 <- sum((mroc1$v01 - auc1)^2) / (n0 - 1)
  s01_22 <- sum((mroc2$v01 - auc2)^2) / (n0 - 1)
  s01_12 <- sum((mroc1$v01 - auc1) * (mroc2$v01 - auc2)) / (n0 - 1)
  # Estimated variance of AUC1 - AUC2; the covariance terms reflect the pairing
  variance <- (s10_11 + s10_22 - 2 * s10_12) / n1 +
    (s01_11 + s01_22 - 2 * s01_12) / n0
  auc_diff <- auc1 - auc2
  z <- (auc_diff - margin) / sqrt(variance)
  p <- 1 - pnorm(z)
  lower_limit <- auc_diff - qnorm(1 - alpha) * sqrt(variance)
  list(
    Difference = auc_diff,
    `Non-Inferiority Pvalue` = p,
    `One-Sided 95% Lower Limit` = lower_limit
  )
}
Now let's run the program. First, import the simulation data.
library(dplyr)

data <- data.table::fread("./Paired_Criteria_adjusted.txt") %>%
  mutate(Response = if_else(Condition == "Present", 1, 0))
Then compute the AUC results for the two methods.
mroc1 <- get_auc(response = data$Response, var = data$Method1)
mroc2 <- get_auc(response = data$Response, var = data$Method2)
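At this point you can peek at the two estimated AUCs before running the test (the values depend on the simulated data):

mroc1$auc  # AUC of Method1 (new product)
mroc2$auc  # AUC of Method2 (comparator)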
Finally, calculate the output, in particular the p value and the one-sided lower limit.
> auc.test(mroc1, mroc2, margin = -0.15, alpha = 0.05)
$Difference
[1] -0.04771115
$`Non-Inferiority Pvalue`
[1] 0.04447758
$`One-Sided 95% Lower Limit`
[1] -0.1466274
From this outcome we can reject H0, not only because the p value is less than 0.05 but also because the one-sided lower limit (-0.1466) is greater than the margin (-0.15). So we conclude that Method1 (the new product) is non-inferior to Method2 (the registered product) at a margin of -0.15.
Above is my note on a non-inferiority test in the diagnostic area. In practice I prefer to use a validated R package or NCSS software for the actual analysis, since hand-rolled code can easily be subtly wrong. The code above is meant to illustrate the principle rather than for direct use.
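For example, here is a minimal cross-check sketch using the pROC package (my own addition, not part of the original analysis; after loading, pROC masks var and cov with methods for ROC objects). Its DeLong variance and covariance for paired ROC curves should closely reproduce the hand-computed variance, and hence the same non-inferiority Z test once the margin is applied:

library(pROC)

# direction = "<" fixes higher scores to indicate "Present", matching the manual estimator
roc1 <- roc(data$Response, data$Method1, direction = "<", quiet = TRUE)
roc2 <- roc(data$Response, data$Method2, direction = "<", quiet = TRUE)
# DeLong variance of the paired difference AUC1 - AUC2
vd <- var(roc1) + var(roc2) - 2 * cov(roc1, roc2)
z <- (as.numeric(auc(roc1)) - as.numeric(auc(roc2)) - (-0.15)) / sqrt(vd)
1 - pnorm(z)  # should be close to the p value computed above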
Reference
- Liu, J.-P., et al. (2006). Tests of equivalence and non-inferiority for diagnostic accuracy based on the paired areas under ROC curves. Statistics in Medicine.
- NCSS documentation: Comparing Two ROC Curves – Paired Design.
Please indicate the source: http://www.bioinfo-scrounger.com