
Example 1. Basic Use: Estimates, CIs, and P-values

WMWprob is identical to the area under the curve (AUC) in a receiver operating characteristic (ROC) analysis, which summarizes sensitivity vs. specificity in medical diagnostic testing, in signal detection for human perception research, and in other fields. Using the oft-cited example of Hanley and McNeil (1982), we cover these analyses and concepts.

Regular two-sided CI with no test. Traditional (balanced, two-sided) 95% confidence interval of the form [LCL(0.025), UCL(0.025)]. No null hypothesis testing (no p-value).

Two-sided CI with tailored one-sided test. Traditional 95% CI, [LCL(0.025), UCL(0.025)]. Test whether the AUC exceeds the "essential" threshold of 0.80, i.e., H0: WMWprob ≤ 0.80 vs. H1: WMWprob > 0.80.

One-sided CI with tailored one-sided test. One-sided 95% CI, [LCL(0.05), UCL(0.00)]. Test whether the AUC exceeds the "essential" threshold of 0.80, i.e., H0: WMWprob ≤ 0.80 vs. H1: WMWprob > 0.80.

Unbalanced two-sided CIs. Two 95% CIs of the form [LCL(0.045), UCL(0.005)] and [LCL(0.005), UCL(0.045)]. Compare these to the balanced [LCL(0.025), UCL(0.025)] and the one-sided [LCL(0.05), UCL(0.00) = 1].

Demonstrating CI and p-value congruency. Compute [LCL(alphaL), UCL(alphaU)]. What is the p-value for H0: WMWprob ≤ LCL(alphaL) vs. H1: WMWprob > LCL(alphaL)? p = alphaL. Likewise, what is the p-value for H0: WMWprob ≤ UCL(alphaU) vs. H1: WMWprob > UCL(alphaU)? p = 1 - alphaU.
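To see why this congruency holds, here is a minimal base-R sketch that pairs a plain Wald-type (normal-approximation) interval with the Hanley-McNeil (1982) standard error, applied to two small hypothetical score vectors. The wmw_wald() helper and the data are ours for illustration only, not the routine used in the examples below.

    # Minimal sketch: Wald-type CI for WMWprob using the Hanley-McNeil SE.
    # (Hypothetical helper and data; not the routine used in the examples below.)
    wmw_wald <- function(y1, y2, alphaL = 0.025, alphaU = 0.025) {
      est <- mean(outer(y1, y2, ">") + 0.5*outer(y1, y2, "=="))   # WMWprob estimate
      n1 <- length(y1); n2 <- length(y2)
      Q1 <- est/(2 - est); Q2 <- 2*est^2/(1 + est)                # Hanley-McNeil (1982)
      se <- sqrt((est*(1 - est) + (n1 - 1)*(Q1 - est^2) +
                  (n2 - 1)*(Q2 - est^2))/(n1*n2))
      c(est = est,
        LCL = est - qnorm(1 - alphaL)*se,   # plain Wald limits, not forced into [0, 1]
        UCL = est + qnorm(1 - alphaU)*se,
        se  = se)
    }

    y1 <- c(2, 3, 4, 4, 5, 5, 5)   # hypothetical "abnormal" scores
    y2 <- c(1, 1, 2, 2, 3, 4)      # hypothetical "normal" scores
    out <- wmw_wald(y1, y2)

    # Test H0: WMWprob <= LCL(0.025) vs. H1: WMWprob > LCL(0.025).
    # With this Wald setup the one-sided p-value is exactly alphaL = 0.025.
    z <- unname((out["est"] - out["LCL"]) / out["se"])
    pnorm(z, lower.tail = FALSE)   # 0.025

Because the Wald lower limit is just the estimate shifted down by qnorm(1 - alphaL) standard errors, testing H0 at that limit necessarily returns p = alphaL; the same logic gives p = 1 - alphaU at the upper limit.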

Storyline

The Hanley-McNeil dataset summarizes how a single reader (a radiologist) rated the computed tomographic (CT) brain images of 109 subjects: 51 who were judged by other criteria to truly have a given neurological abnormality and 58 who were judged to be normal.

Ratings of 109 Images
==================================================================================
                                           Rating (Y)
            ----------------------------------------------------------------------
  True          Definitely   Probably    Question-   Probably  Definitely
 Disease          Normal      Normal       able      Abnormal   Abnormal
 Status            (1)         (2)         (3)         (4)         (5)      Total
----------------------------------------------------------------------------------
 Abnormal (Y1)       3           2           2          11          33        51

   Normal (Y2)      33           6           6          11           2        58
----------------------------------------------------------------------------------
           Total    36           8           8          22          35       109
==================================================================================

Research Question

How well do the radiologist's ratings of the images predict the true disease status of these patients? Let (Y.abnormal, Y.normal) be a pair of randomly selected scores from the two groups. There are 51*58 = 2958 unique pairings. Then
     AUC = WMWprob = Prob[Y.abnormal > Y.normal] + Prob[Y.abnormal = Y.normal]/2.

As shown below, these data yield an AUC of 0.893. To conclude that this physician's image ratings are clinically worthy, AUC should exceed 0.50 (just chance) by a substantial margin, say, AUC > 0.80. This is succinctly addressed by focusing on how the lower confidence limit (LCL) for AUC compares to that threshold. But most researchers still want to see and report a p-value.

Creating the Dataset

    # Rating counts taken from the table above:
    # 51 abnormal cases and 58 normal cases across the five rating categories.
    abnormal <- rep(1:5, c( 3, 2, 2, 11, 33))
    normal   <- rep(1:5, c(33, 6, 6, 11,  2))
    Rating <- c(abnormal, normal)
    TrueDiseaseStatus <- c(rep("Abnormal", length(abnormal)),
                           rep("Normal",   length(normal)))
