WMWprob is identical to the area under the curve (AUC) in a receiver operating characteristic (ROC) analysis, which summarizes sensitivity vs. specificity in diagnostic testing in medicine, signal detection in human perception research, and other fields. Using the oft-cited example of Hanley and McNeil (1982), we cover these analyses and concepts.
Example 1. Basic Use: Estimates, CIs, and P-values
Two-sided CI with tailored one-sided test. Traditional 95% CI, [LCL(0.025), UCL(0.025)]. Test whether the AUC exceeds the "essential" threshold of 0.80, i.e., H0: WMWprob ≤ 0.80 vs. H1: WMWprob > 0.80.
One-sided CI with tailored one-sided test. One-sided 95% CI, [LCL(0.05), UCL(0.00)]. Test whether the AUC exceeds the "essential" threshold of 0.80, i.e., H0: WMWprob ≤ 0.80 vs. H1: WMWprob > 0.80.
Unbalanced two-sided 95% CIs, [LCL(0.045), UCL(0.005)] and [LCL(0.005), UCL(0.045)]. Compare to [LCL(0.025), UCL(0.025)] and [LCL(0.05), UCL(0.00) = 1].
Demonstrating CI and p-value congruency. Compute [LCL(alphaL), UCL(alphaU)]. What is the p-value for H0: WMWprob ≤ LCL(alphaL) vs. H1: WMWprob > LCL(alphaL)? p = alphaL. Likewise, what is the p-value for H0: WMWprob ≤ UCL(alphaU) vs. H1: WMWprob > UCL(alphaU)? p = 1 - alphaU.
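This congruency can be sketched in base R with a normal-approximation (Wald) interval. The estimate and standard error below are hypothetical stand-ins, and the package's own CI method may well differ (e.g., exact or transformed-scale limits), but the CI/p-value logic is the same.

```r
# Minimal sketch of CI/p-value congruency under a normal approximation.
# A and SE are hypothetical values, not output from any particular method.
A  <- 0.893                       # point estimate (e.g., an AUC)
SE <- 0.032                       # assumed standard error
alphaL <- 0.045; alphaU <- 0.005  # unbalanced tail probabilities

LCL <- A - qnorm(1 - alphaL) * SE
UCL <- A + qnorm(1 - alphaU) * SE

# Test H0: theta <= LCL vs. H1: theta > LCL  -->  p = alphaL
pL <- 1 - pnorm((A - LCL) / SE)

# Test H0: theta <= UCL vs. H1: theta > UCL  -->  p = 1 - alphaU
pU <- 1 - pnorm((A - UCL) / SE)
```

By construction, (A - LCL)/SE = qnorm(1 - alphaL), so pL = alphaL exactly; likewise pU = 1 - alphaU.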
Storyline
The Hanley-McNeil dataset depicts how a single reader (a radiologist) rated the computed tomographic (CT) brain images of 109 subjects, 51 who were judged by other criteria to truly have a given neurological abnormality and 58 who were judged to be normal.
Ratings of 109 Images
==========================================================================
                                  Rating (Y)
              -----------------------------------------------------
True          Definitely  Probably  Question-  Probably  Definitely
Disease         Normal     Normal     able     Abnormal   Abnormal
Status            (1)        (2)       (3)        (4)        (5)    Total
--------------------------------------------------------------------------
Abnormal (Y1)      3          2         2         11         33       51
Normal   (Y2)     33          6         6         11          2       58
--------------------------------------------------------------------------
Total             36          8         8         22         35      109
==========================================================================
Research Question
How well do the radiologist's ratings of the images predict the true disease status of these patients? Let (Y.abnormal, Y.normal) be a pair of randomly selected scores from the two groups. There are 51*58 = 2958 unique pairings. Then
AUC = WMWprob = Prob[Y.abnormal > Y.normal] + Prob[Y.abnormal = Y.normal]/2.
As shown below, these data yield an AUC of 0.893. To conclude that this physician's image ratings are clinically worthy, AUC should exceed 0.50 (just chance) by a substantial margin, say, AUC > 0.80. This is succinctly addressed by focusing on how the lower confidence limit (LCL) for AUC compares to that threshold. But most researchers still want to see and report a p-value.
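One way to pair the LCL with a p-value is the classic standard error of AUC from Hanley and McNeil (1982), computed directly below. Whether this section's software uses that formula (rather than, say, an exact or transformed-scale method) is an assumption here; this is just a hand check.

```r
# Hanley-McNeil (1982) standard error of AUC, then a Wald-type lower limit.
A  <- 0.893           # AUC for these data
n1 <- 51; n2 <- 58    # abnormal and normal sample sizes

Q1 <- A / (2 - A)         # Prob a random abnormal beats two random normals
Q2 <- 2 * A^2 / (1 + A)   # Prob two random abnormals both beat one normal
SE <- sqrt((A * (1 - A) + (n1 - 1) * (Q1 - A^2) + (n2 - 1) * (Q2 - A^2)) /
           (n1 * n2))

LCL95 <- A - qnorm(0.975) * SE   # traditional two-sided 95% lower limit
```

Here SE is about 0.033, so the lower limit lands near 0.83, comfortably above the 0.80 threshold.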
Creating the Dataset
# Expand the table counts into one rating per image
abnormal <- rep(1:5, c( 3, 2, 2, 11, 33))   # 51 truly abnormal
normal   <- rep(1:5, c(33, 6, 6, 11,  2))   # 58 truly normal
Rating   <- c(abnormal, normal)
TrueDiseaseStatus <- c(rep("Abnormal", length(abnormal)),
                       rep("Normal",   length(normal)))
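As a hand check on the AUC value, here is a base-R sketch computing it two ways from these vectors: first by the pairwise definition above, then via the Wilcoxon-Mann-Whitney statistic from wilcox.test(), whose W equals the count of abnormal-beats-normal pairs plus half the tied pairs.

```r
abnormal <- rep(1:5, c( 3, 2, 2, 11, 33))   # 51 truly abnormal
normal   <- rep(1:5, c(33, 6, 6, 11,  2))   # 58 truly normal

# AUC by the pairwise definition: Prob[Y1 > Y2] + Prob[Y1 = Y2]/2
d   <- outer(abnormal, normal, "-")      # all 51*58 = 2958 differences
AUC <- mean((d > 0) + 0.5 * (d == 0))    # 0.893

# Same value from W = #(abnormal > normal) + #(ties)/2
# (warning about exact p-values with ties is expected and suppressed)
W    <- suppressWarnings(wilcox.test(abnormal, normal))$statistic
AUC2 <- as.numeric(W) / (length(abnormal) * length(normal))
```

Both give AUC = 2642/2958 = 0.893, matching the value reported above.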