Example 3. The "SuperHDL" Story

A small study of heart disease patients testing a hypothesis so improbable its principal investigator [Steven Nissen, MD, Cleveland Clinic] says he gave it a one-in-10,000 chance of succeeding. ... When he saw the data, he was stunned. "The plaques regressed. A lot. More than has been seen with any drug. I almost fell off my seat. This is just so bizarre and unusual."

– Gina Kolata, New York Times, 5 November 2003

...what very much appears to be a real breakthrough in treating heart disease.

– Peter Jennings, ABC World News Tonight, 4 November 2003

... the first medicine ever shown to actually reduce the amount of cholesterol plaque in arteries.

– Matthew Herper, Forbes.com, 22 December 2003

We made too many wrong mistakes.

– Yogi Berra, 18-time All-Star catcher, New York Yankees

In the prestigious medical journal JAMA: Journal of the American Medical Association, Nissen, S. E., et al. (2003) reported on the "Effect of Recombinant ApoA-I Milano on Coronary Atherosclerosis in Patients with Acute Coronary Syndromes: A Randomized Controlled Trial." ApoA-I Milano is a very rare variant of apolipoprotein A-I, the main protein component of HDL cholesterol (the "good" kind), and "recombinant" means it was biologically manufactured in a laboratory, in this case by a tiny company in Ann Arbor, Michigan, Esperion Therapeutics, which called the molecule ETC-216.

As the above quotes and video clip show, the results were immediately and widely touted in the mainstream press and media. ETC-216 became informally known as "SuperHDL." Within a few weeks, the huge pharmaceutical company Pfizer paid $1.3 billion in cash to acquire Esperion and thus the rights to ETC-216.

Whatever further research Pfizer conducted on ETC-216 must have faltered. Six years later (late 2009), The Medicines Company announced it had acquired exclusive worldwide rights to ETC-216 from Pfizer. Pfizer received $10 million plus profit-sharing considerations if ETC-216 (soon renamed MDCO-216) ever became marketable. The testing by The Medicines Company also failed, so in 2016 it discontinued development. Thus, Pfizer paid $1.3 billion for rights it eventually dumped for $10 million, a loss of over 99%, not counting the cost of its own research program.

Storyline

47 patients with coronary atherosclerosis completed the Nissen et al. (2003) study protocol. Working under a 2:2:1 randomization scheme, 21 received a "low" dose of ETC-216, 15 received "triple" that low dose, and 11 received placebo. Each dose was infused intravenously weekly for 5 weeks.

The primary outcome measure was the raw change over the 5 weeks in percent atheroma volume (PAV), which is the proportion of the targeted cardiac vessel occluded by atherosclerotic plaque. Greater PAV indicates more atherosclerosis. Letting PAV.0 and PAV.5 be PAV values at Week 0 (baseline) and Week 5 (end of study),

PAVchange = PAV.5 - PAV.0.

Thus, PAVchange < 0 indicates that PAV improved, so PAVchange = –0.04 is a greater improvement than PAVchange = –0.01. Suppose a subject who received ETC-216 goes from PAV.0 = 61.1% occlusion to PAV.5 = 55.3% occlusion. Then PAVchange = 55.3% – 61.1% = –5.8%, a clinically important reduction in plaque burden for only a 5-week period. Of course, such changes could be due merely to ordinary fluctuations in PAV and ever-present measurement variation. This is why clinical trials incorporate appropriate control groups.
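The arithmetic above can be sketched in R as follows. This is only an illustration: the single subject is the hypothetical one from the text, and the name improved is ours.

```r
# Hypothetical subject from the text; PAV values are in percent
PAV.0 <- 61.1                  # baseline (Week 0)
PAV.5 <- 55.3                  # end of study (Week 5)

PAVchange <- PAV.5 - PAV.0     # negative values mean less plaque
improved  <- PAVchange < 0

round(PAVchange, 1)            # -5.8
improved                       # TRUE
```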

For the primary analysis, Dr. Nissen combined the "low" and "triple" ETC-216 groups, as we do below, and, most critically, he did not compare this combined ETC-216 arm to the placebo arm. Instead, the analysis focused only on PAVchange for the 36 patients who received ETC-216. Their mean and median PAVchange were –1.06 and –0.81, respectively. The Wilcoxon signed rank test yielded p = 0.02, but this does not test hypotheses about mean(Y) or median(Y). However, a footnote to Table 2 reported that the combined ETC-216 arm was compared to the placebo arm using an "analysis of covariance of ranks of change from baseline, with the baseline value as a covariate," which produced a p-value of 0.29.

Not comparing treatment to placebo in such a study is fraught with obvious perils. In addition, Gina Kolata's New York Times story reported that Dr. Nissen had given ETC-216 "a one-in-10,000 chance of succeeding." Regardless of the method used, when a research hypothesis is much in doubt at the planning stage, a result of p = 0.02 from testing it has a high chance of being a false positive. Indeed, the same logic is a central principle in medical diagnostic testing, especially in screening studies for low-prevalence diseases.
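The false-positive logic can be illustrated with a rough Bayes' rule calculation, the same one used for the predictive value of a screening test. This is a simplified sketch, not the Bayesian method of Example 5: it treats p = 0.02 as "significant at the 0.02 level," and the power of 0.80 and the function name post_prob are our assumptions.

```r
# Probability the research hypothesis is true given a "significant" result.
#   prior: pre-study probability that the hypothesis is true
#   alpha: false-positive rate of the test
#   power: assumed true-positive rate of the test
post_prob <- function(prior, alpha, power) {
  (prior * power) / (prior * power + (1 - prior) * alpha)
}

# Dr. Nissen's stated one-in-10,000 prior
p_skeptic <- post_prob(prior = 1e-4, alpha = 0.02, power = 0.80)

# A far more generous prior of 0.01
p_generous <- post_prob(prior = 0.01, alpha = 0.02, power = 0.80)

round(c(p_skeptic, p_generous), 3)   # about 0.004 and 0.288
```

Under these assumptions, even the generous prior leaves the hypothesis more likely false than true after a "significant" finding.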

Here we perform a faux comparison of ETC-216 to placebo using the WMWprob parameter.

Research Question

Is SuperHDL effective in reducing PAV? We address this using

WMWprob = Pr[PAVchange{Placebo} > PAVchange{SuperHDL}] +
          Pr[PAVchange{Placebo} = PAVchange{SuperHDL}]/2

Because this is such a small trial over only five weeks, any such improvement (here, WMWprob > 0.50) is potentially important. Thus, the hypothesis structure is H0: WMWprob ≤ 0.50 versus H1: WMWprob > 0.50.
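A minimal sketch of how the sample version of this parameter can be computed in base R. The function name wmw_prob and the toy data are hypothetical, and this is not the WMW() function used in the analyses that follow.

```r
# Sample WMWprob: share of all (y1, y2) pairs with y1 > y2,
# plus half the share of tied pairs.
wmw_prob <- function(y1, y2) {
  gt <- sum(outer(y1, y2, ">"))    # pairs where y1 exceeds y2
  eq <- sum(outer(y1, y2, "=="))   # tied pairs
  (gt + eq / 2) / (length(y1) * length(y2))
}

# Toy data: 9 pairs, of which 5 have y1 > y2 and 1 is tied
y1 <- c(1, 3, 5)
y2 <- c(0, 3, 4)
toy_est <- wmw_prob(y1, y2)
round(toy_est, 3)                  # (5 + 1/2)/9 = 0.611
```

A value of 0.50 means neither group tends to have larger outcomes than the other.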

Creating a Faux Dataset

In late 2009, one of us (RO'B) wrote Dr. Nissen to obtain the PAV data. He initially agreed to provide them, but several days later retracted, emailing

I will inquire whether Pfizer would agree to release the data. I’m not optimistic given your history of publicly criticizing the ApoA1 Milano study. They are likely to believe that you have ulterior motives (and I tend to agree).

Point of fact, the study was funded by Esperion Therapeutics, which (as stated in the JAMA article) was only "permitted to review the manuscript and comment, but the final decision on content rested with [Dr. Nissen] in consultation with the other authors." Pfizer was only mentioned as having provided financial support to four of the article's authors, including Dr. Nissen. Pfizer purchased the rights to ETC-216 after publication. As to the charge of "ulterior motives," if that includes bringing to light missteps found in the public scientific literature in order to teach and advance statistical methodology, then Dr. O'Brien pleads guilty.

Thus, having tried but failed to get the actual data, we painstakingly created a faux dataset to mimic them.

   # Faux PAVchange values: 11 placebo patients and 36 ETC-216 ("SuperHDL") patients
   PAVchangeP <- c(-4.7,-4.0,-2.4,-1.8,-0.5,0.03,1.9,2.62,2.9,3.0,4.5)
   PAVchangeSHDL <- c(-10.0,-4.4,-4.1,-4.1,-4.1,-4.0,-4.0,-4.0,-4.0,-3.9,-3.9,
       -3.1,-3.1,-3.0,-2.7,-2.3,-2.1,-1.02,-0.6,-0.6,-0.2,-0.1,0.8,0.9,0.9,2.1,
        2.2,2.2,2.2,2.2,2.2,2.4,2.5,3.0,3.8,3.9)

   # Group labels aligned with the stacked outcome vector
   treatment <- rep(c("Placebo", "SuperHDL"), c(11,36))
   PAVchange <- c(PAVchangeP, PAVchangeSHDL)

The means, standard deviations (SDs), and medians for the faux data are nearly identical to the values reported in Table 2, "Percent Atheroma Volume in the Target Coronary Segment," of Dr. Nissen's article.

==========================================================================
                      Placebo                        SuperHDL
            ............................   ............................
              Mean       SD     Median       Mean       SD     Median
Table 2       0.14     3.09      0.03       –1.06     3.17     –0.81
Faux          0.141    3.086     0.030      –1.056    3.174    –0.810
==========================================================================
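The "Faux" row can be reproduced with a few lines of base R. This sketch repeats the faux data so that it runs on its own; the helper name summ is ours.

```r
PAVchangeP <- c(-4.7, -4.0, -2.4, -1.8, -0.5, 0.03, 1.9, 2.62, 2.9, 3.0, 4.5)
PAVchangeSHDL <- c(-10.0, -4.4, -4.1, -4.1, -4.1, -4.0, -4.0, -4.0, -4.0,
                   -3.9, -3.9, -3.1, -3.1, -3.0, -2.7, -2.3, -2.1, -1.02,
                   -0.6, -0.6, -0.2, -0.1, 0.8, 0.9, 0.9, 2.1, 2.2, 2.2,
                   2.2, 2.2, 2.2, 2.4, 2.5, 3.0, 3.8, 3.9)

# Mean, SD, and median, rounded to three decimals
summ <- function(y) round(c(Mean = mean(y), SD = sd(y), Median = median(y)), 3)

faux <- rbind(Placebo = summ(PAVchangeP), SuperHDL = summ(PAVchangeSHDL))
faux
```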

WMWprob Analysis (Ordinary)

Analysis Ex3a conforms to an ordinary approach. Note that Ncats=5 only directs how the data are partitioned for tabling counts, etc., here into approximate quintiles. This has no effect on the WMW analysis. The estimated WMWprob is 0.61 with a regular, two-sided 95% confidence interval of [0.40, 0.79]. Testing H0: WMWprob = 0.50 gives p = 0.30 (two-sided).

   Ex3a <- WMW(Y=PAVchange, Group=treatment,
              GroupLevel=c("Placebo", "SuperHDL"),
              Ncats=5,
              Alpha=c(0.025, 0.025),
              WMWprob0=0.50)

*************************************************************
             WMW: Wilcoxon-Mann-Whitney Analysis
    Comparing Two Groups with Respect to an Ordinal Outcome

                  Outcome variable: PAVchange
                   Group variable: treatment
          Comparison: (Y1) Placebo vs. (Y2) SuperHDL
*************************************************************

Counts
******
     PAVchange    Placebo    SuperHDL
      [-10,-4]          2           9
     (-4,-2.3]          1           7
   (-2.3,0.03]          3           6
    (0.03,2.2]          1           9
     (2.2,4.5]          4           5
         Total         11          36

Proportions
***********
     PAVchange    Placebo    SuperHDL
      [-10,-4]      0.182       0.250
     (-4,-2.3]      0.091       0.194
   (-2.3,0.03]      0.273       0.167
    (0.03,2.2]      0.091       0.250
     (2.2,4.5]      0.364       0.139
         Total      1.000       1.000

Cumulative Proportions
**********************
     PAVchange    Placebo    SuperHDL
      [-10,-4]      0.182       0.250
     (-4,-2.3]      0.273       0.444
   (-2.3,0.03]      0.545       0.611
    (0.03,2.2]      0.636       0.861
     (2.2,4.5]      1.000       1.000

WMW Parameters
**********************************************************************
WMWprob = Pr[PAVchange{Placebo} > PAVchange{SuperHDL}] +
              Pr[PAVchange{Placebo} = PAVchange{SuperHDL}]/2

WMWodds = WMWprob/(1 - WMWprob)
**********************************************************************

Sample Sizes
***********************
Placebo        11
SuperHDL       36
***********************

**********************************************************
Stochastic Superiority       # of Pairs     Probability
......................       ..........     ...........
{Placebo} > {SuperHDL}              240           0.606
{Placebo} = {SuperHDL}                5           0.013
{Placebo} < {SuperHDL}              151           0.381
                   Total:             396           1.000

      WMWprob = (240 + 5/2)/396 = 0.612
      WMWodds = 0.612/(1 - 0.612) = 1.580
**********************************************************

Hypotheses Tested
**************************************************
    H0: WMWprob <= 0.50        H0: WMWodds <= 1.00
    H1: WMWprob > 0.50        H1: WMWodds > 1.00
**************************************************

*****************************************************************
Parameter Estimate      0.95 CI*     One-Sided Hypothesis   P**
.................................................................
WMWprob     0.612    [0.404, 0.787] H0: WMWprob <= 0.50 0.150
WMWodds     1.580    [0.677, 3.686] H0: WMWodds <= 1.00 0.150
*****************************************************************
*CI error rates (alphaL, alphaU): (0.025, 0.025)
CI Method: coupling Sen (1967) & Mee (1990)
**Normal(0, 1) test statistic, Z = 1.04
P-value for H0: WMWprob >= 0.50: 1 - 0.150 = 0.850
Two-sided p-value (H0: WMWprob = 0.50): 0.300
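
The pair counts and point estimates in this output can be cross-checked by brute force in base R. The sketch below is not the internals of WMW(); it repeats the faux data so that it runs on its own. It also uses the fact that the W statistic from base R's wilcox.test() equals the number of "greater" pairs plus half the ties.

```r
PAVchangeP <- c(-4.7, -4.0, -2.4, -1.8, -0.5, 0.03, 1.9, 2.62, 2.9, 3.0, 4.5)
PAVchangeSHDL <- c(-10.0, -4.4, -4.1, -4.1, -4.1, -4.0, -4.0, -4.0, -4.0,
                   -3.9, -3.9, -3.1, -3.1, -3.0, -2.7, -2.3, -2.1, -1.02,
                   -0.6, -0.6, -0.2, -0.1, 0.8, 0.9, 0.9, 2.1, 2.2, 2.2,
                   2.2, 2.2, 2.2, 2.4, 2.5, 3.0, 3.8, 3.9)

# Compare every placebo value with every SuperHDL value (11 x 36 = 396 pairs)
gt <- sum(outer(PAVchangeP, PAVchangeSHDL, ">"))    # Placebo > SuperHDL
eq <- sum(outer(PAVchangeP, PAVchangeSHDL, "=="))   # ties
lt <- sum(outer(PAVchangeP, PAVchangeSHDL, "<"))    # Placebo < SuperHDL
n_pairs <- length(PAVchangeP) * length(PAVchangeSHDL)

WMWprob <- (gt + eq / 2) / n_pairs
WMWodds <- WMWprob / (1 - WMWprob)

# Cross-check: W from wilcox.test() should equal gt + eq/2
W <- unname(wilcox.test(PAVchangeP, PAVchangeSHDL, exact = FALSE)$statistic)

c(gt = gt, eq = eq, lt = lt, total = n_pairs)       # 240, 5, 151, 396
round(c(WMWprob = WMWprob, WMWodds = WMWodds), 3)   # 0.612, 1.580
```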

WMWprob Analysis (Tailored)

Analysis Ex3b is tailored to the research question. This study was very early in its "March of Science," so the main statistical interest should be to liberally assess how superior ETC-216 might be relative to placebo. While the one-sided CI for WMWprob, [LCL(0.05), UCL(0.00)=1.00], gives us the highest LCL while maintaining 95% confidence, its UCL is not informative in any way. Here, we use the 95% CI [LCL(0.045), UCL(0.005)]. In addition, shouldn't we be interested in whether WMWprob exceeds 0.50 by some "essential" threshold? Given that patients were treated for only 5 weeks, concluding that WMWprob > 0.55 might have sparked great hope. Therefore, we test H0: WMWprob ≤ 0.55 vs. H1: WMWprob > 0.55, which equates to H0: WMWodds ≤ 1.22 vs. H1: WMWodds > 1.22.

Of course, the WMWprob estimate is still 0.61, but the 95% CI has moved from [0.404, 0.787] to [0.430, 0.825]. (By the way, LCL(0.05) = 0.435.) Instead of the two-sided p-value of 0.30 for testing the standard H0: WMWprob = 0.50, the one-sided p-value for the tailored H0: WMWprob ≤ 0.55 is 0.28.

   Ex3b <- WMW(Y=PAVchange, Group=treatment,
              GroupLevel=c("Placebo", "SuperHDL"),
              Ncats=5,
              Alpha=c(0.045, 0.005),
              WMWprob0=0.55)

*****************************************************************
Parameter Estimate      0.95 CI*     One-Sided Hypothesis   P**
.................................................................
WMWprob     0.612    [0.430, 0.825] H0: WMWprob <= 0.55 0.282
WMWodds     1.580    [0.755, 4.709] H0: WMWodds <= 1.22 0.282
*****************************************************************
*CI error rates (alphaL, alphaU): (0.0450, 0.0050)
CI Method: coupling Sen (1967) & Mee (1990)
**Normal(0, 1) test statistic, Z = 0.578
P-value for H0: WMWprob >= 0.55: 1 - 0.282 = 0.718
Two-sided p-value (H0: WMWprob = 0.55): 0.563
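
The conversion from the WMWprob threshold to the WMWodds threshold, and the three p-values in this output, follow from simple arithmetic on the reported Z statistic. A minimal sketch in base R, taking Z = 0.578 from the output as given (the variable names are ours):

```r
# Null threshold on the probability scale and its odds equivalent
WMWprob0 <- 0.55
WMWodds0 <- WMWprob0 / (1 - WMWprob0)    # about 1.22

# P-values from the reported Normal(0, 1) test statistic
Z <- 0.578
p_upper <- pnorm(Z, lower.tail = FALSE)  # H0: WMWprob <= 0.55
p_lower <- 1 - p_upper                   # H0: WMWprob >= 0.55
p_two   <- 2 * min(p_upper, p_lower)     # H0: WMWprob  = 0.55

round(c(WMWodds0 = WMWodds0, p_upper = p_upper,
        p_lower = p_lower, p_two = p_two), 3)
```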

The March of Science

What if Dr. Nissen's protocol had set the above WMWprob analyses to be the primary assessment of treatment efficacy? Would ETC-216 have been touted as a "breakthrough"? While we do not have the actual data, it is unlikely that the study would have generated such fanfare. On the other hand, should these "statistically non-significant" findings be seen as refuting ETC-216 efficacy? Again, no. The WMWprob estimate of 0.61 supports efficacy, but the sample sizes are so small and so imbalanced (36 and 11) that the resulting CIs are too wide and the power of the tests too low to "confirm" much of anything.

In almost any study, its "March of Science" progresses at varying speeds, with wrong paths taken, some requiring backtracking. This March to infer that ETC-216 is efficacious began with Dr. Nissen being quite skeptical, but "one-in-10,000" is surely hyperbole. Instead, let's say the March began at Prob[ETC-216 is efficacious] = 0.01. Then, after obtaining the mildly encouraging WMWprob results shown above, shouldn't that skepticism be reduced, even a little? In Example 5 we propose a Bayesian method to calculate where the March has moved to, forward or backward, after a WMWprob analysis has been completed. To be specific, Prob[ETC-216 is efficacious | WMWprob analysis Study 1] = 0.0XX.

Example 5 goes on to illustrate two different paths taken after Study 1. The "Pfizzle March" depicts one that fails to progress forward. After conducting Study 2, Prob[ETC-216 is efficacious | WMWprob analyses Studies 1 & 2] = 0.0XX. So, that March goes 0.0XX → 0.0XX → 0.0XX, and then ends in futility. On the other hand, the "Victory March" makes solid progress due to Study 2, Prob[ETC-216 is efficacious | WMWprob analyses Studies 1 & 2] = 0.XX, and then with Study 3, Prob[ETC-216 is efficacious | WMWprob analyses Studies 1 & 2 & 3] = 0.XX. In short, 0.0XX → 0.XX → 0.XX.

Are you intrigued by this approach? See Example 5.