Sharing different heartbeats
On the power of large-scale randomized trials in cardiology.
I’m always looking for examples where we need statistical reasoning and significance tests to change care, and I’m surprised no one has shoved cardiology in my face before. Cardiologists can make a powerful case that their field has been completely revolutionized by turning to randomized controlled trials to guide their practice.
In the late 1980s, cardiologists ran massive clinical trials to improve the standard of care for heart attacks. The GISSI trial enrolled 12,000 patients and found a 2% reduction in hospital mortality when treating heart attack patients with the anti-clotting medicine streptokinase. ISIS-2 (17,000 patients!) found that adding aspirin as an additional anti-clotting agent resulted in an additional 2% reduction in mortality.
These percentages were absolute percentages. With neither treatment, 12% of the patients died within five weeks of the heart attack. With aspirin and streptokinase, that percentage dropped to 8%. For those who care about these things (like me and three other people on the internet), the p-values in these studies were all on the order of 1 in a million. The trial size here mattered a lot. Had the trials been 10 times smaller, the same event rates would not have passed a standard p<0.05 significance threshold. These trials are textbook cases for RCT gold standard evangelists.
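To see why trial size mattered so much, here is a minimal sketch of a pooled two-proportion z-test. The inputs are assumptions chosen for illustration, not the published trial data: mortality of 12% in the control arm versus 10% in the treated arm, split evenly across roughly 12,000 patients. It will not reproduce the published p-values, only the way significance collapses when the same rates are observed in a trial a tenth the size.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(rate_control, rate_treated, n_per_arm):
    """Two-sided p-value from a pooled two-proportion z-test with equal arms."""
    pooled = (rate_control + rate_treated) / 2
    std_err = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    z = (rate_control - rate_treated) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Assumed rates (illustrative, not the published figures): 12% vs 10% mortality.
p_large = two_proportion_p_value(0.12, 0.10, n_per_arm=6000)  # ~12,000 patients
p_small = two_proportion_p_value(0.12, 0.10, n_per_arm=600)   # 10x smaller trial

print(f"large trial p-value: {p_large:.2g}")
print(f"small trial p-value: {p_small:.2g}")
```

Under these assumed rates, the large trial clears p<0.05 comfortably while the small one does not, even though the observed mortality difference is identical.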
Moreover, the megatrial culture in cardiology has famous examples of rooting out harmful practices. The most famous study is the CAST trial of the early 1990s. Standard practice had been to pharmacologically suppress arrhythmias after heart attacks. Using drugs to make the heart look more “normal” seemed like a good idea. The CAST trialists enrolled 1,500 patients and shockingly found significant harm. 5% of patients died within 10 months on the anti-arrhythmia treatment, whereas only 2.5% died on placebo. The confidence intervals were narrow, and the p-value was again tiny. Something that felt reasonable—suppressing unusual heartbeats—was deemed harmful, and the practice was ended.
What can be said of the net benefits of these treatments after 40 years? Though robust, the effect sizes in these trials are all small, ranging from 1% to 3%. How can we be sure that the effects are cumulative and that modifying the standard of care is actually helping cardiology patients?
Regrettably, we now have to turn to epidemiology. No IRB will authorize an RCT of the old methods (essentially bed rest and oxygen) against the current therapeutic regimen of angioplasty, clot reducers, blood thinners, beta blockers, and statins. That would be akin to doing an RCT with a control group assigned to bloodletting. But the improvement in survival of heart attacks is undeniable. Estimates suggest that the current death rates have dropped from somewhere in the range of 15-20% to about 4-5%. That’s quite astounding. Fifty years of improving practice have accumulated a 3-5 fold reduction in deaths. You couldn’t ask for something better. Cardiology is a compelling and fascinating case study of the power of outcome optimization.
Why was there so much success in cardiology? You could argue that heart attack is a nearly ideal case for this sort of trial-based optimization. The endpoint of “death” is the most unambiguous in medicine. The adverse endpoint occurs fairly quickly (within weeks), as opposed to, say, oncology, where treatments can take years to assess. Moreover, heart attacks are unfortunately very common. The silver lining of their prevalence is that large pragmatic trials are relatively easy to assemble. Large trials are essential when the effect sizes are only 1-2%.
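A rough sample-size calculation shows how quickly the required enrollment grows as the effect shrinks. This is a sketch using the standard normal-approximation formula for comparing two proportions. The 12% baseline mortality, the 5% significance level, and the 80% power are all assumptions chosen for illustration, not design parameters taken from any of the trials above.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(rate_control, rate_treated, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect the difference in rates."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    variance = (rate_control * (1 - rate_control)
                + rate_treated * (1 - rate_treated))
    difference = rate_control - rate_treated
    return ceil((z_alpha + z_power) ** 2 * variance / difference ** 2)


# Assumed baseline mortality of 12% (illustrative only).
n_two_point = patients_per_arm(0.12, 0.10)  # 2% absolute reduction
n_one_point = patients_per_arm(0.12, 0.11)  # 1% absolute reduction

print(f"2% effect: about {n_two_point} patients per arm")
print(f"1% effect: about {n_one_point} patients per arm")
```

Because the difference enters the formula squared, halving the effect size roughly quadruples the number of patients needed, which is why 1-2% effects push trials into the tens of thousands.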
Now, even though cardiology is a poster child for evidence-based medicine, it’s important to note that the actual advancement of practice did not come simply from chaining together a sequence of massive RCTs. First, not every trial was as unambiguous as GISSI, ISIS-2, and CAST. Two trials in the 1990s assessed the relative value of streptokinase and tPA, two thrombolytic (clot-dissolving) agents. The GUSTO trial enrolled 41,000 patients and found tPA reduced death by 1%. The GISSI-2 trial enrolled 12,500 patients and found no difference between tPA and streptokinase. They contradicted each other! Post-trial analyses concluded that the trials had administered tPA differently, and this explained why GUSTO found a benefit. It was a careful study of the trials after the fact that suggested the true benefit of the drug. This analysis required an appeal to pharmacological knowledge, and wasn’t simply adjudicated by randomized experimentation. Moreover, a second post-trial analysis concluded that tPA increased the risk of stroke. The story is complicated. Mega-trials alone can’t solve practice.
Even the unambiguous trials were already pointing to a major challenge with RCTs. As a treatment regimen becomes more complex, ironing out the fine details requires an exponentially increasing number of RCTs. If you want to compare the effect of three different timings and three different dosages of a single drug, you need nine arms in your trial. If you want to additionally see if a second drug is helpful, you need 18.
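The arm counts above come from a full factorial design, where every combination of factor levels gets its own arm. Here is a small sketch that enumerates them. The level names ("early", "low", and so on) are hypothetical placeholders, not settings from any real trial.

```python
from itertools import product
from math import prod

# Hypothetical factor levels, named only for illustration.
timings = ["early", "intermediate", "late"]
dosages = ["low", "medium", "high"]
second_drug = [False, True]

# Every combination of levels is one arm of a full factorial trial.
single_drug_arms = list(product(timings, dosages))
two_drug_arms = list(product(timings, dosages, second_drug))


def arms_needed(levels_per_factor):
    """Number of arms in a full factorial design: the product of level counts."""
    return prod(levels_per_factor)


print(len(single_drug_arms))  # timing x dosage
print(len(two_drug_arms))     # timing x dosage x second drug on/off
print([arms_needed([3] * k) for k in range(1, 6)])  # k three-level factors
```

Each additional factor multiplies the arm count by its number of levels, so the count grows exponentially in the number of details you want to pin down, and each arm still needs thousands of patients to detect a 1-2% effect.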
Finally, not every guideline of practice comes from randomized trials. Fewer than 10% of the American College of Cardiology/American Heart Association clinical guidelines are backed by large RCTs. I was surprised to learn that there is no convincing RCT showing that bed rest is harmful. We have ended the practice of long-term bed rest regardless. So how cardiologists make recommendations to their patients remains complicated. How does this sort of therapy relate to individual doctor-patient experiences? That’s the next question I’m hoping to answer.
Thanks to cardiologist Guy Armstrong, whose comments inspired me to put this post together.


[Jones 2000](https://pubmed.ncbi.nlm.nih.gov/11143786/) traces an interesting history of cardiology as it hosted one of the decisive contests between 'evidence based' and 'clinical' medicine during the adoption of CABG surgery. EBM proponents argued for the primacy of _evidence from RCTs_ of 3-year survival, while clinical proponents got behind _visual evidence of mechanistic effects_ from angiography with the logic:
blood flow = health
no blood flow = disease
restored blood flow = cure.
The RCT is interpretable only as evidence about average outcomes (would the study subjects have benefited on average if everyone were treated), while the mechanism is evidence of success in individual cases (was *this* patient's blood flow restored by the procedure). In the care of a specific individual, it's not clear that survival should be the primary target, since quality of life could matter more for some patients depending on their preferences, which (unfortunately for rationalists) may be impossible to quantify/elicit. If restoring flow benefits both survival _and_ quality of life, and is accessible on a per-patient level, then the clinical heuristic is arguably more important than the results of the RCT. On the other hand, if restoring flow benefits neither, why does CABG work at all? It seems like any efficacy has to pass through this mechanism.
RCTs work formally, but it seems like they are frequently insufficient at producing knowledge about actions in reality. Understanding these shortcomings will help us to understand in which settings decision making can (should) be automated effectively / ethically...
All to say: we should fund the humanities.
Having the results of the randomised trials to hand, we then apply the average result to the individual patient in front of us. But couldn't that average outcome be made up of patients who were harmed as well as helped by the treatment? Why should the average outcome - which is presumably experienced by few individual trial participants - apply to my patient? (Notwithstanding that my patient's characteristics differ in known & unknown ways from the trial participants, but I see that as a different issue).
While we're at it, what about hearing that your coronary stenting procedure has a 1:1,000 chance of a serious complication - how does the certainty-craving human brain use that information - that you're unlikely to have a complication but if you do it'll be life-changing - to inform the dichotomous decision as to whether to proceed with treatment?
The broader conundrum is, how can the human brain best use statistics and probabilities to inform individual decisions?