Sharing different heartbeats
On the power of large-scale randomized trials in cardiology.
I’m always looking for examples where we need statistical reasoning and significance tests to change care, and I’m surprised no one has shoved cardiology in my face before. Cardiologists can make a powerful case that their field has been completely revolutionized by turning to randomized controlled trials to guide their practice.
In the late 1980s, cardiologists ran massive clinical trials to improve the standard of care for heart attacks. The GISSI trial enrolled 12,000 patients and found a 2% reduction in hospital mortality when treating heart attack patients with the anti-clotting medicine streptokinase. ISIS-2 (17,000 patients!) found that adding aspirin as an additional anti-clotting agent resulted in an additional 2% reduction in mortality.
These percentages were absolute percentages. With neither treatment, 12% of the patients died within five weeks of the heart attack. With aspirin and streptokinase, that percentage dropped to 8%. For those who care about these things (like me and three other people on the internet), the p-values in these studies were all on the order of 1 in a million. The trial size here mattered a lot. Had the trials been 10 times smaller, the same event rates would not have passed a standard p<0.05 significance threshold. These trials are textbook cases for RCT gold standard evangelists.
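For the three other people who care, here is a rough sketch of why trial size matters. It runs a standard pooled two-proportion z-test on illustrative death rates of 12% versus 10%, once with 6,000 patients per arm and once with 600. These are round numbers in the spirit of the trials above, not the actual GISSI or ISIS-2 counts, and the equal-arm design and the function name `two_proportion_p_value` are my own assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(p_control, p_treated, n_per_arm):
    """Two-sided p-value from a pooled two-proportion z-test with equal arms."""
    pooled = (p_control + p_treated) / 2
    standard_error = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    z = (p_control - p_treated) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative rates only (assumed, not trial data): 12% vs 10% mortality.
big = two_proportion_p_value(0.12, 0.10, n_per_arm=6000)
small = two_proportion_p_value(0.12, 0.10, n_per_arm=600)

print(f"6,000 per arm: p = {big:.4f}")
print(f"  600 per arm: p = {small:.4f}")
```

With these made-up inputs, the same 2-point difference clears the 0.05 threshold comfortably at the larger size and misses it at the smaller one.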
Moreover, the megatrial culture in cardiology has famous examples of rooting out harmful practices. The most famous study is the CAST trial of the early 1990s. Standard practice had been to pharmacologically suppress arrhythmias after heart attacks. Using drugs to make the heart look more “normal” seemed like a good idea. The CAST trialists enrolled 1,500 patients and shockingly found significant harm. 5% of patients died within 10 months on the anti-arrhythmia treatment, whereas only 2.5% died on placebo. The confidence intervals were narrow, and the p-value was again tiny. Something that felt reasonable—suppressing unusual heartbeats—was deemed harmful, and the practice was ended.
What can be said of the net benefits of these treatments after 40 years? Though robust, the effect sizes in these trials are all small, ranging from 1% to 3%. How can we be sure that the effects are cumulative and that modifying the standard of care is actually helping cardiology patients?
Regrettably, we now have to turn to epidemiology. No IRB will authorize an RCT of the old methods—essentially bed rest and oxygen—against the current therapeutic regimen of angioplasty, clot reducers, blood thinners, beta blockers, and statins. That would be akin to doing an RCT with a control group assigned to bloodletting. But the improvement in survival of heart attacks is undeniable. Estimates suggest that the current death rates have dropped from somewhere in the range of 15-20% to about 4-5%. That’s quite astounding. 50 years of improving practice have accumulated a 3- to 5-fold reduction in deaths. You couldn’t ask for something better. Cardiology is a compelling and fascinating case study of the power of outcome optimization.
Why was there so much success in cardiology? You could argue that heart attack is a nearly ideal case for this sort of trial-based optimization. The endpoint of “death” is the most unambiguous in medicine. The adverse endpoint occurs fairly quickly (within weeks), as opposed to, say, oncology, where treatments can take years to assess. Moreover, heart attacks are unfortunately very common. The silver lining of their commonality is that large pragmatic trials are relatively easy to assemble. Large trials are essential when the effect sizes are only 1-2%.
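To put a rough number on "large," here is a sketch of the textbook sample-size approximation for comparing two proportions. The 5% two-sided significance level, the 90% power, and the 12% baseline death rate are my assumptions for illustration, not design parameters taken from any of the trials, and `patients_per_arm` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control, p_treated, alpha=0.05, power=0.90):
    """Approximate patients per arm needed to detect p_control vs p_treated."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    delta = p_control - p_treated
    return ceil((z_alpha + z_power) ** 2 * variance / delta ** 2)


# Assumed baseline of 12% mortality, with 2-point and 1-point absolute reductions.
n_two_points = patients_per_arm(0.12, 0.10)
n_one_point = patients_per_arm(0.12, 0.11)

print(f"2-point reduction: about {n_two_points:,} patients per arm")
print(f"1-point reduction: about {n_one_point:,} patients per arm")
```

Under these assumptions, a 2-point reduction already calls for several thousand patients per arm, and halving the effect roughly quadruples that requirement, since the needed sample grows with the inverse square of the difference.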
Now, even though cardiology is a poster child for evidence-based medicine, it’s important to note how the actual advancement of practice was not simply by chaining together a sequence of massive RCTs. First, not every trial was as unambiguous as GISSI, ISIS-2, and CAST. Two trials in the 1990s assessed the relative value of streptokinase and tPA, two clot-dissolving (thrombolytic) agents. The GUSTO trial enrolled 41,000 patients and found tPA reduced death by 1%. The GISSI-2 trial enrolled 12,500 patients and found no difference between tPA and streptokinase. They contradicted each other! Post-trial analyses concluded that the trials had administered tPA differently, and this explained why GUSTO found a benefit. It was a careful study of the trials after the fact that suggested the true benefit of the drug. This analysis required an appeal to pharmacological knowledge and wasn’t simply adjudicated by randomized experimentation. Moreover, a second post-trial analysis concluded that tPA increased the risk of stroke. The story is complicated. Mega-trials alone can’t solve practice.
Even the unambiguous trials were already pointing to a major challenge with RCTs. As a treatment regimen becomes more complex, ironing out the fine details requires an exponentially increasing number of RCTs. If you want to compare the effect of three different timings and three different dosages of a single drug, you need nine arms in your trial. If you want to additionally see if a second drug is helpful, you need 18.
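The arm counting can be sketched in a few lines. The timing and dosage labels below are placeholders I made up; only the counts (three timings, three dosages, second drug on or off) come from the example above.

```python
from itertools import product

# Hypothetical labels; only the number of levels matters for the count.
timings = ["early", "middle", "late"]
dosages = ["low", "medium", "high"]
second_drug = [False, True]

# Every combination of factor levels is its own trial arm.
single_drug_arms = list(product(timings, dosages))
two_drug_arms = list(product(timings, dosages, second_drug))

print(len(single_drug_arms), "arms for one drug")
print(len(two_drug_arms), "arms once a second drug is added")
```

Each new factor multiplies the number of arms by its number of levels, and every arm still needs thousands of patients if the effects are a percentage point or two.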
Finally, not every guideline of practice comes from randomized trials. Fewer than 10% of the American College of Cardiology/American Heart Association clinical guidelines are backed by large RCTs. I was surprised to learn that there is no convincing RCT showing that bed rest is harmful. We have ended the practice of long-term bed rest regardless. So how cardiologists make recommendations to their patients remains complicated. How does this sort of therapy relate to individual doctor-patient experiences? That’s the next question I’m hoping to answer.
Thanks to cardiologist Guy Armstrong, whose comments inspired me to put this post together.

