Another tangent, but this reminded me that a while back there was a big Twitter spat between Judea Pearl and "trialists". Something I had not appreciated, though perhaps it should have been obvious to me, is that in an RCT the treatment assignment is randomized, but the sample of participants is (almost always) not. So, when interpreting the results of the trial, estimates of, e.g., the absolute risk reduction (ARR) are specific to the sample of participants in the trial. Given what little I know about trial enrollment, it doesn’t seem like we should have much confidence in the generalizability of such results to a new population (I would be happy to be wrong here!).

Further, RCT statisticians have developed their own language of complicated tools for experiment design (insert your favorite combination of “cluster”, “block”, and “crossover” before “trial design”). It seems that the magic, the intellectual achievement here, is in being able to estimate the treatment effect for a sample despite missing counterfactual outcomes. And while this is certainly impressive, it doesn’t seem to help one answer questions like “Will this treatment work on my patient? Will this treatment work on me?” How does generalizability/transportability of effect estimates come into play here? This, to me, seems like a very important piece of the problem.
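
To make the sample-versus-population point concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: a single binary effect modifier `x`, a treatment effect of 2.0 when `x == 1` and 0.0 otherwise, and enrollment probabilities of 0.09 versus 0.01. It is not a model of any real trial. It shows only that randomizing treatment recovers the average effect in the enrolled sample, which can sit far from the population average when enrollment depends on something that modifies the effect.

```python
import random
from statistics import mean


def effect(x):
    # Hypothetical effect modifier: the treatment helps only when x == 1.
    return 2.0 if x == 1 else 0.0


def simulate(n_population=200_000, seed=0):
    rng = random.Random(seed)

    # Population: half the units carry the modifier (assumed for illustration).
    population = [1 if rng.random() < 0.5 else 0 for _ in range(n_population)]
    population_ate = mean(effect(x) for x in population)

    # Non-random enrollment: units with x == 1 are far more likely to enroll.
    sample = [x for x in population
              if rng.random() < (0.09 if x == 1 else 0.01)]
    sample_ate = mean(effect(x) for x in sample)

    # Randomized assignment within the sample; we observe only one of the
    # two potential outcomes for each unit.
    treated, control = [], []
    for x in sample:
        y0 = rng.gauss(0.0, 1.0)   # outcome without treatment
        y1 = y0 + effect(x)        # outcome with treatment
        if rng.random() < 0.5:
            treated.append(y1)
        else:
            control.append(y0)

    # Difference in means: targets the sample ATE, not the population ATE.
    estimate = mean(treated) - mean(control)
    return population_ate, sample_ate, estimate


if __name__ == "__main__":
    pop_ate, samp_ate, est = simulate()
    print(f"population ATE: {pop_ate:.2f}")
    print(f"sample ATE:     {samp_ate:.2f}")
    print(f"trial estimate: {est:.2f}")
```

Under these made-up numbers the trial estimate tracks the sample ATE, and nothing in the trial data alone reveals how far that is from the population value. Closing the gap would need outside knowledge of how enrollment relates to `x`.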

Yes, well put. The details of the allocation and selection of participants have all sorts of degrees of freedom, which determine what kind of average treatment effect (ATE) is actually being measured. And the study may be a one-off opportunity (a non-repeatable environment) anyway. But I really like the emphasis on RCTs as a type of measurement here.

Yes, I agree with both of you. RCTs are not a panacea, and they solve a very specific set of problems. This is why I like thinking of them as a particular measurement device. By strained metaphor, a tape measure is important to have in your toolbox, but not only are the screwdrivers and wrenches important, the tape measure is not even the only measurement device in there.

I am also glad you are asking “will this treatment work on my patient? Will this treatment work on me?” These are questions I want to blog about here. I will definitely come back to them, as they are the questions that keep me up at night.

Indeed, I started getting into applied statistics after being spooked by the fragility of generalization in machine learning. I thought there might be some resolutions there. But there were no fixes to be found in statistics. My conclusion is that statistics can't really say much of anything about generalizability or transportability. I am going to dig into this more in future posts.

This line made me think: "Of course, within the confines of reality, we are not telepathic. We can only observe one of these outcomes per column." Here Treatment is a bit. What if we make it a qubit, so we can place units in a superposition of |Treatment = Y> and |Treatment = N>? Is there any situation where a quantum RCT could have an edge over a classical one?
