This post digs into Lecture 3 of Paul Meehl’s course “Philosophical Psychology.” You can watch the video here. Here’s the full table of contents of my blogging through the class.
Experimenter biases about their subject matter are the clearest example where the context of discovery bleeds into the context of justification. Though supposedly uncivil and fallacious, ad hominem arguments are essential for evaluating scientific evidence.
In a delightful, illustrative anecdote, Meehl recounts the long history of rat studies designed to test the existence of “latent learning.” These intricate experiments involved running rats through mazes and seeing if they learned things about the maze when they were not offered food. A group of researchers on the West Coast led by Edward C. Tolman at Berkeley strongly believed in latent learning. Remarkably, their publications consistently reported compelling experimental evidence of rats exhibiting latent learning in mazes. On the other hand, researchers at Iowa were decidedly against the concept and thought reinforcement was necessary for all learning. I know you will be shocked to hear it, but the Iowa psychologists never found evidence of latent learning.
You might think that these two competing research communities were deliberately malicious and faking their data, but the weird part was that less ideological labs had mixed results. For example, Meehl and his collaborators at Minnesota, who claimed to have no dog in the fight, sometimes found latent learning and sometimes didn’t.
What would explain the mixed results? Meehl noted from his own experience that how you handled the rats dramatically affected the experimental outcomes. The Iowa team just threw their rats in the maze. The Berkeley folks coddled the rats and did clinical studies of rat psychology. Meehl and others observed large changes in experimental outcomes depending on the rat’s disposition. If you grabbed a rat in your hand, it would get anxious and rigid in the apparatus. If you let the rat sit on your arm, perhaps gently petting it and offering it a snack, it would happily explore the maze.
These sorts of subtle experimental differences don’t and can’t make it into published reports. Even if we allow ourselves unbounded appendices, it’s impossible to write down all of the nuanced details of an experiment. Though we’d love to believe otherwise, physics experiments are not necessarily better. Eddington was committed to finding evidence of relativity, and his commitment influenced his data analysis of the different astrographic plates.
All of this just means we must account for the disposition of the experimenter when evaluating their reports. A paper is always a partial view of the experiment, and the track record of the research group, the way they talk about their work, and our knowledge of their techniques should bias our interpretation of the published results. Some people sell harder than others. Some people’s experiments are easy to replicate, some much harder. Labs and scientists have tendencies, and these patterns are part of the experimental setup and affect the downstream outcome.
Let me give a more contemporary example. There is a particular doctor who is constantly interviewed on the news and social media for his work on Long Covid. He claims the disease will lead to the mass disabling of humankind. But this doctor uses a single data set from the Department of Veterans Affairs that only his team can access. He uses only the most questionable of observational study designs as evidence. And his papers manage to find harmful effects of COVID far outside the range that others observe. Are we supposed to take this person’s results at face value?
Meehl frames it as a psychologist:
“Those subtle aspects, and sometimes not so subtle aspects, of somebody's psychology: the extent to which he's passionate about the theory, the extent to which he's got a vendetta going against Dr. Glotts, all sorts of mechanisms of defense, reaction formations, and projections, and so forth. To the extent that those things can influence what happens in the lab in a way that you, reading the published report, cannot always discern, that makes some amount of sociology and psychology relevant, even when you’re in the context of justification rather than in the context of discovery.”
You have to be careful with these sorts of armchair psychological evaluations, of course. But how careful is impossible to say.
Yet there is a constructive path forward here, and it’s an obvious one. First, I hope it’s clear that you can’t fix these issues with better statistical methods. You can’t build a precise conjugate prior for researcher tendencies and then do a Bayesian update on your belief in a theory. Scrutinizing p-values or q-values or e-values or whatever can’t tell you about the subtle missing data. Making people run through a 32-item checklist of scientific rigor also can’t remove the subtle biases that creep into an experimental pipeline. I’ll return to why statistics is ill-suited to this problem when blogging about a future lecture.
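To make that concrete, here’s a toy sketch of the kind of Bayesian update I mean. This is my own illustration, not anything from Meehl or the lecture, and every number in it (the reporting probabilities, the “lab bias” parameter) is made up. The point is that the posterior you get for the theory hinges entirely on an assumption about the labs that no published report lets you check.

```python
# Toy illustration (made-up numbers): updating belief in a theory from
# published positive/negative reports when each lab has an unobserved
# bias toward the theory.

def posterior_theory_true(positives, negatives, assumed_lab_bias, prior=0.5):
    """P(theory true | published record) under a crude mixture model.

    If the theory is true, assume a lab reports a positive finding with
    probability 0.8. If it is false, assume a lab reports a positive with
    probability equal to its bias toward the theory -- the quantity the
    published report never shows you.
    """
    p_pos_if_true = 0.8
    p_pos_if_false = assumed_lab_bias

    like_true = p_pos_if_true**positives * (1 - p_pos_if_true)**negatives
    like_false = p_pos_if_false**positives * (1 - p_pos_if_false)**negatives
    evidence = prior * like_true + (1 - prior) * like_false
    return prior * like_true / evidence

# The same published record (say, 6 positive reports and 6 null reports)
# supports wildly different conclusions depending on the assumed lab bias:
for bias in [0.1, 0.5, 0.8]:
    print(f"assumed bias = {bias}: P(theory true) = {posterior_theory_true(6, 6, bias):.2f}")
```

The arithmetic is fine; the problem is that the bias parameter is exactly the unrecorded, rat-handling sort of detail that never makes it into the paper, so the formalism just relocates the judgment call rather than removing it.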
What is the fix then? We need to account for researcher motivations because we know we are missing data, whether they concern the experimental apparatus, the data cleaning, the lab conditions, or the weather. Though we can’t record everything about our experiments, we can be diligent about storing, documenting, and sharing data and code. Well-documented code, data, and even lab notebooks are now trivial to maintain. You can record more of the experimental process than ever before.
I’m not saying data and code sharing solve all of our problems; Dayton Miller was meticulous with his experimental records, and it still took physicists 30 years to explain his results away. But it is better to have data out in the open for everyone to pore over and debate than to leave them missing and encourage psychological speculation.
I wrote about this on Twitter, but I recently read a paper with a result that was far too good to be true, and it had these two excerpts at the end.
Such quotes are attached to every clinical trial report I read. And I consequently set my prior to believe all clinical trial reports are wrong (bUt soMe ArE uSEfuL!). I have a long list of randomized clinical trials that look preposterous on their face, don’t share their data, and don’t seem to replicate. I understand that medicine has to balance privacy concerns against scientific openness in its experimentation. But the closed data model leaves us having to overweight the psychology of the research team behind the studies. And this casts the entire research enterprise in doubt.
The latent learning debate is particularly interesting for this series!
As it turns out, the "common wisdom" version of the history, as taught to most psychology undergrads, is somewhat apocryphal. And in fact this is [another] example of how many big scientific debates in psychology & neuroscience are/were never "settled" -- people just stopped caring and moved on to the next question, or next methodology, or whatever.
Here's an excerpt from an interesting historical/sociological review (https://link.springer.com/article/10.1007/BF03392130; which also refers to some of Meehl's works on the topic):
> The ensuing debate was propelled by the experimental work of the preeminent learning psychologists of the time, and the debate lasted for 30 years.
> Experimenters and their doctoral students from each side of the debate devised increasingly sophisticated research to answer the theoretical questions that arose from each previous generation of experiments.
> Yet notwithstanding the experimental, methodological, and theoretical creativity of many of the key figures in the history of psychology, by the mid-1960s many psychologists considered the matter of latent learning to be dead (Goldstein, Krantz, & Rains, 1965).
> However, the era of inspired research productivity and sharp theoretical debate had come to an end not because one theory prevailed over all others.
> Rather, as Thistlethwaite’s (1951) review of 30 years of latent learning amply demonstrated, it was because the issues that arose out of the extensive latent learning experimentation remained unsettled, and no resolution was thought to be forthcoming.
> Thus, what had begun as a lively, empirically based debate over fundamental issues in learning ultimately ended in a stalemate.
[I've wanted to write a post on that for some time, but I need to find a way to re-start my dead substack...]
I don't know whether this also jumped out at you, but there is a bit of a conflict between Meehl's discussion of the effect the handling of rats had on the experimental outcome and his view that the interaction between the measurement device and the thing being measured does not matter as much in psychology as it does in physics.