14 Comments
Tom Dietterich:

I don't take my own advice, but my advice is to encapsulate the entire analysis chain in a docker container so that you have the right versions of all libraries, etc. It's obviously not a complete solution, because the container might run differently on different hardware.
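A lighter-weight step in the same direction is to at least snapshot the exact library versions behind an analysis. A minimal Python sketch, assuming nothing beyond the standard library; the output filename is purely illustrative:

```python
# Record the interpreter, platform, and every installed package version
# alongside the analysis, so the environment can be reconstructed later.
# (This captures Python libraries only, not system libraries or hardware.)
import platform
import sys
from importlib.metadata import distributions

with open("environment-lock.txt", "w") as f:
    f.write(f"python {sys.version}\n")
    f.write(f"platform {platform.platform()}\n")
    for dist in sorted(distributions(), key=lambda d: d.metadata["Name"].lower()):
        f.write(f"{dist.metadata['Name']}=={dist.version}\n")
```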

Yaroslav Bulatov:

This issue reminds me of the connection between robustness and generalization in ML: https://jmlr.csail.mit.edu/papers/v2/bousquet02a.html. If your trained model performs well on the training set, and you end up with nearly the same model after introducing perturbations to the training set and retraining, then the model will perform well on the test set. To stretch the analogy, perhaps the "trained model" can be viewed as the derived theory, and different groups replicating the study provide perturbations of the training set.
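To make the perturb-and-retrain idea concrete, here is a rough sketch, assuming scikit-learn and using a placeholder model on synthetic data; small average prediction change under leave-one-out perturbations is the empirical analogue of the stability the linked paper ties to generalization:

```python
# Perturb-and-retrain stability check: drop single training points,
# refit, and measure how much the model's predictions move.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

def fit(X, y):
    return LogisticRegression(max_iter=1000).fit(X, y)

base = fit(X, y)

disagreements = []
for i in rng.choice(len(X), size=20, replace=False):
    mask = np.ones(len(X), dtype=bool)
    mask[i] = False  # leave one training point out
    perturbed = fit(X[mask], y[mask])
    disagreements.append(np.mean(base.predict(X) != perturbed.predict(X)))

print(f"mean prediction change under perturbation: {np.mean(disagreements):.4f}")
```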

Ben Recht:

It's different in the sense that stability theory requires the future data to be identically distributed to the training data. Most of the interesting questions in replication are about what happens when the "distributions" are different, if you will. Under what contexts is the result replicable?

For what it's worth, this is an important problem in machine learning too! We have no idea how to predict/ensure that a model will work in new contexts.

Yaroslav Bulatov:

The IID assumption puts a bound on the distance between the true distribution and the empirically observed one... I'm wondering whether one can do away with the IID assumption and reason in terms of such distances directly.
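One way to make that bound explicit, at least in one dimension: the Dvoretzky–Kiefer–Wolfowitz inequality (with Massart's constant) says that for $n$ IID draws from a distribution with CDF $F$ and empirical CDF $F_n$,

\[
\Pr\!\left( \sup_x \, \lvert F_n(x) - F(x) \rvert > \varepsilon \right) \le 2 e^{-2 n \varepsilon^2} .
\]

Reasoning "in terms of distances directly" would then amount to assuming a bound of this form rather than deriving it from the IID assumption.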

Kevin M:

I feel like reproducibility is often an issue not because the model doesn't work, but because the authors aren't allowed to share the data with the public (especially if the data is subject to privacy laws).

Ben Recht:

1. People still complain even without privacy rules.

2. I've written before that privacy is usually a convenient excuse not to release data, and the ethical case for withholding it is rarely articulated particularly well.

Maxim Raginsky:

I think this goes hand in hand with your earlier point about including the specifics of software among the auxiliary theories in Lakatosian derivation chains.

Ben Recht:

Absolutely. I'm stuck on the irony of how software, which I was taught is all about abstraction layers, melts all of the abstraction layers in the derivation chains and technical presentation. Everything is revealed as language in a muddy postmodern puzzle.

Stupid Information Age.

Maxim Raginsky:

This is precisely the point Brian Cantwell Smith was hammering on: While the theory of computation urges us to think of computation as formal symbol manipulation that does not get infected by the meanings of the symbols, this is not at all what "computation in the wild" is. Meaning infects everything.

Ben Recht:

Shit. Another Raginsky recommendation now put on my queue. Too many books, man.

And, as Kripke shows, even the formal language can't escape the meaning problem.

Everything just becomes translations between symbols, whether those symbols are written on paper or stored in a capacitor.

(I try to keep my postmodern bullshit to a minimum in the posts, but the comments should remain a pomo safe space)

Kevin Munger:

The question that this topic always raises, to me:

How *long* does the result need to be reproducible for?

And, accepting your argument that reproducibility is about communication -- does a video of the code running accomplish this? It allows others to observe what you did, but not to intervene. But once we start intervening, we're back in the land of replication, and the sanctity of the original implementation becomes irrelevant.

But then... for a lab science, say... what if we took a video of the scientists performing the procedure? You're pushing the reproduce/replicate margin in the direction of more replication by pointing to the fallibility even of computers... but contemporary "science reform", it seems, is pushing on the opposite margin, trying to turn human scientists into robots.

Ben Recht:

For the purposes of my position, yes, a video of someone clicking through a Python notebook, with every frame revealing the contents of a subcalculation, would count as reproducible. But it would be so painful to produce and consume such an artifact.

As to your question about time, man I don't know. It's weird because in what are ostensibly my fields (machine learning, optimization, signal processing), you definitely get at least a year. So I don't want to quibble. But as a metatheoretician, I like thinking about the question.

I do agree with you that as soon as you intervene on the code, you move to replication.

What I'm trying to call for, and perhaps this is misguided, is to move communication conventions in the sciences. It's more important to enforce open data and code than proper application of the t-test. Journals should enforce such reproduction standards. And we have to think about teaching best software practices at the undergrad level.

I would argue *against* the reforms that, as you well articulate, seem to want people to be robots. And this is why I'm leaning so hard on this replication/reproduction distinction.

Kevin Munger:

:) I figured we had different intuitions about “how long”…your example from 2005 Stata is only barely an exaggeration of the social science reality.

I have a paper that is currently forthcoming based on an experiment conducted in 2019.

One of the major delays in publication was the mandatory reproduction check by the journal. It took literally years to get through peer review, then there was a 6-month wait time for the reproduction check due to a backlog—by which point, there were multiple issues due to R package versioning.

Ben Recht:

Ack, that's fair. My bad. And that is annoying to hear about the delays due to package errors. Do you think people in the social sciences are thinking seriously about how to avoid such dependencies in reproduction checks? Docker containers are a possibility, but probably overkill.
