Discussion about this post

Chris

I wouldn't say that reproducibility is as trivial as you say. The whole Nix ecosystem grew out of Eelco Dolstra's thesis [1], which showed that reliably reproducing even software, to say nothing of data, requires cryptographic naming conventions and a functionally pure (side-effect-free, compositional) build system, and that is extremely hard to do. I've talked to some highly accomplished people who say that the idea of Nix is beautiful and pure but not workable in practice. As far as I understand it, that's the reason Docker is the preferred model over Nix.
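
To make the "cryptographic naming" idea concrete, here is a minimal sketch in Python. It is not Nix's actual hashing scheme; the function `store_name` and the package names are hypothetical. It only illustrates the principle that an artifact's name is derived from everything that went into building it, so a change anywhere in the dependency chain produces a different name.

```python
import hashlib
import json


def store_name(name: str, build_script: str, deps: list[str]) -> str:
    """Derive an artifact name from all of its build inputs (illustrative only)."""
    # Serialize the inputs deterministically so equal inputs give equal names.
    payload = json.dumps(
        {"name": name, "build_script": build_script, "deps": sorted(deps)},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]
    return f"{digest}-{name}"


# Hypothetical packages: an app that depends on a library.
libfoo = store_name("libfoo-1.0", "make && make install", [])
app_v1 = store_name("app-2.3", "make install", [libfoo])

# Changing how the library is built changes its name, and therefore the
# name of everything that depends on it, even though "app-2.3" is unchanged.
libfoo_patched = store_name("libfoo-1.0", "make CFLAGS=-O2 && make install", [])
app_v2 = store_name("app-2.3", "make install", [libfoo_patched])

print(app_v1)
print(app_v2)
```

The hard part is not the hashing but guaranteeing that the build really depends on nothing outside the declared inputs, which is where the "functionally pure" requirement comes in.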

For a more timely and relevant example, there's Hugging Face's model reproduction pipeline: they just keep a frozen Python script that builds and trains the model. Bugs are not allowed to be fixed, and it relies on everything that was available at time t0 remaining available under the same name, which is not always the case. No matter how hard you try, someone will fix some problem and replace the old broken thing with a new thing of the same name. That's often a good idea, but it flies in the face of any claim that reproducibility is trivial.
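
The "same name, different thing" failure can at least be detected by pinning content hashes instead of names. The following is a minimal sketch under that assumption; `verify_pinned` and the byte strings are hypothetical stand-ins, not part of any Hugging Face API.

```python
import hashlib


def verify_pinned(data: bytes, expected_sha256: str) -> bytes:
    """Return data only if its SHA-256 matches the hash recorded at time t0."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(
            f"content changed: expected {expected_sha256}, got {actual}"
        )
    return data


# At time t0 we record the hash of the artifact we actually used.
original = b"weights-v1 (with the old bug)"
pin = hashlib.sha256(original).hexdigest()

# Later, someone replaces the artifact under the same name.
replaced = b"weights-v1 (bug fixed, same name)"

verify_pinned(original, pin)  # passes silently
try:
    verify_pinned(replaced, pin)
    detected = False
except ValueError:
    detected = True

print("replacement detected:", detected)
```

Note that this only tells you the original is gone. It does not bring it back, which is the part that makes reproducibility hard.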

And then there's data... In theory, data is just as hard or just as easy to replicate as software, since they're both just digital artifacts. But you'll never have to worry about HIPAA, PII, or deanonymization problems with pure software, and those can be real problems with data.

That said, I think the talk of reproducibility being trivial is a distraction from the more interesting point of this article, which only gets one paragraph: that failure to replicate is interesting in itself. I was hoping for a whole article on that topic, since I think you would have a lot more interesting thoughts on it, and I wanted to hear them!

[1] https://www.semanticscholar.org/paper/The-purely-functional-software-deployment-model-Dolstra/7c9d53d567c4db2034d8019ff11e0eb623fe2142

See also "Problems With Existing Solutions" at https://jonathanlorimer.dev/posts/nix-thesis.html

rvenkat

The link to Nosek et al. is broken. This one (https://www.annualreviews.org/content/journals/10.1146/annurev-psych-020821-114157) takes us to the correct page.
