Discussion about this post

Ani N:

I think that people are dealing with these problems in practice:

- Conda envs / Docker containers with the exact recipe used to create the software environment (see the sketch after this list)

- Standard repos / codebases for baselines / evaluations, to make sure implementation bugs don't contaminate results.

- Public code repos for verification and replication
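
To make the "exact recipe" idea concrete, here is a minimal Python sketch of one way to snapshot the software environment next to an experiment's results, so a replication attempt can diff environments when numbers disagree. The filename `env_snapshot.json` and the helper name `snapshot_environment` are illustrative assumptions, not a standard tool or anything from this comment.

```python
# Minimal sketch: record the interpreter, installed packages, and (if available)
# the git commit alongside a result file. Names here are illustrative assumptions.
import json
import platform
import subprocess
import sys


def snapshot_environment(path="env_snapshot.json"):
    """Write interpreter, package, and git-commit info to a JSON file."""
    # Pinned package list, equivalent to `pip freeze`.
    packages = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()

    # Current commit, if we happen to be inside a git checkout.
    try:
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        commit = None

    snapshot = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": packages,
        "git_commit": commit,
    }
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2)
    return snapshot


if __name__ == "__main__":
    snapshot_environment()
```

In the same spirit, `conda env export > environment.yml` or a Dockerfile with pinned versions can serve as the recipe published alongside the code.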

Are these perfect? No! But I think that with all of these tools, verification & replication of (non-expensive-pretraining) papers is far cheaper and easier in AI/ML than it is in a field like biology. If there's a bug in your code, it can be detected by someone who attempts to replicate your results in a new codebase and compares them against your public GitHub repo and conda env. In medicine, it's often harder to isolate A_I, C_P, and C_N.

The fact that we still choose not to publish replications within the academic community is a larger contributor to the replication crisis, as are the lack of truly held-out test sets and the failure of most industry labs to commit to the above. But my current sense is that the industry labs genuinely trying to research bleeding-edge models spend a lot on replication and see massive returns on those efforts.

John Quiggin:

Part of the problem in dealing with this is the Platonic belief that philosophy/science can get us to The Truth. We are in the cave and can't get out. But we can improve the lighting, keep careful records of the shadows, experiment with different shading patterns, and so on. Once you really accept fallibilism, replicability becomes a predictable problem rather than a crisis.

