Discussion about this post

Nate Rush

Hey Ben, one of the study authors here. Thanks for the detailed feedback and the review of the appendix; I really appreciate it when folks think through our work in detail and give us feedback!

> METR, if you are reading this, please do it.

Yes, we're planning on open-sourcing pretty much the exact anonymized data you asked for (as well as some core analysis code, in case you want to dig into that too!). Expect the code and data in this repo (https://github.com/METR/Measuring-Early-2025-AI-on-Exp-OSS-Devs) by the end of the week.

> Task completion order can affect the amount of time it takes you to complete the task.

Totally agree. We call this out in the factors table (page 11) as a factor with an unclear effect; see "Bias from issue completion order (C.2.4)"! As we explore in the appendix, we have no evidence that this occurred, but neither do we have evidence that it didn't.

> What we can glean from this study is that even expert developers aren’t great at predicting how long tasks will take. And despite the new coding tools being incredibly useful, people are certainly far too optimistic about the dramatic gains in productivity they will bring.

These takeaways seem pretty reasonable to me (noting, of course, that we don't provide any evidence about all developers, just about expert developers in the specific setting we study).

Thanks again for the feedback here. If you have more once you get access to the data and code, I'd be excited to hear it.

Nadia Eldeib

This article was one of the clearest analyses I've read of the METR study; overall, I found it really helpful for understanding the merits and (mostly) the challenges of the study format.
