Discussion about this post

John Quiggin

Just expressing my confirmation bias on two points

* My immediate reaction to LLMs was: this makes mediocrity easy. That is, for tasks where I am expert, such as writing an article, the LLM did nothing. For tasks where my expertise was minimal, like writing Python code, it was great. So I believe the result.

* If you only have 16 subjects, you'd better have results so clear that no statistical analysis is needed to convince the readers. For example: all 8 jumpers who used a parachute survived, 0 without a parachute did. Once error bars are relevant, I'm in the sceptical camp.
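To put a number on the parachute example: a rough sketch of a one-sided Fisher exact test, written from scratch with only the standard library. The function name and the second, muddier table (6 of 8 versus 4 of 8) are made up for contrast and do not come from any study.

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are treated/control, columns are survived/died. Returns the
    probability of seeing at least `a` treated survivors if treatment
    made no difference and the table margins are held fixed.
    """
    treated = a + b
    survivors = a + c
    n = a + b + c + d
    total = comb(n, treated)
    p = 0.0
    for k in range(a, min(treated, survivors) + 1):
        # Hypergeometric probability of exactly k treated survivors.
        p += comb(survivors, k) * comb(n - survivors, treated - k) / total
    return p

# Parachute case: 8 of 8 survive with one, 0 of 8 without.
p_parachute = fisher_exact_one_sided(8, 0, 0, 8)

# Hypothetical muddier case with the same 16 subjects: 6 of 8 versus 4 of 8.
p_mild = fisher_exact_one_sided(6, 2, 4, 4)

print(p_parachute, p_mild)
```

The parachute table comes out at 1 in 12,870, while the made-up 6-versus-4 table gives a p-value of about 0.30. With 16 subjects there is not much room between "obvious" and "can't tell".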

Noah Haskell

Not to horrible standard errors guy the horrible standard errors guy, but...

> The ideal experiment satisfies SUTVA, the Stable Unit Treatment Value Assumption, which asserts that the only thing that affects a measured outcome is the assignment to the treatment group or the control group.

I think it's more accurate to say that SUTVA asserts that the potential outcomes for unit i are not affected by unit j's assignment to treatment or control. It sounds like the problems with this study are violations of excludability more than SUTVA (or maybe both are violated).

For example, if the developers' completion times increase over time as they get more tired, this is a violation of excludability. By allowing the developers to choose the order of the tasks, factors other than the treatment itself can cause differences in completion times.
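A toy illustration of the fatigue point, with invented numbers throughout: a 60-minute baseline, a true LLM effect of -10 minutes, and an 8-minute slowdown on whichever task is done second. Each hypothetical developer does one LLM task and one non-LLM task; the only thing that varies is which one they do first.

```python
BASE = 60.0        # hypothetical baseline minutes per task
LLM_EFFECT = -10.0 # hypothetical true effect of using the LLM
FATIGUE = 8.0      # hypothetical slowdown on the second task of a session

def task_time(uses_llm, position):
    """Completion time for a task done at position 0 (first) or 1 (second)."""
    return BASE + (LLM_EFFECT if uses_llm else 0.0) + FATIGUE * position

def estimated_effect(orders):
    """Difference in mean times, LLM tasks minus non-LLM tasks.

    `orders` holds one bool per developer: True if the LLM task came first.
    """
    llm_times, plain_times = [], []
    for llm_first in orders:
        llm_times.append(task_time(True, 0 if llm_first else 1))
        plain_times.append(task_time(False, 1 if llm_first else 0))
    return sum(llm_times) / len(llm_times) - sum(plain_times) / len(plain_times)

# Every developer chooses to do the LLM task first.
self_selected = estimated_effect([True] * 8)

# Order is balanced: half do the LLM task first, half do it second.
balanced = estimated_effect([True, False] * 4)

print(self_selected, balanced)  # -18.0 -10.0
```

When order is balanced the estimate recovers the -10 that was built in. When everyone does the LLM task first, fatigue lands entirely on the control tasks and the estimate becomes -18, even though the treatment itself has not changed.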

On the other hand, if the condition assignment of Task A affects the completion time of Task B (or vice versa), this would be a violation of SUTVA. It's easy to imagine this happening here, as well. For example, suppose there is a lot of overlap in the skills and knowledge required by these two tasks. If the first of the two is done with an LLM helper, this could give the developer some useful reminders and context for the second task, which might then be completed more quickly than it would have been otherwise. Or vice versa - if the developer does the first without the LLM, maybe it's distracting to get a bunch of unnecessary code-completion suggestions.
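The same kind of toy setup can show what the SUTVA violation looks like. Again the numbers are invented: a 5-minute carry-over benefit on Task B whenever Task A was done with the LLM. The point is only that Task B's "no LLM" outcome stops being a single number.

```python
BASE = 60.0        # hypothetical baseline minutes per task
DIRECT = -10.0     # hypothetical effect of the LLM on the task it is used for
CARRY_OVER = -5.0  # hypothetical benefit on Task B when Task A used the LLM

def task_b_time(b_uses_llm, a_used_llm):
    """Task B's completion time, which here depends on Task A's assignment."""
    return (BASE
            + (DIRECT if b_uses_llm else 0.0)
            + (CARRY_OVER if a_used_llm else 0.0))

# Task B is in the control condition in both cases below.
b_plain_after_plain = task_b_time(False, False)
b_plain_after_llm = task_b_time(False, True)

print(b_plain_after_plain, b_plain_after_llm)  # 60.0 55.0
```

Under SUTVA those two values would be equal. Here they differ by the carry-over, so "the time Task B takes without the LLM" is only defined once you also say what happened on Task A.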
