12 Comments
David Hilbert:

In addition to the precisely controlled interventions characteristic of psychophysical experimentation, two other factors contribute to the robustness and replicability of psychophysical experiments. Unlike many other areas of psychology (and medicine), psychophysical experiments typically use a within-subject design (comparing the responses of the same subject to two different interventions), which eliminates the many complications involved in comparing across subjects. Relatedly, most psychophysical tasks can be done quickly and repeatedly by experimental subjects. That means it's possible to record hundreds or sometimes thousands of data points on an individual subject, and it also means you don't need to recruit many subjects. There was a joke current in color science when I was first trying to master the basics of the discipline: a psychophysical experiment needs three subjects, the two authors plus the naive subject. This wasn't literally true, but it did capture an important aspect of the literature in the 1970s and 1980s. Thanks for an interesting post.
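The advantage of the within-subject design described above can be illustrated with a small simulation. This is just a hypothetical sketch (the subject baselines, effect size, and noise levels are all made up for illustration): each simulated subject has a large idiosyncratic baseline, but the intervention's small effect is easy to recover because the baseline cancels when you compare conditions within the same subject.

```python
import numpy as np

rng = np.random.default_rng(0)

n_subjects = 3      # the classic "two authors plus the naive subject"
n_trials = 1000     # fast, repeatable tasks allow many trials per subject

# Hypothetical model: big subject-to-subject differences in baseline,
# plus a small, consistent within-subject effect of the intervention.
baseline = rng.normal(0.0, 5.0, size=n_subjects)
effect = 0.3

cond_a = baseline[:, None] + rng.normal(0.0, 1.0, size=(n_subjects, n_trials))
cond_b = baseline[:, None] + effect + rng.normal(0.0, 1.0, size=(n_subjects, n_trials))

# Within-subject comparison: each subject's baseline cancels trial by
# trial, so even three subjects give stable estimates of the effect.
within = (cond_b - cond_a).mean(axis=1)
print(within)  # each subject's estimate lands near the true effect of 0.3
```

A between-subject comparison on the same data (averaging `cond_a` over some subjects and `cond_b` over others) would be swamped by the baseline spread, which is the complication the within-subject design avoids.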

Ben Recht:

Great points.

It's also funny because standard p-values don't apply to within-subject designs, since the SUTVA assumption doesn't hold.

Hilbert Spaess:

Psychophysics may be more robust than medicine, but as a former psychophysicist I can tell you: you don't want to try getting a job in it!

RTG:

What is an example of a psychophysical intervention, as compared to one from soft psychology? Curious what use cases could take these advantages and harness them for human benefit. Great read!

Hilbert Spaess:

Human visual processing is most robust within a small region in the centre of our vision, the fovea. This area is about a degree in diameter, or roughly 1 cm at ~57 cm distance from the eye. It's common practice in virtual reality displays to track the eye movements of the user and render the image outside of the fovea at lower resolution to save computational effort. There is some lovely work looking at the sort of statistical approximations you can get away with presenting peripherally while the viewer is none the wiser. As for a soft psychological "intervention", here's a representative example: https://en.wikipedia.org/wiki/Power_posing
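The degree-to-centimetre conversion mentioned above follows from simple trigonometry: an object of size s at viewing distance d subtends a visual angle θ with s = 2·d·tan(θ/2). A minimal sketch (the function name is my own):

```python
import math

def visual_size_cm(angle_deg, viewing_distance_cm):
    """Physical size subtended by a given visual angle at a given distance."""
    return 2 * viewing_distance_cm * math.tan(math.radians(angle_deg) / 2)

# The fovea spans roughly 1 degree; at ~57 cm that works out to about 1 cm,
# which is why 57 cm is a popular viewing distance in vision experiments.
print(round(visual_size_cm(1.0, 57.0), 3))  # ≈ 0.995 cm
```

For small angles tan(θ/2) ≈ θ/2, so the size grows essentially linearly with both angle and distance in this regime.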

FourierBot:

Can’t believe this is free. Thanks for this cool blog!

Mark Johnson:

When I read your blog, I generally understand what you're getting at almost immediately (and usually agree). In this series of posts you seem to be worried that statistics isn't a path to truth. As you have explained many times in earlier posts, statistics is often just a bureaucratic convention, a publication hurdle.

But bureaucracies, conventions and hurdles are not always useless, even if imperfect. The rules of the road are just conventions, which are sometimes inefficient (e.g., I have to wait at a red light even if there's no traffic in the orthogonal direction) and certainly don't prevent all traffic accidents, but I'm pleased we have them. Likewise, while the p-value requirements are imperfect, I suspect we'd be overwhelmed with even worse papers if we didn't have them. While I expect we could improve both our statistical conventions and our traffic rules, actually doing so could be tricky.

I also see you hinting at the fact that statistical rigour has little to do with understanding something at a theoretical or pre-theoretical level. Control theory, causal models, and some economic modelling try to build models that capture aspects of the underlying phenomena. But as you've remarked in your posts, unstructured machine learning in the form of GenAI is where all the money is today.

The fields of psychology, psycholinguistics and linguistics overlap a lot in terms of the phenomena they cover. Linguistics, perhaps because of Chomsky's influence, is very theoretical and famously numerophobic, while psychology tends to be very atheoretical and to rely on statistical methods. I think linguistics has discovered many interesting facts about human language, but without any statistical information it is hard to tell whether any specific claim is reliable.

Ben Recht:

The problem with statistics is nothing squares. If you try to look at statistical thinking holistically, it is indeed a thorny mess of contradictions that everyone finds confusing. At least to me, statistics only makes sense locally. And even then, you have to be super clear about the rules.

With those caveats, narrow, dappled applications of statistics make a ton of sense and are super powerful.

Do you have a regulatory bar that can benefit from some blind experimentation across diverse cases? Sure, try an RCT. It doesn't solve everything, but it solves some stuff.

Should RCTs be used as part of a system of scientific validity? The jury really isn't out any more: it's just ritual and hasn't advanced anything. (That's what the last week has been reiterating.)

Can you use statistical pattern matching to fit functions with sufficiently large data sets? Sure! Can you use data-driven modeling on its own in mission-critical control applications? Maybe not.

So for your last point: "without any statistical information, it is hard to tell if any specific claim is reliable." I don't believe that holds globally. But I'm sure you can show me regions of linguistics where it holds locally.

Mark Johnson:

Apologies if I was unclear; what I meant was that sometimes rudimentary statistical information would help clarify just how well justified an empirical claim is.

For example, consider the sentences (1) and (2):

(1) Sam promised Sasha to wash himself.

(2) Sam persuaded Sasha to wash himself.

The claim (which I think is largely true) is that in (1) "himself" is most likely to be interpreted as referring to "Sam", while in (2) "himself" is most likely to be interpreted as referring to "Sasha". There are several competing linguistic theories explaining why this occurs, and the field of linguistics largely proceeds by trying to find examples where these theories make different predictions; these examples would then support or contradict a particular linguistic theory. I think this is all well and good, and I expect you would agree.

Usually the empirical claims are just presented as I did above, with no indication of how broadly they hold, or where the examples come from (were they found in a corpus of texts, or did the author just make them up as I did?). I agree with you that demanding 5-sigma evidence for the judgements probably wouldn't help the field, but knowing more about the empirical data would be useful.

Maxim Raginsky:

Samples represented in two's complement binary.

Ben Recht:

eight-to-fourteen expansion