The idea of statistics (and other aspects of scientific process and communication) as rhetoric in the sense of formally structured argument with intention to persuade, is one that I like as one perspective regarding the goals of inference, though not the only one. For one, some statistical procedures play a role in automated systems with no human audience at all (this is often "machine learning"), and maybe just sometimes people learn and adjust their own views in response to quantitative results (sometimes this is "Exploratory Data Analysis", though not always via the specific methods that get called that).
When I taught graduate Econometrics, I liked to devote just a tiny sliver of the first lecture to alternative interpretations, before plunging students into an otherwise standard course on probability theory and matrix algebra: see roughly minutes 12-18 of a recording https://www.youtube.com/watch?v=h727zDsAy1Q&t=1s
In it, for the idea of modeling as rhetoric, I cite in particular Deirdre McCloskey's 1998 "The Rhetoric of Economics", which also includes a classic bit on p-values that I think fits quite well with the perspective you take here. (For propriety I fail to mention that I find her to be, stylistically, one of the most grating and painful-to-read authors in economics, though the points still stand.) Looking more at the rhetoric of quantitative theoretical models, I also point to Ariel Rubinstein's excellent "Economic Fables". Rubinstein's student Ran Spiegler has a more recent book in the same vein, written in a literary fashion but drawing on what is now an active literature on models of learning from, and persuading people with, models. This area (with authors like Spiegler, Philipp Strack, etc.) builds on an earlier literature that did the same with formal decision-theoretic models of learning, but lately incorporates various kinds of behavioral features, since it is hard to explain much of scientific practice as rational learning. Of course, how convincing you find these models of being persuaded by models will depend on how persuasive you find models of that kind, so I believe there are opportunities for infinitely more layers of turtles.
Always more turtles! But we need some language of persuasion to move forward collectively, be it in research, governance, or other contexts that necessitate participatory decision making.
My one minor point of terminological disagreement is that I would not call EDA and ML statistical inference. This is me breaking with Wasserman. For, um, rhetorical purposes, I like the dappled view, keeping the many local applications of statistics separate.
I'm teaching Detection and Estimation this semester (which I've taught off and on for the last 10 years). It's an EE class at heart, but with all of the very dodgy statistics carted out for ML applications, what I've had to emphasize more over the years is that all the models we use are gross oversimplifications and ultimately an ansatz. This might be safe for, say, communications (yeah, noise isn't additive or Gaussian, but the stuff we build on those assumptions works in practice), but not for most other applications.
Not sure if it gets through...
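For what it's worth, the communications case I have in mind is the standard one: a threshold detector derived entirely from an additive-Gaussian noise model. A toy sketch, with a made-up SNR:

```python
import numpy as np
from scipy.special import erfc

rng = np.random.default_rng(0)
n, snr_db = 100_000, 6.0                      # hypothetical symbol count and SNR per bit
snr = 10 ** (snr_db / 10)

bits = rng.integers(0, 2, n)
symbols = 2 * bits - 1                        # BPSK: map {0,1} -> {-1,+1}
received = symbols + rng.normal(scale=np.sqrt(1 / (2 * snr)), size=n)  # AWGN model

decisions = (received > 0).astype(int)        # the detector the Gaussian model dictates
print("empirical BER:", np.mean(decisions != bits))
print("model prediction:", 0.5 * erfc(np.sqrt(snr)))   # textbook BPSK-over-AWGN error rate
```

The detector and its predicted error rate fall straight out of the assumed model; whether that is safe is a property of the application, not of the math.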
It's hard! I struggle with exactly the same problems when teaching machine learning. Shannon and Wiener cursed us with heavy stochastic baggage.
Shannon at least had the bandwagon paper which I trot out now as some OG curmudgeonness which was not wrong.
I might speculate that the success of models in EE stems from the **linearity** of Maxwell's equations. Dig deep enough and Maxwell's equations sit at the core of EE; beyond that we have quantum effects, non-linearities, and "shut up and calculate!" physics.
That linearity lets full-fledged linear system theory be put into action, along with stationary random processes, central limit theorems, and the rest. The power spectral density is something actually measured with spectrum analyzers in practice.
The extreme success of models in EE is remarkable and rare, IMO. This is not the case with mechanical engineering, for example, as far as I know. (Even simple friction between two opposing surfaces makes the overall system non-linear. Remember that friction acts opposite to the motion of the body involved, i.e., with the sign of -xdot(t), so it flips back and forth with the sign of xdot(t).)
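To make the sign flip concrete, here is a crude sketch (made-up parameters, naive Euler integration, sticking ignored):

```python
import numpy as np

def settle(x0, mu=0.02, m=1.0, k=4.0, g=9.81, dt=1e-3, steps=20_000):
    """Mass-spring with Coulomb friction: m*x'' = -k*x - mu*m*g*sign(x')."""
    x, v = x0, 0.0
    for _ in range(steps):
        a = (-k * x - mu * m * g * np.sign(v)) / m   # friction flips with the sign of xdot(t)
        v += a * dt
        x += v * dt
    return x

# Linearity check: doubling the initial displacement does not double the response.
print(settle(1.0), settle(2.0))
```

That one sign(xdot) term is enough to break superposition, so the linear-systems toolbox no longer applies cleanly.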
A famous person should write something about the surprising effectiveness of simple models in EE.
I'd be glad to debate you bro...but here I speak to your readers.
When a method does genuine inferential work and yields warranted inferences, it not only is persuasive—it deserves to be. Recht’s reasoning, however, flattens this distinction, making it all too easy to dismiss such methods as merely “rhetorical.” Although his position is sufficiently equivocal (as in his bureaucratic view of statistics), the overall impression is that in his view statistical inference is little more than officially sanctioned snake oil. The fallacy here is a failure to ask why some methods deserve to be persuasive and others do not. In the case of statistical inference, when correctly applied, the answer is straightforward: it earns its persuasiveness by exposing itself to stringent criticism and severe testing. Recht’s argument would undercut the central task of statistical science—namely, distinguishing warranted inferences and sound methods from bad but persuasive ones: those built on biased selection, missing error estimates, charming anecdotes, hasty generalization, or a convenient blindness to alternative explanations of the data.
Worse still, by portraying statistical inference as post hoc—entering only after data collection—Recht overlooks the single most important achievement and ongoing task of error-statistical inference: planning and experimental design. It is this forward-looking design perspective, advanced by Fisher and by Neyman and Pearson, that makes error estimation possible and provides the basis for the critical scrutiny that moves us beyond rhetoric. Fisher was explicit about this point: his insistence on randomization, he explains, is motivated by a desire to stop 'important' critics from discounting experimental results as the product of poor controls or rhetorical tricks dressed up as science. Recht seems to be suggesting either that that’s all statistical inference is, or perhaps that’s all science is. There may be other reasons he is so keen to derogate statistical inference...
Largely agreed here, Deborah.
Ben, it's unclear to me whether you're making a descriptive claim about how statistical methods tend to be used (admittedly poorly!) or whether statistical inference "is rhetoric" in principle. ("Let’s admit that’s what statistical inference is.")
I agree that persuasion is necessary in the scientific method, but I believe rhetoric is most useful pre-data collection, pre-experiment, pre-analysis. If we can sufficiently align on a framework, modeling assumptions, critical domain expertise to incorporate into the prior, the decision process, the utility function, etc., then the inference part is mechanical. Turn the crank and make your conclusion/decision as planned. The p-value itself shouldn't be persuading anyone; it's just a number. The rhetoric enters when someone uses that number to argue for a conclusion, and that use is only rhetorical to the extent the framework (including the decision process and utility function) remains contested. If your claim is that we never achieve sufficient alignment in practice, that's an empirical claim about scientific institutions, not a claim about "what statistical inference is."
Your Fisher examples (“You tasted 8 correctly, but the p-value of random guessing is 1/70, I remain unconvinced!”, “Here are another dozen reasons why your data associating cancer with smoking is unconvincing”) entail at least disagreements about which utility function to use if not also disagreements about modeling assumptions. The assumptions themselves seem fair game for rhetoric. In fact to me that's where the juice is.
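(To be concrete about "turn the crank": the 1/70 in the tea-tasting example is pure arithmetic once the design is fixed. Everything contestable lives upstream, in the design and the assumptions.)

```python
from math import comb

# Fisher's design: 8 cups, 4 milk-first and 4 tea-first, all identified correctly.
# Under pure guessing, every choice of which 4 cups were milk-first is equally likely.
print(1 / comb(8, 4))   # 1/70 ~= 0.0143
```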
"There are plenty of other ways that people make sense of data without statistics, even in human-facing science." You're arguing against the claim that statistics is the "[best] set of techniques to learn from numerical evidence." Do the counter-examples you link speak to a better set of techniques? Aren't the examples relying on implicit statistical reasoning? The LEAP example is compelling only if you accept that there is no confounding, that the counts surpass your threshold for "obvious", that the cost of a false-positive vs. a false-negative given the decision process is sufficiently low, that the effect straightforwardly generalizes. How is ignoring these assumptions on the way to claiming a miracle cure _at least as good as_ formalizing and defending them? None of that formalization (and any persuasion necessary to get buy-in!) constitutes rhetoric in the form of statistical inference itself.
> Aren't the examples relying on implicit statistical reasoning?
To make an analogy I don't really believe: An astrologer discusses the importance of astrology in discovering truths and navigating the world. Someone points out that conversations actively involving astrology devolve quickly into squabbles about correctly modelling moon phases that never resolve, and that the cases where people reliably figure things out require no astrology at all. The astrologer replies "well aren't you relying on *implicit* astrology? You are assuming that the positioning of celestial bodies under the experimental conditions are harmonious with the signs of the patients, that solar cycles affect social rhythms to maintain this condition, blahblahblah". This isn't great!
I'm not making a direct comparison here. For one thing, the statistical critiques are all logically correct in principle. But "aren't all the successes implicitly doing what we say you should do explicitly" is a delicate argument you don't want to make if you can avoid it! If making it explicit is helpful the argument doesn't need to be made this way in the first place.
---------------
My understanding of Ben's critique is a claim that, in practice, there are two cases:
1. The meaning and structure of the experimental setup and treatment are all fairly obvious, with little practical disagreement on their reliability. Also, whether a strong effect was caused or not is obvious and needs no modelling. Error bars are for sticklers but never really affect the immediate conclusion.
2. The meaning and structure of the experimental setup, and its relationship to the estimand, are delicate and uncertain. The data is fuzzy and noisy. Unfortunately, this simultaneously produces a dramatic *modelling ambiguity*: our inability to invent a clean experiment is entangled with a lack of modelling knowledge of what the relationship between cause and effect even is, so the "real meaning" of the conclusion is model-dependent in a way that we cannot pin down. Most statistics then consists of fighting over unfalsifiable models.
In case 2, all statistical efforts are of the form "If you knew all this stuff you don't actually know, then your conclusion should be X". Using statistics on other experiments to fix this mostly makes an infinite regress. Basically if you actually *have* to calculate p-values/bayes factors/whatever, you're losing. Knowledge is found instead by exploring around for a related phenomenon that lands you in case 1 and working your way back from there. A cloud of statistics happens "around" successful efforts at knowledge-finding but is never responsible for them. Being reduced to statistics is confession that you are not involved in a "severe test" in the first place! (I actually find the form of Mayo's counter-position a bit odd, since in my mind this is the one-line summary of Ben's entire point).
-----
I don't agree with this "in the large". I mean, I love statistical physics, where you can calculate the probability distributions of states in even microscopic systems and derive the exact shape of fluctuations in detail (see experimental confirmations of e.g. the Crooks Fluctuation Theorem). I make a living doing a fair amount of experimentally-informed statistical analysis for signal processing. There *are* things in the third quadrant, where naive interpretation of data is hard but an "objectively-defined" (but still statistical) model is practically discoverable that you can wield to get nontrivial conclusions. But I don't think the comments above really help defend the claim that human-facing sciences contain these as well. For example, when you say
> The LEAP example is compelling only if you accept that there is no confounding, that the counts surpass your threshold for "obvious", that the cost of a false-positive vs. a false-negative given the decision process is sufficiently low, that the effect straightforwardly generalizes.
None of these things really have anything to do with stats. They are all obvious features of just trying to do things, ones that were known and discussed *well* before Bernoulli thought about gambling. This looks like it is saying "statistics incorporates basic epistemology, so anyone who cares about epistemology at all is secretly doing statistics, just poorly". Of course, statistics lets you *quantify* measures of these things, under some yet-further implicit modelling assumptions, if you want. That is its new critical contribution. Does that help? If I think these conditions are all met, and you don't, can you actually *use statistics* to convince me I'm wrong? Like, for-real make defensible calculations in terms of actually-known quantities to produce a nontrivial answer?
I mean, maybe you can! I suspect I am much less skeptical than Ben in thinking so! But that's the thing this discussion has to actually be about.
Good stuff. Thanks for your thoughts!
> [case 1 vs. case 2...] I don't agree with this "in the large".
Neither do I. I'd say the boundary between "obvious" and "delicate" depends on the community's (however defined) shared assumptions, which is my point. What looks like case 1 to one group looks like case 2 to another. The LEAP result seems obvious only if you already accept certain premises about confounding, generalization, and acceptable risk. Making those premises explicit is how you discover whether you're actually in case 1 or only think you are.
> But "aren't all the successes implicitly doing what we say you should do explicitly" is a delicate argument you don't want to make if you can avoid it!
Fair point on the argument form! Let me put it differently: I'm not claiming that successful science is "secretly statistics, just poorly." I'm claiming there's no such thing as a "success" independent of some framework of assumptions, even if that framework remains tacit. The question is whether making those assumptions explicit adds value. I think it does, not because it transforms bad epistemology into good, but because it makes disagreements legible. If you and I look at LEAP and reach different conclusions, I want to know: is it because we have different priors on confounding? Different loss functions? Different generalization targets? Quantification doesn't resolve these disputes, but it localizes them.
> None of these things really have anything to do with stats... statistics lets you *quantify* measures of these things... Does that help? If I think these conditions are all met, and you don't, can you actually *use statistics* to convince me I'm wrong?
No, I don't believe I can "use statistics" directly to convince you that your assumptions are wrong. You're right that the epistemological concerns (confounding, generalization, etc.) predate statistics and don't require it, but I'd say statistics can at least show you the _consequences_ of your assumptions in a way that stands a better chance of moving the needle. For example, with different prior models, you and I could quantify how often and under what conditions different data-generating processes yield different conclusions between us. You could show that simulations from my prior model yield summary statistics that fall well outside a range I'm comfortable with. Maybe my marginal covariate model is not flexible enough to reliably inform predictions in an unseen circumstance of interest. The formal machinery makes our disagreement precise enough to be productive.
FWIW I've taken a lot from Michael Betancourt's Bayesian Workflow chapter, which lays out this kind of prospective calibration in detail: https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html#13_Inferential_Calibration
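To sketch the kind of prospective calibration I mean (made-up priors, sample size, and summary statistic; nothing tied to any actual trial data):

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_sims = 100, 10_000     # hypothetical study size and number of simulations

def prior_predictive(alpha, beta):
    """Success counts implied by a Beta(alpha, beta) prior on the success probability."""
    theta = rng.beta(alpha, beta, n_sims)
    return rng.binomial(n_trials, theta)

mine = prior_predictive(2, 8)      # my prior: effect probably modest
yours = prior_predictive(8, 2)     # your prior: effect probably large

# If datasets simulated from my prior routinely land where you consider implausible,
# we've localized the disagreement before touching the observed data.
print("mine  5-95%:", np.percentile(mine, [5, 95]))
print("yours 5-95%:", np.percentile(yours, [5, 95]))
```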
Worth pointing out that Feyerabend was playing the same game as Fisher. His anti-method wasn't the result of deep reasoning but of the fact that "alternative" medical theories and treatments, to which he was committed, always bombed out in statistical testing.
Come now, that is a rather uncharitable reading of Feyerabend. His critiques in Against Method were leveled at far "harder" science than medicine.
He presented the attack in terms of scientific methodology in general, but medicine was the real target and (AFAICT) the main area where he has had continuing influence, for example https://www.sciencedirect.com/science/article/abs/pii/S1369848613000733
For example, while there's still debate among philosophers of science about his defence of astrology, neither astronomers nor astrologers pay any attention to it, AFAICT.
If you haven’t come across it yet, check out Ted Porter’s work in the area! _Trust in Numbers_ is a classic.
The rules of statistical inference as outlined (discovered?) by Bernoulli are remarkable. If you read his work in some depth, he talks about the idea of "moral certainty" (which we would translate today as a confidence level with many nines).
Of course, the 95% confidence interval is most common today - and while it is deeply misunderstood, it is most prevalently cited by nonexperts, I believe, in the field of polling.
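With made-up poll numbers, the interval most outlets report is just the normal approximation; "moral certainty" amounts to pushing the multiplier out to many nines:

```python
import math

n, successes = 1000, 520            # hypothetical poll: 520 of 1000 favor candidate A
p_hat = successes / n
se = math.sqrt(p_hat * (1 - p_hat) / n)

for label, z in [("95%", 1.96), ("99.99% ('many nines')", 3.89)]:
    print(f"{label}: [{p_hat - z * se:.3f}, {p_hat + z * se:.3f}]")
```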
Probability itself is a VERY new field, and far less developed than other "math" despite being simpler on its face. I describe probability as the logic of statistics: the statistic asks "what does this data say?", which seems easy. But probability asks "what does this data mean?" - and that logic is often poorly applied: "say" and "mean" are frequently conflated.
In response to "lies, damned lies, and statistics" I say:
Statistics do not lie, but people who do not understand statistics are easily misled.
Yes, humans are mostly narrative-driven, and sometimes narrative-plus-data, but not actually full-on data-driven (it is not a thing).
I had a really bad experience of this as part of maternity care. Statistics that are not even close to convincing were used to push expectant mothers to make critical decisions in a manner that is far from informed. Being on the other side of "evidence based" practice made me realise it is often a way to justify worse care.
Maybe you don't necessarily need to go full Feyerabend, but if you've already been radicalized by years of reading arg min, why not go all the way?
Haha, no, I am new to his musings. I think I was actually in favour of evidence-based practice before I saw it being used in practice! Now I am much more cautious.
Nearly everything in the world created by humans involves rhetoric, because most things created by humans for other humans to use require communication, justification, and argument.
Yes. It's funny how if you call it "rhetoric" it feels optional, but "logic" and it feels mandatory.