18 Comments

Isn't basically all of ML based on the assumption that there exists some unknown distribution over everything?

Though that's the way it's usually taught, you don't need to assume that data is random to do machine learning. For more, read:

1. https://www.argmin.net/p/learning-with-intentional-randomness

2. https://www.argmin.net/p/yoshimi-battles-the-perceptrons

3. https://www.argmin.net/p/regretfully-yours

"all observations are generated by having god randomly generate an iid sample from a probability distribution governed by a few parameters. " This world view is so confusing. "Random variables" in statistical world view seem to be super zombie which make everything rv. Any constant/object + random variable is a random variable. Random variable infects everything ! This worldview is good for mathematical analysis or exploration in some context. Generalizing this idea is so weird.

Interesting read. Would you mind going a little deeper on your last paragraph?

I’m interested in what the notion of MLE being on unstable ground implies about the philosophical standing of, say, the Standard Model in physics. Is that, too, non-rigorous from this perspective, for the following reasons: 1) the assumptions of the Standard Model itself represent an oversimplification of the world, and 2) it is experimentally verified using the methods of maximum likelihood inference (which, as you say, are unreliable)?

I guess the question is how much of science becomes non-rigorous when these standards are applied to fields other than statistics.

There's something very different behind what we do in physics and what Fisher is proposing here.

The standard model makes a bunch of risky predictions about the existence of quarks, bosons, and various other esoteric particles, and its predictions, though not precise, have been certified over and over again.

In Fisher 1922, he's simply proposing a method of data summarization that takes past, given data, and converts it into a few simple summaries. There is nothing predictive about maximum likelihood.

Now, when you have a parametric, probabilistic model that you believe is true and you want to fit its parameters, there are a variety of schemes at your disposal. People use maximum likelihood to fit these parameters, but I'd bet there are plenty of other ways to get the parameters that would be just as good. I'd be curious to read more about which statistical methods are accepted by the particle physics community and when these methods became accepted.
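To illustrate that last point, here's a minimal sketch (my own toy example, not anything from Fisher or the post) fitting the parameter of a Uniform(0, θ) model two different ways. Both estimators are consistent, and the data alone don't privilege one over the other:

```python
import numpy as np

# Toy illustration: two ways to fit the parameter theta of a
# Uniform(0, theta) model to the same data.
rng = np.random.default_rng(0)
data = rng.uniform(0, 5.0, size=1000)  # pretend theta = 5.0 is unknown

# Maximum likelihood: the likelihood (1/theta)^n is maximized by the
# smallest theta consistent with the data, i.e. the sample maximum.
theta_mle = data.max()

# Method of moments: match E[X] = theta / 2 to the sample mean.
theta_mom = 2 * data.mean()

print(f"MLE: {theta_mle:.3f}, method of moments: {theta_mom:.3f}")
# Both are consistent estimators; neither is privileged by the data alone.
```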

I’m admittedly not an expert in this area, but here’s an article I read on the methods used in confirmation/validation of the Higgs particle: https://e-publishing.cern.ch/index.php/CYRSP/article/download/755/658/3737

I’m still not entirely sure how to think about scientific models when they can only be verified using null hypothesis significance testing. Perhaps I’m incorrectly conflating NHSTs with Fisherian likelihood, though.

Oh, for sure. But scientific confirmation is far more complex than the NHST. And I don't think NHSTs have all that much to do with why we believe there is a Higgs. They're a small piece of a much larger puzzle.

But what's funny is that probability in NHSTs has a ton of different interpretations too. In a Fisher-style experiment, the randomness comes from the experimenter, who creates it to illuminate causal structure. In a Neyman-Pearson test, the probability comes through explicit data-generating models. In observational NHSTs, the randomness comes from a pure thought experiment. It's very weird how all of these get lumped into the same boat of significance testing.
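For the Fisher-style case, a randomization test makes the source of the probability explicit. Here's a minimal sketch (my illustration, with made-up numbers): the only randomness in play is the treatment assignment the experimenter created, re-enacted by permuting the labels:

```python
import numpy as np

# Fisher-style randomization test on made-up treatment/control outcomes.
rng = np.random.default_rng(0)
treated = np.array([5.1, 6.0, 5.8, 6.3])
control = np.array([4.9, 5.2, 5.0, 5.4])
observed = treated.mean() - control.mean()

pooled = np.concatenate([treated, control])
n_treated = len(treated)
diffs = []
for _ in range(10_000):
    # Re-enact the experimenter's coin flips by shuffling the labels.
    perm = rng.permutation(pooled)
    diffs.append(perm[:n_treated].mean() - perm[n_treated:].mean())

# p-value: how often a random re-assignment looks as extreme as the data.
p = np.mean(np.abs(diffs) >= abs(observed))
print(f"observed diff = {observed:.2f}, randomization p-value = {p:.3f}")
```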

I'm really enjoying your journey through these old papers, and the cogent summaries you add along the way.

Thank you!

Hey Ben, I was recently discussing the use of personal belief probabilities with someone espousing Bayesian reasoning (e.g., “My prediction that X candidate will win Y race is 30%,” with implied conditioned-on components) and was delighted to learn about the Stanford Encyclopedia of Philosophy. Have you seen or engaged with this: https://plato.stanford.edu/entries/probability-interpret/ ? It covers a lot of your objections and some that I expect you have but haven’t posted yet.

The Stanford Encyclopedia is so impressive. I wish other scholarly disciplines would team together to produce open, collaborative summaries of their fields.

But I'd cut your Bayesian friend a little slack. Dogmatic Bayesianism is annoying, but I don't think dogmatic frequentism makes any sense either. The amazing part is these systems agree more than they disagree, and that's too strange to be a coincidence.
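One concrete instance of that agreement, as a toy sketch of my own (not from the thread): estimate a coin's bias with maximum likelihood and with a uniform-prior Bayesian posterior mean, and the numbers all but coincide once there's any reasonable amount of data:

```python
import numpy as np

# Toy instance of frequentist/Bayesian agreement (illustrative numbers).
rng = np.random.default_rng(0)
n, true_p = 1000, 0.3
heads = rng.binomial(n, true_p)

mle = heads / n                    # frequentist: maximum likelihood
post_mean = (heads + 1) / (n + 2)  # Bayesian: Beta(1,1) (uniform) prior

print(f"MLE = {mle:.4f}, posterior mean = {post_mean:.4f}")
# The interpretations differ far more than the numbers do.
```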

The one thing the encyclopedia article doesn't do is dive into the social and technical reasons we are so attracted to probability.

I look at it with an increasing grain of salt, for many of the reasons you have pointed out in your posts over the last few months about strength of evidence and testing, but also because of what you said in this one: all models are wrong, but some are useful. The issue comes in when the model is so wrong that it stops being useful.

I’m reminded of a story my information theory professor told in grad school about how feedback control was proven to have no benefit to channel capacity, but the proof only held under certain i.i.d. conditions that don’t hold in the real world, so it held back good cellphone technology for an additional decade. I don’t have the citation, and I may have slightly messed up the technicalities since I’m having a hard time googling it, but the gist is definitely true.

The view I’m most comfortable with at the moment is this:

“the primary reason why so many scientific and engineering disciplines make use of the calculus of probability is not because it somehow encodes the “laws of chance” (if only because there is no universal agreement on what constitutes “chance”). It is the structure of the probability calculus that renders it so useful, apart from any interpretation of what the probabilities mean.”

https://realizable.substack.com/p/probabilities-coherence-correspondence

At the end of the front matter of "The Elements of Statistical Learning" (Hastie, Tibshirani, Friedman) there is a quote from Ian Hacking: "The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions". The more you read this Substack series, the more double-edged that quote becomes.

Yup. It's also pretty funny that they describe statisticians as quiet...

"We make some untestable assumptions about the world in order to tell a story about data." I feel this gives a good summary of typical social science. Whether that's deemed useful seems to vary a lot by opinion. That's a genuine issue, any research topic is important if a few smart people say it's important.

"The postulate of randomness thus resolves itself into the question, ‘Of what population is this a random sample?’" This quote is great too. You can always infer from your sample to The World by defining (or being purposefully vague about) your population as that which your sample is a random sample of. Hence, you do science.

By the end of this road, we will hit the self-averaging argmin post.

(Fascinating as always. Love this new thread of posts on the mis-foundations of probabilistic thinking.)

Thank you! ICYMI: here's an orphaned early entry in the series on Frank Ramsey

https://www.argmin.net/p/the-birth-of-the-utility-monster
