Patterns, Predictions, and Actions (2025)

I’ll be live blogging my graduate course on machine learning this semester (Fall 2025). The course is based on the text Patterns, Predictions, and Actions by Moritz Hardt and myself.

This course will explore how patterns in data support predictions and consequential actions. Starting with the foundations of prediction, we look at the optimization theory used to automate decision-making. We then turn to supervised learning, covering its key constituents: representation, optimization, and generalization. We will discuss datasets as benchmarks, examining their histories and scientific bases. We will then cover the related principles of statistical evaluation, drawing a through line from confidence intervals to A/B testing to bandits to reinforcement learning. Throughout the course, we will draw connections to historical context, contemporary practice, and societal impact.


  1. Lecture 1: Introduction

    1. Slides (I'm not sure if these are helpful without my narration.)

  2. Lecture 2: Rudiments of Prediction

    1. Learning from clairvoyance

    2. Lecture Notes

  3. Lecture 3: Prediction from Samples (without features)

    1. Your noise is my signal

    2. The Actuary's Final Word

    3. Lecture Notes

    4. Recht, Benjamin (2025) “The Actuary’s Final Word on Algorithmic Decision-Making.” arXiv:2509.04546. [slides]

  4. Lecture 4: Decision Theory

    1. Justify your answer

    2. Reading: PPA Chapter 2.

  5. Lecture 5: Errors, Operating Characteristics, and Tradeoffs

    1. Stuck in the middle

    2. Reading: PPA Chapter 2.

  6. Lecture 6: Fairness and trade-offs (by Jessica Dai)

    1. Reading: PPA Chapter 2.

    2. Hardt et al. Equality of Opportunity in Supervised Learning.

    3. Kleinberg et al. Inherent Trade-Offs in the Fair Determination of Risk Scores.

  7. Lecture 7: The Perceptron

    1. Common Descent

    2. Reading: PPA Chapter 3.

  8. Lecture 8: Numerically representing data

    1. Boxes of numbers

    2. Reading: PPA Chapter 4.

  9. Lecture 9: Nonlinearity and approximation

    1. Universal Cascades

    2. Reading: PPA Chapter 4.

  10. Lecture 10: Stochastic Gradient Descent

    1. Highly optimized optimizers

    2. Reading: PPA Chapter 5.

  11. Lecture 11: Analysis of the Stochastic Gradient Method

    1. Minimal Theory

    2. Reading: PPA Chapter 5.