16 Comments
Matt:

Love this framing!

Visar Berisha:

One thing I've done in a speech ML class is cross-list it between Speech & Hearing and Engineering. It brings together students with clinical/social speech expertise and those with technical speech expertise. It's group- and project-based. I've tried to design each project so that both play a role - e.g., collect the right kind of speech data, design the right kind of feature extractors, interpret the output of the model in context, etc.

Plenty of new challenges emerge, and it's genuinely hard to teach, but I've found that teaching ML without anchoring it in a domain-relevant problem isn't all that useful in practice.

Ryan S:

Regarding homework: could be time to take a cue from sociology and have CS grad students... write papers *gasp*

Ben Recht:

One thing I did this semester was add a question about their course project to every problem set. I'm going to lean more heavily on this next time I teach the class.

Alexandre Passos:

I think there is, these days, interesting phenomenological math (as opposed to ontological math) that you can justify well in ML. Things like "assuming you want your neural network's activations or gradients to be invariant to the number of layers, this is how you initialize / normalize / etc.," or "these are useful power-law models of how neural networks learn," or "neural networks are obviously not quadratics, but if you squint and pretend that they are, you can predict a lot of the curves seen during training." I think this vibes very well with your argument that generalization is an axiom, and with the general vibe that ML made little progress while it tried to treat things as math and only got unblocked when it started treating things as physics.
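A minimal numpy sketch of the first kind of statement Passos describes, i.e., choosing an initialization scale so activations are roughly invariant to the number of layers. The He scaling (variance 2/fan_in for ReLU) is standard, but the toy network, sizes, and seed here are purely illustrative assumptions, not anything from the post:

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 512, 50
x = rng.standard_normal((width, 1000))  # batch of random inputs

# Compare two initialization scales across a deep stack of ReLU layers.
for scale, label in [(1.0 / width, "naive 1/fan_in"),
                     (2.0 / width, "He 2/fan_in")]:
    h = x.copy()
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(scale)
        h = np.maximum(W @ h, 0.0)  # linear layer followed by ReLU
    print(f"{label}: activation std after {depth} layers = {h.std():.3e}")

# The naive scale shrinks activations geometrically with depth; the extra
# factor of 2 (compensating for ReLU zeroing half its inputs) keeps the
# activation scale roughly independent of the number of layers.
```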

Ben Recht:

Hmm. I'm not sure I buy that the phenomenology is math or physics. Machine learning is an engineering discipline. The phenomenology is a convenient way to plot best practice, but I don't think it provides any fundamental laws. For example, no one has come up with a reasonable explanation of these silly scatter plots yet, but they are very helpful to inform practice: https://arxiv.org/abs/1902.10811

The AI Architect:

Spot-on framing! The "generalization as axiom" lens nails why ML pedagogy feels so awkward. What really stands out, though, is how this organic pull-request evolution accidentally mirrors biological adaptation: vestigial structures, local fitness maxima, and selection pressure from leaderboards instead of environments. Maybe the technical debt isn't a bug but the actual substrate.

Maxim Raginsky:

There are certainly parallels to biological evolution; I wrote about this a while ago: https://realizable.substack.com/p/verum-et-factum-convertuntur-again

Joe Jordan:

My claim would be that the computer program that classifies images is fairly straightforward to write and already exists in nuce on every device in the world: the JPEG compression algorithm. If you do PCA on a bunch of images, you get a set of filters (the eigenvectors) and their weights (the eigenvalues). The JPEG algorithm uses different weights than a neural net trained for classification, but both work by turning an image into a linear combination of filters. This is also basically what attention heads are doing, but in a higher-dimensional space.
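A rough sketch of that linear-combination-of-filters point, using scikit-learn's digits dataset and a 16-component PCA as purely illustrative stand-ins (neither is specified in the comment):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

images = load_digits().data              # (1797, 64): flattened 8x8 images
pca = PCA(n_components=16).fit(images)

filters = pca.components_                # eigenvectors: the "filters"
weights = pca.transform(images[:1])      # one image's coefficients on them

# The image is (approximately) the mean plus a linear combination of the
# filters -- structurally the same move JPEG makes with its fixed DCT basis.
reconstruction = pca.mean_ + weights @ filters
print("reconstruction error:", np.linalg.norm(images[0] - reconstruction))
```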

Anna Gilbert:

I certainly agree that benchmarking is the main approach to “proving” a method works, and that checking the leaderboard can tell you which method or architecture makes the most sense for a range of benchmark datasets and tasks, but the “Kagglification” of research has some real pitfalls. Are we sure people have established the baselines thoroughly and completely? Can we be sure of the leaderboard results when we can’t reproduce them ourselves (it’s really hard to re-run someone else’s experiments)? How do we agree, as a community, on the benchmark datasets and tasks? How do we know that said benchmark datasets are representative of the type of task we really want to run?

In other words, it’s a good model, but it could do with some best practices from scientific experiments. I’ve also bandied about the idea of a “model organism” for ML, like those used to study various biological processes.

Ben Recht:

Yes, absolutely. I'm not arguing to Kagglify all of research, but *machine learning* is inseparable from benchmarking. And though I understand the unease in your questions, I think the evidence is quite compelling that most of the interesting results in AI can be traced back to this culture of benchmarking.

That said, I am by no means a machine learning imperialist. It's a useful engineering technology with undeniably impressive applications. But I'm worried its success has convinced too many people that it is a panacea for all scientific advancement. I don't subscribe to that view!

Anna Gilbert:

Totally agree with you! It’s not a panacea for scientific advancement. And the really big innovations have come about because of benchmarking. Maybe my point can be summarized as: do the benchmarking really, really carefully and well to show a truly big advance, and recognize that not every medium-sized idea is a true advance :)

Kevin M:

What about questions that are more reflective? Maybe you derive a proof to convince yourself it works this way, and then have students reflect on it from a more humanities-oriented perspective?

Ben Recht:

Mathematical theory is a convenient language for formalization. The problem arises when your theory is too fictional to guide practical considerations.

Adam Ginensky:

You write: "Despite statistical arguments declaring it fundamentally flawed, the culture of competitive testing on benchmarks has driven and still drives the engine of what the field defines as progress." Can you justify this? For example, I was under the impression that various cross-validation (CV) methods were asymptotically equivalent to AIC and BIC.

In general, I think that ML is driven by 'approximate methods' in the sense that we observe data that we know is noisy; therefore, we can't derive exact answers. Things like PAC bounds are the best we can do. I think this makes it different from most other applications of mathematics to science.
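For reference, the kind of guarantee being invoked is the textbook finite-class PAC bound below; this is standard background rather than anything from the thread, with R(h) the true risk, R̂_n(h) the empirical risk on n i.i.d. samples, and H a finite hypothesis class.

```latex
% Standard finite-class PAC bound: with probability at least 1 - \delta,
% simultaneously for every hypothesis h in the finite class \mathcal{H},
R(h) \;\le\; \hat{R}_n(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln(1/\delta)}{2n}}
```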

Ben Recht:

PAC Learning makes poor predictions and gives bad advice. I don't think it's the best we can do. https://www.argmin.net/p/thou-shalt-not-overfit
