13 Comments

"Machine learning is what we do when we don’t understand. When we do understand, we just write the damned code."

Yep, exactly: Hume (inferring necessary connections from constant conjunctions) when we don't understand, Kant (synthetic a priori and all that jazz) when we do.

author

Hah. That's perfect. But if Hume is thesis, Kant antithesis, who is the dialectical synthesis?


Darwin, I am tempted to say. I think the evolutionary take by people like Konrad Lorenz (what's a priori for an individual is a posteriori for the species) is largely correct and is just as applicable to machine learning.

author

I would never have predicted you would say that, Max. But perhaps that means you are the synthesis, predictable neither by man nor machine.


I am still not sure whether linear rules can become conscious, though!

author

Neither are they! I feel deceived.


Am I a linear rule? I hope not, I fear I am.

This is not helping, Ben!


Always good to revisit this history. A small quibble: It seems like we do understand what problems have large margin in that the data, as vectors, must be linearly separable, yadda yadda. The hard part is finding the featurization of that data so that it becomes separable. I agree we don't understand what problems support "learnable" featurizations that admit large margin solutions.

P.S. Parity seems very hard for a network to learn since mod 2 is inherently a super high-frequency function in the original space. Some of the network theories would predict that this is difficult unless you somehow built some extra structure into the network.
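
One way to make the "high-frequency" point concrete is the Boolean Fourier picture: in the ±1 encoding, the parity of all n bits is exactly the top-degree Walsh character, so all of its Fourier mass sits on the single highest-order coefficient. A quick numpy/scipy sketch of that fact:

```python
import numpy as np
from scipy.linalg import hadamard

n = 4
H = hadamard(2 ** n)  # Sylvester-ordered Walsh-Hadamard matrix, rows indexed by subsets of bits

# Parity of all n bits in the +/-1 encoding: f(x) = (-1)^(x_1 + ... + x_n)
x_indices = np.arange(2 ** n)
popcount = np.array([bin(i).count("1") for i in x_indices])
f = (-1.0) ** popcount

# Fourier coefficients: all of the mass lands on the single top-degree character
coeffs = H @ f / 2 ** n
print(np.nonzero(coeffs)[0])  # -> [15], the index with all n bits set
print(coeffs[-1])             # -> 1.0
```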

author

I don't disagree with any of this, but you'd agree that all of these arguments are post-hoc, yes?

I know a classification problem has large margin if I can find a separating hyperplane far from the data. How do I know a priori? I don't. But I can always check once I see the data.
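
To be concrete, here is roughly the check I have in mind, sketched with scikit-learn on made-up data: fit a nearly hard-margin linear SVM to the data in front of you and read off the margin as 1/||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data, purely for illustration: two well-separated clusters in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, size=(50, 2)),
               rng.normal(+2, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# A linear SVM with a huge C approximates the hard-margin problem.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_.ravel()
margin = 1.0 / np.linalg.norm(w)  # geometric margin of the separating hyperplane
print(f"margin = {margin:.3f}")

# If the data weren't linearly separable, no linear rule could classify all of it,
# so the fitted classifier would show training errors. Also checkable after the fact:
print("training errors:", np.sum(clf.predict(X) != y))
```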

Similarly, "parity is high-frequency" so it can't be learned, but if I knew my pattern required recognizing high frequencies, I'd add high-frequency detectors to my feature set.
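
And once you know to make that move, parity is easy: in the ±1 encoding, the product of all the bits is exactly the "high-frequency detector" you'd want, and appending it as a feature makes parity linearly separable with margin 1. A toy sketch:

```python
import itertools
import numpy as np

n = 4
# All 2^n points of the Boolean cube in the +/-1 encoding, labeled by parity.
X = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
y = np.prod(X, axis=1)  # parity label: +1 for an even number of -1s, -1 for odd

# No linear rule in the raw coordinates separates parity. Append the single
# "high-frequency detector" x_1 * x_2 * ... * x_n as an extra feature:
X_aug = np.hstack([X, np.prod(X, axis=1, keepdims=True)])

# Now the trivial linear rule "look at the last coordinate" classifies perfectly.
w = np.zeros(n + 1)
w[-1] = 1.0
print(np.all(np.sign(X_aug @ w) == y))  # -> True
```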

Does that make sense? I find all of our explanations of learnability unsatisfying in this way.


I'm in the camp that thinks nature has had a long time to experiment with various ways of growing brains, so our brains know what kinds of symmetries, etc., to expect in the real world, the kinds that can make a problem learnable.


Yeah for sure. It seems like the difficulty is finding a sufficient language that can describe real-world data. It must capture our intuition and be mathematically translatable to learnability without requiring analysis of the data beyond what we do intuitively... seems quite hard!
