13 Comments

"Machine learning is what we do when we don’t understand. When we do understand, we just write the damned code."

Yep, exactly: Hume (inferring necessary connections from constant conjunctions) when we don't understand, Kant (synthetic a priori and all that jazz) when we do.

author

Hah. That's perfect. But if Hume is thesis, Kant antithesis, who is the dialectical synthesis?


Darwin, I am tempted to say. I think the evolutionary take by people like Konrad Lorenz (what's a priori for an individual is a posteriori for the species) is largely correct and is just as applicable to machine learning.

author

I would never have predicted you would say that, Max. But perhaps that means you are the synthesis, predictable neither by man nor machine.


I am still not sure whether linear rules can become conscious, though!

author

Neither are they! I feel deceived.


Am I a linear rule? I hope not, I fear I am.

This is not helping, Ben!


Always good to revisit this history. A small quibble: It seems like we do understand what problems have large margin in that the data, as vectors, must be linearly separable, yadda yadda. The hard part is finding the featurization of that data so that it becomes separable. I agree we don't understand what problems support "learnable" featurizations that admit large margin solutions.

P.S. Parity seems very hard for a network to learn since mod 2 is inherently a super high-frequency function in the original space. Some of the network theories would predict that this is difficult unless you somehow built some extra structure into the network.
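
One way to make the "high-frequency" point concrete is the Boolean Fourier picture: in the ±1 encoding, the parity of all n bits is exactly the top-degree Walsh character, so all of its Fourier mass sits on the single highest-order coefficient. A quick numpy/scipy sketch of that fact:

```python
import numpy as np
from scipy.linalg import hadamard

n = 4
H = hadamard(2 ** n)  # Sylvester-ordered Walsh-Hadamard matrix, rows indexed by subsets of bits

# Parity of all n bits in the +/-1 encoding: f(x) = (-1)^(x_1 + ... + x_n)
x_indices = np.arange(2 ** n)
popcount = np.array([bin(i).count("1") for i in x_indices])
f = (-1.0) ** popcount

# Fourier coefficients: all of the mass lands on the single top-degree character
coeffs = H @ f / 2 ** n
print(np.nonzero(coeffs)[0])  # -> [15], the index with all n bits set
print(coeffs[-1])             # -> 1.0
```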

author

I don't disagree with any of this, but you'd agree that all of these arguments are post-hoc, yes?

I know a classification problem has large margin if I can find a separating hyperplane far from the data. How do I know a priori? I don't. But I can always check once I see the data.
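
To be concrete, here is roughly the check I have in mind, sketched with scikit-learn on made-up data: fit a nearly hard-margin linear SVM to the data in front of you and read off the margin as 1/||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data, purely for illustration: two well-separated clusters in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, size=(50, 2)),
               rng.normal(+2, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# A linear SVM with a huge C approximates the hard-margin problem.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_.ravel()
margin = 1.0 / np.linalg.norm(w)  # geometric margin of the separating hyperplane
print(f"margin = {margin:.3f}")

# If the data weren't linearly separable, no linear rule could classify all of it,
# so the fitted classifier would show training errors. Also checkable after the fact:
print("training errors:", np.sum(clf.predict(X) != y))
```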

Similarly, "parity is high-frequency" so it can't be learned, but if I knew my pattern required recognizing high frequencies, I'd add high-frequency detectors to my feature set.
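
And once you know to make that move, parity is easy: in the ±1 encoding, the product of all the bits is exactly the "high-frequency detector" you'd want, and appending it as a feature makes parity linearly separable with margin 1. A toy sketch:

```python
import itertools
import numpy as np

n = 4
# All 2^n points of the Boolean cube in the +/-1 encoding, labeled by parity.
X = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
y = np.prod(X, axis=1)  # parity label: +1 for an even number of -1s, -1 for odd

# No linear rule in the raw coordinates separates parity. Append the single
# "high-frequency detector" x_1 * x_2 * ... * x_n as an extra feature:
X_aug = np.hstack([X, np.prod(X, axis=1, keepdims=True)])

# Now the trivial linear rule "look at the last coordinate" classifies perfectly.
w = np.zeros(n + 1)
w[-1] = 1.0
print(np.all(np.sign(X_aug @ w) == y))  # -> True
```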

Does that make sense? I find all of our explanations of learnability unsatisfying in this way.


I'm in the camp that thinks nature has had a long time to experiment with various ways of growing brains, so our brains know what kinds of symmetries, etc., to expect in the real world, the kinds that can make a problem learnable.


Yeah for sure. It seems like the difficulty is finding a sufficient language that can describe real-world data. It must capture our intuition and be mathematically translatable to learnability without requiring analysis of the data beyond what we do intuitively... seems quite hard!
