To your point about tasks where it's easy to write a program to solve them vs having to rely on DL, we can see the same thing with the creation of complex patterns. If you didn't know, say, how to write down reaction-diffusion equations and solve them, some relatively simple three-parameter images would look endlessly complex. But you can't describe them easily with typical equations --- you need solved PDEs.
The lurking definition of parameter therein has troubled me since grad school.
True, chaos is an interesting third case: We can define fractal images simply, but their structure is undecidably complex.
The question would be how different a case it is, though. If we lacked knowledge of certain signal processing or language primitives, wouldn't fitting other classes of data seem much more difficult?
The counterargument would be that the "simple" PDE models can't be easily fit even when you know their form in advance.
I'm not sure I understand what you mean, but when I think of things like chaos and turbulence, these are phenomena that are hard to predict even though we have reasonable models. So it's very different from "is this image a dog or a giraffe", which is a very simple problem, but we can't write down a simple math program to solve it.
I think that's fair, but I don't think you need chaos here. Take a Turing pattern, for instance --- simple to generate from two coupled PDEs with a handful of parameters --- but that doesn't mean we can easily extract the parameters from a picture of the pattern, despite knowing the underlying model.
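To make "two coupled PDEs with a handful of parameters" concrete, here is a minimal sketch. It assumes the Gray-Scott model as the reaction-diffusion system (my choice for illustration, not something named in this thread), on a periodic 1D grid with explicit Euler steps. The function name, grid size, seed, and default diffusion rates are all hypothetical choices. The forward direction is a few lines; going from a picture of the output back to `feed` and `kill` is the part that is not obviously easy.

```python
import math


def gray_scott_1d(feed, kill, du=0.16, dv=0.08, n=128, steps=1000, dt=1.0):
    """Integrate the Gray-Scott equations on a periodic 1D grid.

        du/dt = Du * lap(u) - u*v^2 + feed * (1 - u)
        dv/dt = Dv * lap(v) + u*v^2 - (feed + kill) * v

    Illustrative sketch only: explicit Euler, unit grid spacing,
    and an arbitrary localized seed in the middle of the domain.
    """
    u = [1.0] * n
    v = [0.0] * n
    for i in range(n // 2 - 5, n // 2 + 5):
        u[i], v[i] = 0.5, 0.25  # local perturbation that starts the reaction

    for _ in range(steps):
        # 3-point Laplacian; index -1 and the modulo give periodic boundaries.
        lap_u = [u[i - 1] - 2.0 * u[i] + u[(i + 1) % n] for i in range(n)]
        lap_v = [v[i - 1] - 2.0 * v[i] + v[(i + 1) % n] for i in range(n)]
        uvv = [u[i] * v[i] * v[i] for i in range(n)]
        u = [u[i] + dt * (du * lap_u[i] - uvv[i] + feed * (1.0 - u[i]))
             for i in range(n)]
        v = [v[i] + dt * (dv * lap_v[i] + uvv[i] - (feed + kill) * v[i])
             for i in range(n)]
    return u, v


if __name__ == "__main__":
    u, v = gray_scott_1d(feed=0.035, kill=0.065)
    # Crude text rendering of the v field, one character per grid cell.
    print("".join("#" if x > 0.1 else "." for x in v))
```

The whole model is four numbers (`du`, `dv`, `feed`, `kill`) plus an initial condition, which is the sense in which the generator is "simple".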
Doesn't "difficult to predict" in general mean that extracting the correct parameter values is very hard and that there's exponential sensitivity to them? So it may not be as different as you implied?
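A toy illustration of that kind of parameter sensitivity, assuming the logistic map as a stand-in (it is not a model anyone in this thread proposed, and the function names and values below are hypothetical). Two runs whose parameter `r` differs by 1e-10 are compared step by step.

```python
def logistic_trajectory(r, x0=0.2, steps=200):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def max_separation(r, delta=1e-10, steps=200):
    """Largest gap between trajectories run with r and with r + delta."""
    a = logistic_trajectory(r, steps=steps)
    b = logistic_trajectory(r + delta, steps=steps)
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    # r = 2.5 settles to a stable fixed point; r = 3.9 is in the chaotic regime.
    for r in (2.5, 3.9):
        print(r, max_separation(r))
```

In the stable regime the tiny change in `r` stays tiny, while in the chaotic regime the two runs end up unrelated, so a long trajectory pins down `r` extremely finely in principle but is very hard to fit in practice.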
Random question, Ben. I was reading your book https://mlstory.org/index.html and I had an (unrelated) question on chapter 6 about generalization. It's about this passage: "These powerful concentration inequalities let us precisely quantify how close the sample average will be to the population average. For instance, we know a person’s height is a positive number and that there are no people who are taller than nine feet. With these two facts, Hoeffding’s inequality tells us that if we sample the heights of thirty thousand individuals, our sample average will be within an inch of the true average height with probability at least 83%. This assertion is true no matter how large the population of individuals. The required sample size is dictated only by the variability of height, not by the number of total individuals." Shouldn't the probability be at least 99.42% instead of 83%? I got 99.42% by computing 1-exp((-2*(30000)*(1/12)^2)/(9^2)).
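For anyone who wants to check the arithmetic, here is a small sketch of the calculation. It assumes heights bounded in [0, 9] feet, a tolerance of one inch (1/12 foot), and n = 30000, as in the quoted passage; the function name and arguments are hypothetical. Note that the expression in the comment above is the one-sided form of Hoeffding's inequality. "Within an inch" is a two-sided statement, which puts a factor of 2 in front of the exponential.

```python
import math


def hoeffding_confidence(n, tol, value_range, two_sided=True):
    """Lower bound on P(sample mean is within tol of the true mean).

    Hoeffding for i.i.d. samples bounded in an interval of width
    value_range:
        one-sided tail:  exp(-2 * n * tol^2 / value_range^2)
        two-sided tail:  2 * exp(-2 * n * tol^2 / value_range^2)
    """
    tail = math.exp(-2.0 * n * tol ** 2 / value_range ** 2)
    if two_sided:
        tail *= 2.0
    return max(0.0, 1.0 - tail)


if __name__ == "__main__":
    n, tol, width = 30000, 1.0 / 12.0, 9.0  # samples, feet, feet
    print(f"one-sided: {hoeffding_confidence(n, tol, width, two_sided=False):.4f}")
    print(f"two-sided: {hoeffding_confidence(n, tol, width, two_sided=True):.4f}")
```

Under these assumptions the one-sided bound is about 99.42% and the two-sided bound about 98.83%. Both are above 83%, so on this reading the quoted figure would be a looser statement than the inequality allows, though still a true one.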
A Unified Theory - Universal Language https://www.linkedin.com/pulse/unified-theory-consciousness-michael-molin/
Would you consider a naturalist's approach qualitative? For example this paper (https://www.nature.com/articles/s41586-019-1138-y) cites Ethology and Behavioral Ecology as a way to study machine behavior. And there is a field (not so popular in neuroscience these days) called neuro-ethology.
Are these kinds of methods appropriate---in your opinion?
When I say qualitative, I refer to the study of people and their practice. I am personally not interested in using social science to study computers. I think we can learn a lot by looking at what machine learning researchers and engineers themselves do (and writing papers anthropomorphising machines is definitely something they love to do).
Thanks for the clarification.
Maybe Donald MacKenzie's social study of finance community (https://mitpress.mit.edu/9780262633673/an-engine-not-a-camera/) or Barry Barnes' sociology of knowledge models (https://www.jstor.org/stable/42852643) are more like it.
MacKenzie has a lot of great stuff that I keep meaning to read. Perhaps most relevant here is a book that uses a sociological approach to study the interaction between computing and mathematical proof: https://mitpress.mit.edu/9780262632959/mechanizing-proof/
Thanks! You may also like this recent paper: https://www.journals.uchicago.edu/doi/abs/10.1086/697318