Regarding your footnote :) - ask and ye shall receive: https://www.undonecs.org/2026/
This is Suresh Venkat btw :)
Hello, sir! I didn't know you were on here. Does this substack double as your personal blog?
No, I don't blog at Substack, but our CNTR does, and so Substack reverted to that account when asking me for my profile info.
What is the definition of predictability?
An exercise left to the reader!
Roughly speaking, it means there is a piece of computer code that takes X as input and outputs a guess for what Y should be, and on some collection of Xs, it gets a large fraction of the Ys correct. But this definition stinks if you want to take it literally...
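If you do want to take it literally, here's about the most minimal sketch I can write (the names and the toy example are mine, purely illustrative): a "predictor" is just code from X to a guess for Y, and "predictability" is the fraction of guesses that match on some collection.

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")

def accuracy(predictor: Callable[[X], Y],
             xs: Sequence[X],
             ys: Sequence[Y]) -> float:
    """Fraction of examples where the predictor's guess matches Y."""
    hits = sum(predictor(x) == y for x, y in zip(xs, ys))
    return hits / len(xs)

# Taken literally, "X predicts Y" means: some code scores high on
# some collection of (X, Y) pairs. The trouble is the quantifiers:
# which code? which collection? Hence "this definition stinks."
is_even = lambda x: x % 2 == 0
print(accuracy(is_even, [1, 2, 3, 4], [False, True, False, True]))  # 1.0
```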
I think a distinction along the lines of the one in numerical mathematics would be helpful for predictability as well. There, issues inherent to the problem (conditioning) are distinguished from issues of the numerical method (stability). If you are confronted with an ill-conditioned problem, stable numerics are of limited help.
Similarly if your target is inherently unpredictable, competitions are bound to fail. Now if competitions are your only means of accessing the target, you have a problem. But not one that more competitions could solve.
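A toy version of the conditioning side, in NumPy/SciPy (my example, not from the post): a backward-stable solver applied to an ill-conditioned Hilbert system still returns a badly wrong answer, because the problem itself amplifies any perturbation.

```python
import numpy as np
from scipy.linalg import hilbert

n = 12
A = hilbert(n)            # Hilbert matrices are famously ill-conditioned
x_true = np.ones(n)
b = A @ x_true

# np.linalg.solve is a backward-stable LAPACK routine, but stability only
# guarantees we solve a *nearby* problem exactly. With a condition number
# near 1/eps, "nearby" still means almost all digits are lost.
x_hat = np.linalg.solve(A, b)
print(f"condition number: {np.linalg.cond(A):.1e}")
print(f"relative error:   {np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true):.1e}")
```

Run the same solver on a well-conditioned matrix and the error sits near machine precision. The method was never the obstacle; the conditioning was the whole story.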
What do you think about a case where X is images encrypted with the same password (e.g. MNIST images) and Y is the original labels?
Interesting, right? This violates 2 of my rules of thumb (https://www.argmin.net/p/the-war-of-symbolic-aggression):
* It's clear that there is some code out there that will achieve high prediction (apply an MNIST classifier after decrypting).
* It's also clear that a person can't do this task.
My internal pattern recognition system then thinks it would be very hard for machine learning to make good predictions on this problem, and I wouldn't spend time on it. But that certainly doesn't mean someone else couldn't do it...
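To pin the thought experiment down, here's a sketch of the setup (my construction; fixed-key AES via the `cryptography` package is just one way to realize "encrypted with the same password"):

```python
import numpy as np
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

KEY = b"0123456789abcdef"  # the one shared "password" for every image

def encrypt_image(img: np.ndarray) -> np.ndarray:
    """Encrypt a 28x28 uint8 image (784 bytes = 49 AES blocks) under KEY."""
    enc = Cipher(algorithms.AES(KEY), modes.ECB()).encryptor()
    ct = enc.update(img.tobytes()) + enc.finalize()
    return np.frombuffer(ct, dtype=np.uint8).reshape(img.shape)

def decrypt_image(ct_img: np.ndarray) -> np.ndarray:
    """Invert encrypt_image, recovering the original pixels."""
    dec = Cipher(algorithms.AES(KEY), modes.ECB()).decryptor()
    pt = dec.update(ct_img.tobytes()) + dec.finalize()
    return np.frombuffer(pt, dtype=np.uint8).reshape(ct_img.shape)

# The "code out there that achieves high prediction" is simply
#     classify(decrypt_image(x))
# with any off-the-shelf MNIST classifier. But a learner trained directly
# on encrypt_image(x) faces features scrambled by a strong block cipher,
# which is exactly the "cryptographically hard" regime where I'd expect
# standard training to find nothing.
```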
Thanks for the great posts!
Recently we did some simple calculations:
https://arxiv.org/abs/2507.09678
What a great post!
It is something I have been thinking about a bit since reading the work on the Fragile Families. I had two thoughts, both from streams that I am exploring.
One is that economic theory often builds models that predict a lack of predictability. A classic case is movements in stock prices: any predictable movement allows for a money pump and should be arbitraged away. But more subtly, our models of how people consume also imply that innovations in consumption are unpredictable. Under the assumption that utility is concave in consumption, we want to use all currently known information about future prices and income to smooth our consumption from one period to the next, so anything known today should already be built into today's consumption. Unpredictability in consumption innovations is thus actually a test of full insurance (plus concave utility and all the other things we already assume). I wonder whether that approach of using the theory to provide a bound on predictability is important precisely because it can provide a test of the theory, appropriately formulated.
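(For other readers, the textbook version of the consumption step is Hall's random-walk result; quadratic utility and beta(1+r) = 1 below are the standard simplifying assumptions, in my notation:)

```latex
% Euler equation from maximizing expected discounted utility:
u'(c_t) = \beta (1 + r)\, \mathbb{E}_t\!\left[ u'(c_{t+1}) \right]

% With quadratic utility (u' linear) and \beta (1 + r) = 1, this collapses to
c_t = \mathbb{E}_t\!\left[ c_{t+1} \right],

% i.e., consumption is a martingale, and the innovation
\varepsilon_{t+1} = c_{t+1} - \mathbb{E}_t\!\left[ c_{t+1} \right]
% is orthogonal to the entire time-t information set. No variable known at
% time t should predict it; finding one that does rejects the model, which
% is the sense in which unpredictability is a testable implication.
```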
A second is that there are places where there are massive gains to improving empirical predictability around social questions, so the kind of Fragile Families exercise could perhaps be expanded to saying: our current ability to predict Y from X is low, and if you could find a new set of variables that better predicted Y, we would be much better off. A great example is our ability to predict how good teachers are from everything we observe about them, which typically yields R-squared values of 5% or less. If we could find some attributes that we can screen on, we would all be much smarter, I think.
These are active areas of research for us, so would be great to get your thoughts!
I don't fully understand your example regarding consumption, but I now realize I need to learn more about what economic theory means by predictability. Any links to good introductions would be most welcome!
With regard to the second point, I should have linked back to this, but I think there are two major issues with predictability in social systems:
- being able to operationalize the predicted variable into a well-constructed, temporally consistent numerical quantity
- being able to define a meaningful statistical evaluation of the predicted construct
I wrote a lot about this in a three-post sequence starting here:
https://www.argmin.net/p/clinical-versus-statistical-prediction
and am writing a longer paper about the same topic right now, which I hope to share in August.
The flipside of making an unpredictability claim based on "a certain function class doesn't contain XOR" is making a predictability claim based on "a certain function class (e.g., the outputs of arbitrarily deep/wide neural nets) approximates every function arbitrarily closely." I have just come to feel that maybe the question of what's contained in a function class is not that relevant to anything we might reasonably mean by predictability.
Your original question as it's phrased, "Can X predict Y?", seemed to be asking whether Y is statistically independent of X. But then you brought up the problem of expressibility: can a perceptron *express* the concept of XOR? If the question is statistical independence, then you may have some hope of proving it theoretically, or you might show that the signal is convolved with the noise in what amounts to a cryptographically strong way.
BUT if you're wrangling with it from an expressibility perspective, then you HAVE to make assumptions about the information content of the features and how it is expressed (e.g., linear measurements?). If the features can be any simple transform of X, then of course any simple transform can be expressed!
These two questions - "Is the information in there at all?" and "Is the functional form of the transformation found within the model family?" are not composable. You can't attack one without holding the other in place.
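A four-line toy makes the non-composability vivid (my example; scikit-learn's Perceptron stands in for the classic perceptron):

```python
import numpy as np
from sklearn.linear_model import Perceptron

# XOR: Y is a deterministic function of X, so the information is
# maximally "in there" -- statistical dependence could not be stronger.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Expressibility failure: no linear threshold unit separates XOR,
# so a perceptron tops out at 3 of 4 points no matter how it trains.
lin = Perceptron(max_iter=1000).fit(X, y)
print(lin.score(X, y))  # at most 0.75

# Change the assumed feature encoding -- append the product x1*x2 --
# and the very same model class expresses XOR exactly.
X_aug = np.hstack([X, X[:, :1] * X[:, 1:]])
aug = Perceptron(max_iter=1000).fit(X_aug, y)
print(aug.score(X_aug, y))  # 1.0
```

The verdict flipped entirely on the feature assumption, with the statistical relationship held fixed: that's the sense in which you can't attack one question without holding the other in place.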