Agree with all of this (as usual). Yet it is a little unsatisfying, as the two are not at the same level of abstraction. Perhaps statistical summaries solve an important inverse problem if the model is correct… but what problems do generative models reliably solve?
Hopefully this is a safe space to ask that, away from the AGI-pilled. I have some hot takes, but I want to hear yours!
"what problems do generative models reliably solve?"
great question.
Attempted partial answers:
1. I think simulation is a valid task, though I understand it is tricky to articulate what it means for a simulation to be reliable.
2. LLMs are generative models, and they are definitely useful for lots of things. But it's a bit messy to say *what* simulating language solves. Which is why they drive everyone insane.
The main points of the post are super clear! For discriminative models, Fisher's justification of maximum likelihood is not satisfying, but if the end goal is a simulation, the technique can be justified better.
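For concreteness, the simulation-flavored justification I have in mind is the standard one: maximizing the average log-likelihood of the training set is the same as minimizing the KL divergence from the empirical distribution $\hat p_n$ to the model $p_\theta$,

$$\arg\max_\theta \frac{1}{n}\sum_{i=1}^n \log p_\theta(x_i) \;=\; \arg\min_\theta \mathrm{KL}\big(\hat p_n \,\|\, p_\theta\big),$$

since $\mathrm{KL}(\hat p_n \,\|\, p_\theta) = -H(\hat p_n) - \tfrac{1}{n}\sum_i \log p_\theta(x_i)$ and the entropy term does not depend on $\theta$. Read this way, maximum likelihood picks the model whose samples are as close as possible (in this particular sense) to the data distribution, which is a simulation objective rather than an inferential one.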
However, even here, there are a bunch of problems:
1. As you have acknowledged in several other posts (though not this one, which says "These simulators often *need to be probabilistic* because of the nonuniqueness of acceptable outputs"), they don't need to be probabilistic. Chomsky succeeded in building generative models of language devoid of probability, at least for baby versions of natural languages, and the approach found greater success in the design and parsing of programming languages and compilers.
2. The "probability" of a "training set" is tricky to define. Let us take language and define mimicking training set to be mimicking natural sentences.
(2a) It is not hard to find a coherent sentence composed of rare words that has lower probability than an incoherent sentence composed of common words (see the toy sketch after this list). How then do we translate this probability into something meaningful?
(2b) For almost any downstream application of the simulation, we seek some generalization outside the training set. We expect the simulation to generate other plausible sentences, not merely regurgitate the training set. But what does that even mean without defining "plausibility" or the desiderata of the simulation model?
3. It is not clear that, even for simulation, maximum likelihood is the best optimization target. Probabilistic generative models face a tension between looking very much like the training set (high likelihood) and diversity (somehow covering the hypothetical population), and so we often sample responses at non-zero temperature (not just as a workaround for beam search) until we get something satisfactory; see the temperature sketch after this list.
4. Assessing the output of any simulation is messy partly because we have to enumerate all the ways it may not be realistic enough! The simulation can defy physics, sometimes solve olympiad problems and sometimes fail at elementary math, may defy causality, and may produce clearly implausible things in ways we can't fully explain or model, so caveat emptor for any high-stakes application. Hence, when pushed on what they *really* solve reliably, I have no crisp answer, despite getting lots of utility from them daily.
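To make (2a) concrete, here is a deliberately crude toy sketch: a unigram "language model" in which a sentence's probability is just the product of its word frequencies. All frequencies are invented for illustration, and real models condition on context, but the qualitative effect (likelihood rewards frequent tokens, not coherence) persists:

```python
import math

# Toy unigram "language model": a sentence's probability is the product
# of its word frequencies. Frequencies are invented for illustration.
freq = {
    # common function words
    "the": 5e-2, "of": 3e-2, "and": 3e-2, "is": 2e-2, "to": 2e-2,
    # rare content words
    "ornithologists": 1e-7, "catalogued": 1e-6,
    "iridescent": 1e-7, "plumage": 1e-6,
}

def log_prob(sentence):
    return sum(math.log(freq[w]) for w in sentence.split())

coherent = "ornithologists catalogued iridescent plumage"
incoherent = "the of and is to"

print(log_prob(coherent))    # ~ -60: coherent, yet vanishingly "improbable"
print(log_prob(incoherent))  # ~ -18: word salad, yet far more "probable"
```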
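And for point 3, the likelihood-vs-diversity dial in one line: temperature rescales the next-token logits before the softmax. A minimal sketch with made-up logits:

```python
import numpy as np

# Temperature rescales logits before the softmax. T -> 0 approaches
# greedy decoding (mimic the most likely continuation); T > 1 flattens
# the distribution toward diversity. Logits here are made up.
logits = np.array([3.0, 2.5, 1.0, 0.2])  # hypothetical next-token scores

def temperature_softmax(logits, T):
    z = logits / T
    p = np.exp(z - z.max())  # subtract max for numerical stability
    return p / p.sum()

print(temperature_softmax(logits, 0.1))  # nearly one-hot: high likelihood
print(temperature_softmax(logits, 1.0))  # the model's own distribution
print(temperature_softmax(logits, 2.0))  # flattened: more diverse samples
```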
And yet, no disagreement with the core premise of your post; all these may be tangents... It seems that when we relentlessly push for realism in these generations, we are able to squeeze more utility out of them. Thus the exhortation to make them convincing twins seems like a good idea.
Sure, [strong] generative models are useful if you want to simulate data.
But I think "summaries without convincing simulations [...] should be held in high suspicion" is too strong. At the very least, expressive/strong generative models should be taken with just as much suspicion if literally treating them as "a model for how the real data was generated"!
The ultimate question is what we want out of "a model", and obviously that is not an easy question to answer.
The way I see it (people might disagree) is that back when contemporary deep learning was reborn, after the initial excitement, many people started noticing the shortcomings (adversarial examples, brittleness, etc.), and a rather common thought or excuse was: "sure, but this is only because we train discriminative models. We need to be training Generative models and _then_ you'll see that we will get _real_ human-like understanding, of latent variables and causal factors and all".
Over the past ~5 years a *lot* of effort (research, compute, funding) has been put into testing this hypothesis. And I would say (again, people might disagree) that it has been largely proven wrong. Turns out that with enough data, you can _generate_ in completely convincing, non-trivial domains (images, text, ...) **without any real understanding** (I'm not going to debate what "real understanding" is here, though). In fact I think this is, perhaps, one of the more significant scientific discoveries that came out of contemporary Deep Learning. This is even before current LLMs -- take Machine Translation, for example. For ages, people argued very strongly (and with what sounded like very good arguments) that the only way to get real, functional, automatic translation was for systems to have a real understanding of the text. Turns out that this is wrong. Same for image generation, and even video generation (despite the ridiculous attempts to call large video models "physics simulators").
To clarify my position: I am skeptical of all “pure simulation,” be it data-driven or model-driven. Simulation is an incredibly tricky subject, and it’s challenging to do at high fidelity. The part you quoted was just trying to say: be *even more skeptical* of simulations that just return histograms or parameter estimates. (I should write more about simulation. Adding it to my blog todo list.)
With regard to the discourse on generative-vs-discriminative models, I totally agree with you. People who thought generative models would bring “understanding” were and are wrong. I’m not advocating that generative models tell us anything about the process other than that they can recapitulate it. However, that they can recapitulate the process is super impressive and warrants a lot more investigation!
It pains me to shoehorn this into your serious ML blog post, but I wanted to say that many games this past weekend offered a masterclass in how going for it on 4th is kinda dumb actually
LOL. I'll allow it!
I am definitely not opposed to going for it on 4th down or two-point conversions, but I definitely agree that commitment to this bit seems to be getting out of hand. Were you speaking about the Lions here or were there more egregious examples?
Love this!
I wonder what you think of posterior predictive checking in Bayesian modeling in this context.
I think PPC in practice is far too close to “Statistical models as summaries.” Simulated summaries are still too flattened for my taste. So if a simulator can only produce histograms or parameter estimates, I don’t give much credence to its model.
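To spell out why I say that: in practice a posterior predictive check usually bottoms out in comparing one statistic of simulated replicates against the observed data. A minimal sketch, assuming a toy conjugate normal model (the data, prior, and choice of test statistic here are all illustrative):

```python
import numpy as np

# Minimal posterior predictive check (PPC) on a toy normal model with
# known unit variance and a conjugate N(0, 10^2) prior on the mean.
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=50)        # "observed" data

# Conjugate posterior for the mean
post_var = 1.0 / (1.0 / 10.0**2 + len(y) / 1.0**2)
post_mean = post_var * y.sum()                     # prior mean is 0

# Simulate replicated datasets from the posterior predictive and
# compare a summary statistic to its observed value.
stat = np.max                                      # check the sample maximum
t_obs = stat(y)
t_rep = []
for _ in range(1000):
    mu = rng.normal(post_mean, np.sqrt(post_var))  # draw a parameter
    y_rep = rng.normal(mu, 1.0, size=len(y))       # simulate a replicate
    t_rep.append(stat(y_rep))

# Posterior predictive p-value: extreme values flag misfit
print(np.mean(np.array(t_rep) >= t_obs))
```

Note that the check itself lives entirely in the summary statistic, which is exactly the flattening I’m complaining about.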
Wonder what you would say about the usage of the term "unsupervised learning" for generative modeling.
I don't like it! But I don't really understand what anyone means by "unsupervised" vs "supervised." The distinction quickly devolves into pedantry, and I'm not sure why it's important.