Discussion about this post

Badri:

Agree with all of this (as usual). Yet it is a little unsatisfying, as they are not at the same level of abstraction. Perhaps statistical summaries solve an important inverse problem if the model is correct… but what problems do generative models reliably solve?

Hopefully this is a safe space to ask, away from the AGI-pilled. I have some hot takes, but I want to hear yours!

Lior Fox:

Sure, [strong] generative models are useful if you want to simulate data.
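To make "simulate data" concrete, here is a minimal sketch (my own illustration, not from the thread; the toy two-cluster dataset and the choice of a scikit-learn Gaussian mixture are assumptions purely for illustration): fit a simple generative model to observed data, then draw synthetic samples from it.

```python
# Minimal sketch of "simulate data" with a generative model.
# Assumptions: toy 2D data and a Gaussian mixture; any fitted
# generative model with a sampling method would play the same role.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy "observed" data: two clusters in 2D (stand-in for real measurements).
observed = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.5, size=(200, 2)),
])

# Fit a generative model to the observations...
model = GaussianMixture(n_components=2, random_state=0).fit(observed)

# ...then simulate new data that should "look like" the original.
simulated, _ = model.sample(n_samples=200)
print(simulated[:5])
```

How convincing those simulated draws look, compared to the real data, is exactly the kind of check being debated here.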

But I think "summaries without convincing simulations [...] should be held in high suspicion" is too strong. At the very least, expressive/strong generative models should be treated with just as much suspicion if we literally take them as "a model for how the real data was generated"!

The ultimate, and hard, question is what we want out of "a model" in the first place. Obviously there is no easy answer.

The way I see it (people might disagree) is that when contemporary deep learning was reborn, after the initial excitement, many people started noticing its shortcomings (adversarial examples, brittleness, etc.), and a rather common thought, or excuse, was: "sure, but this is only because we train discriminative models. We need to train generative models, and _then_ you'll see that we get _real_ human-like understanding, of latent variables and causal factors and all".

Over the past ~5 years a *lot* of effort (research, compute, funding) has been put into testing this hypothesis. And I would say (again, people might disagree) that it has been largely proven wrong. It turns out that with enough data, you can _generate_ convincingly in non-trivial domains (images, text, ...) **without any real understanding** (I'm not going to debate what "real understanding" means here, though). In fact, I think this is perhaps one of the more significant scientific discoveries to come out of contemporary Deep Learning. This was clear even before current LLMs -- take Machine Translation, for example. For ages, people argued very strongly (and with what sounded like very good arguments) that the only way to get real, functional, automatic translation was for systems to have a real understanding of the text. Turns out this is wrong. The same goes for image generation, and even video generation (despite the ridiculous attempts to call large video models "physics simulators").
