Last weekend, researchers from OpenAI took the hyperbolic overpromotion of their chatbots a bit too far and were embarrassed by the ensuing pile-on. They claimed their software had “found” solutions to “open” math problems, only for it to emerge that “found” did not mean solved and “open” did not mean unsolved.
The party started when researcher Mark Sellke announced that he and a collaborator had used ChatGPT to find the solutions to ten open “Erdős” problems. These problems were named after Paul Erdős, a legendary Hungarian mathematician who both closed and opened problems prolifically (likely because he was famously always high on amphetamines). Erdős’ problems were usually closer to tricky math puzzles than to long-standing conjectures like the twin prime conjecture or the Riemann Hypothesis. Nonetheless, they gave mathematicians plenty to think about, and working on them pushed the boundaries of number theory, probability, and combinatorics. Erdős often sweetened the pot by offering cash bounties of varying sizes to those who could settle his conjectures.
Erdős posed so many problems that it was impossible to keep track of them. In 2024, almost thirty years after Erdős’ passing, mathematician Thomas Bloom created a website to catalog Erdős problems and their solutions. Since the original problems had been so numerous and poorly cataloged, putting together such a webpage was an arduous passion project. Sellke used ChatGPT to find papers containing ten solutions unknown to Bloom. All of the papers had already been published, but this literature search helped Bloom update his website. To be clear here: the working definition of “open problem” was “unknown to the maintainer of erdosproblems.com.”
It’s still very neat. ChatGPT seems quite proficient at semantic search of mathematics and can now find somewhat obscure results in an ever-expanding universe of technical papers. Of course, it’s worth noting that these results aren’t always that obscure. One of Sellke’s found solutions appeared in a paper entitled “A proof of two Erdos conjectures on restricted addition and further results,” published in 2003 in the respectable and well-known Crelle’s Journal.
Now, of course, we can’t just have a neat and fun story. Since ChatGPT was involved, someone would inevitably have to spin this into an argument that AGI has arrived. Enter OpenAI employee Sebastien Bubeck, who has been breathlessly heralding the Sparks of AGI for three years now. Bubeck took to Twitter to proclaim: “Science Acceleration has officially begun: two researchers found the solution to 10 Erdos problems over the weekend with help from gpt-5.” Kevin Weil, the Vice President of Open Science at OpenAI (god, that is straight out of Orwell), tweeted “GPT-5 found solutions to 10 (!) previously unsolved Erdős problems and made progress on 11 others. These have all been open for decades.”
A shitstorm ensued. People understandably thought these OpenAI employees were saying that ChatGPT had solved challenging mathematical conjectures. If all you saw were these tweets, you could be forgiven for thinking that their software had gone from pissing off the organizers of the Math Olympiad to solving deep, important math problems in a matter of months.
Fortunately, despite all evidence to the contrary, you can still get piled on for overhyping AI on Twitter, and all of the competing labs pulled out the knives. Turing Award winner and longtime Facebook employee Yann LeCun proclaimed, “Hoisted by their own GPTards.” Even Nobel Laureate and Alphabet executive Demis Hassabis chimed in to tell Bubeck, “This is embarrassing.”
To be fair, it’s Twitter. Embarrassment is part of the entry fee.
There’s an interesting psychology at play here, one which the protagonist Sebastien Bubeck has been demonstrating for several years. I’m not singling Seb out to further pile on. Instead, I’d like to use him as a cautionary tale of an all-too-common behavior with generative AI: motivated forgetting.
For years now, Bubeck has been telling us that ChatGPT does things that are not in its training data. A key example came during a “debate” at the Simons Institute last October, where he breathlessly announced that ChatGPT was accelerating his mathematical work. Let me post his story in its entirety so it can speak for itself. I’ve edited the following for clarity, but you can find his exact words at 28:40 in this video.
“There is going to be a phase where before the AI solves a problem on its own, it’s going to collaborate with all of you guys. This is for sure. I don’t think it’s up for debate.
“I just want to share a personal experience that I had myself just in the last few weeks with o1 or o1 plus epsilon on a research problem. I know it is not out there because it’s a paper I have written. It’s a draft in my Dropbox, and I just didn’t publish it yet. So I know that it’s not in the training data. And you will see it’s a very fun question. It’s just about how long the gradient flow of a convex function can be.
“I asked this question to this o1 model. It thought for a long time, and then it made a connection with what’s called self-contracted curves, which are not necessarily directly linked to this. And it explained to me why it thinks that this would be a good idea to connect those two things. It gave me the bounds that exist in the literature, et cetera.
“When I was working on that problem, it took me three days to make that connection. So at the very least, it would have accelerated me by three days. And now you take this very basic current model. Take this next year. I have other anecdotal evidence of people—top people who are maybe sitting in this room, who were stuck on some lemma, asked o1, and o1 made the connection and helped them solve this problem. So this is not in the future. This is not hypothetical. It is happening right now. It’s the same story as with the medical diagnosis. It’s the same story in every field right now. AI is, at the very least, going to be almost an equal with us.”
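For context, the question Bubeck describes can be stated compactly. The following is my paraphrase of the standard setup, not his exact formulation:

```latex
% For a convex $f:\mathbb{R}^d \to \mathbb{R}$, the gradient flow is the
% curve $x(t)$ solving
\dot{x}(t) = -\nabla f(x(t)), \qquad x(0) = x_0,
% and the question is how large its arc length
L = \int_0^\infty \|\dot{x}(t)\|\,dt
% can be, relative to, say, the distance from $x_0$ to the minimizer.
% The link to self-contracted curves: a curve $\gamma$ is
% \emph{self-contracted} if for all $t_1 \le t_2 \le t_3$,
\|\gamma(t_2) - \gamma(t_3)\| \le \|\gamma(t_1) - \gamma(t_3)\|,
% i.e., the curve moves monotonically closer to each of its future points.
```

Gradient flows of convex functions are self-contracted, and self-contracted curves in $\mathbb{R}^d$ have length at most a dimension-dependent constant times the diameter of their convex hull, which is what yields the path length bounds.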
Now, here’s the thing. Just like Thomas Bloom didn’t know about the solution of problem 339, even though there was a paper that said in its title that it was solving problem 339, there were already many papers that examined the length of gradient flow trajectories using self-contracted curves. For example, the 2021 JMLR paper “Path Length Bounds for Gradient Descent and Flow” by Chirag Gupta and collaborators builds on an extensive body of work in the space. I mention Gupta’s paper in particular because it has a solid related work section surveying the history of using this theory for this application, tracing it back to at least 1991. I also bring up this particular paper because of the following line from the acknowledgments:
“We thank Sebastien Bubeck for pointing us to some references on self-contracted curves.”
Hmm.
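As an aside, the phenomenon these path length bounds describe is easy to probe numerically. Here is a toy sketch of my own (not taken from any of the papers discussed): approximate the gradient flow of a convex quadratic with small-step gradient descent, and compare the length of the path traced out to the straight-line distance from the starting point to the minimizer.

```python
import numpy as np

# Toy illustration: f(x) = 0.5 * x @ A @ x with A symmetric positive
# definite, so f is convex and its minimizer is the origin. Small-step
# gradient descent approximates the continuous gradient flow.
rng = np.random.default_rng(0)
d = 5
M = rng.standard_normal((d, d))
A = M @ M.T + np.eye(d)             # symmetric positive definite
x = rng.standard_normal(d)
start = x.copy()
step = 1e-3 / np.linalg.norm(A, 2)  # small step to mimic the flow

path_length = 0.0
for _ in range(200_000):
    g = A @ x                       # gradient of f at x
    x_new = x - step * g
    path_length += np.linalg.norm(x_new - x)
    x = x_new

straight_line = np.linalg.norm(start)  # distance from start to minimizer
print(path_length, straight_line, path_length / straight_line)
```

The self-contracted-curve theory guarantees that the ratio printed at the end is bounded by a constant depending only on the dimension, no matter how badly conditioned `A` is.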
LLMs compress and rearticulate an impossible amount of information. No one knows what’s in the training data, not even the employees of OpenAI. And so they can surprise us with coherent descriptions that look like profound discoveries when they are merely references. LLMs let us search all the libraries in the world, but don’t return citations unless we repeatedly prompt for them.
Chatbots’ core capability is laundering the esoteric into insight. This could be through mimicking human dialogue as a companion agent. This could be through pulling a math proof from one of the gazillion math papers out there that you didn’t read, without giving proper attribution. These are the same thing. LLMs are Lore Laundering Machines.
But if we fetishize LLMs as machine gods, then motivated reasoning demands we see them as machine gods, even if this means completely discounting our own intelligence. If you want to believe that AGI is here, it helps to become willfully forgetful of what you already know.


This new paper is also relevant: https://arxiv.org/abs/2510.19804. I imagine the authors are having a little fun with the word "resolve" in the abstract, meaning either resolve or re-solve.
While I was studying martingales from Ross’ Stochastic Processes textbook about two months ago, I could not see where conditional independence (or something like that) was used in the proof of a lemma. In such dire situations, I tend to ask a knowledgeable friend and get a thoughtful response if all goes right. Unfortunately, I could not find a knowledgeable friend on this topic, so I took a screenshot and posted it to ChatGPT (free version). It very quickly and kindly explained what I was missing, and the explanation was entirely correct. So, I cannot say that it is pushing the barriers of human knowledge; but that conversational agent helped me push my personal barriers and learn something new. This is what research advisors and people in similar roles do to help grad students. I believe all education fronts will be among the first to experience this positive wave. Personalized training and access to a chatty subject expert have changed chess a lot; we have very many grandmasters around the world at the ages of 10-15 today. Even Turkey (my country) has two grandmasters who are in secondary and high school, compete at the level of Kasparov, and are the top two players in the country. Hopefully, math education will be positively affected by similar personalized AI training with a subject expert, and we can detect and educate the new Ramanujans wherever they are. ChatGPT will make mathematicians great again, as if it were the 18th century!