John Ioannidis’ latest bugbear is “extreme scientific productivity.” His team’s new data crawl, written up in Nature last week, is worth a read. The study focused on Scopus-indexed journals, and I couldn’t help but think they’re missing the wild world of peer-reviewed computer science conferences.
Someone should get John Ioannidis to replicate his study with AI papers! It wouldn’t be hard, as most of these proceedings are online in an easily scrapable format. Maybe I should do it over winter break. Extreme productivity has certainly gone haywire in AI, to the detriment of the scholars of the field. I want to understand it more deeply because we need to pull back from the brink.
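(If anyone beats me to it: here’s roughly what the crawl could look like. This is a minimal sketch, assuming DBLP’s public search API at https://dblp.org/search/publ/api and its usual JSON layout; the example query, the field names, and the 1000-hit cap are my assumptions, and a real replication would page through full venue tables of contents rather than lean on a keyword search.)

```python
# Rough sketch: tally papers per author name via DBLP's public search API.
# Assumes the JSON layout result -> hits -> hit[] -> info.authors.author;
# adjust if the schema differs from what you get back.
from collections import Counter
import requests

def papers_per_author(venue_query: str, max_hits: int = 1000) -> Counter:
    """Count how many matching papers each author name appears on."""
    resp = requests.get(
        "https://dblp.org/search/publ/api",
        params={"q": venue_query, "format": "json", "h": max_hits},
        timeout=30,
    )
    resp.raise_for_status()
    hits = resp.json()["result"]["hits"].get("hit", [])
    counts = Counter()
    for hit in hits:
        authors = hit["info"].get("authors", {}).get("author", [])
        if isinstance(authors, dict):  # single-author records come back as a dict
            authors = [authors]
        for author in authors:
            counts[author["text"]] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical query; a serious crawl would also page with the 'f' offset
    # parameter and disambiguate author names properly.
    for name, n in papers_per_author("NeurIPS 2024").most_common(20):
        print(f"{n:3d}  {name}")
```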
For context, allow me to recall my experience as a young scholar. I’m sure it’s not too different from others my age yelling at the children to get off their lawn. All the way through my first faculty gig at Wisconsin, I never sent more than one paper to N(eur)IPS per year. It would never have even occurred to me to do so. At the time, it was gauche to have multiple papers at single conferences, and people would often grumble about the profs at big schools with five papers at N(eur)IPS. If someone had five papers at FOCS, the big CS Theory conference, it was scandalous.
When I got to Berkeley in 2013, my experience definitely changed. There, I found myself advising multiple students leading work for the same conference deadline. This was a product of (a) an unhealthy culture in CS at Berkeley and (b) tenure shifting my duties further toward advising. I can discuss my issues with (a) and (b) in future blogs, but I have other concerns for today.
My concern is about the graduate students sending six papers to single conference deadlines. AI candidates now make up over a quarter of our faculty applications in electrical engineering and computer sciences, and the pool is full of extremely productive graduate students. Graduate student applicants who have their names on 50 papers. 50! Graduate student applicants with over 10,000 citations. Applicants whose websites gloat about the number of papers they’ve gotten into ICML or NeurIPS or CVPR. This is crazy.
Not only are there more papers, but the papers are longer. In the olden days, the conferences had page limits. Now, you are allowed unbounded appendices and are expected to link to a GitHub repo too. We have junior candidates who consistently write theoretical papers that are 50-100 pages long. It’s not like these papers are proving the Riemann hypothesis. More often than not, the first three pages invent a problem no one cares about, and the remaining 97 pages close all potential open questions about it.
The most striking part is that this paper explosion is recent. It is alien to my time as an Assistant Professor. That wasn’t that long ago. How did we all become so “productive”? There are several answers, but, in my journey to understand decision making, today I dip my toe into academic decision systems. (Aha! arg min blog stays on topic!)
I worry that overproductivity is an unfortunate artifact of “Frictionless Reproducibility.” Artificial intelligence advances by inventing games and gloating to goad others to play. I’ve talked about how benchmark competitions are the prime mover of AI progress. But what if AI people have decided (perhaps subconsciously) that publishing itself is a game? Well, then you can just run reinforcement learning on your research career.
Running RL on academia has become easier as tooling has improved. Writing LaTeX is so streamlined that every random conversation can immediately become an Overleaf project. Code can be git-pulled, modified, and effortlessly turned into a new repo. Everyone has a Google Scholar page highlighting citation counts, h-indices, and i10 numbers. These scores can be easily processed by the hiring managers at AI research labs. The conferences are all run by byzantine HR systems that accelerate form-filling and button-checking. And the conference program committees have all decided to have a fixed acceptance rate that is low enough to give an aura of “prestige,” even though the acceptance process is indistinguishable from coin flipping. They claim a conference has clout if it has a fixed 25% acceptance rate. If the community sends 100 papers, 25 are published. If it sends 10000 papers, 2500 are published. It doesn’t matter if they are good or not.
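To make that arithmetic concrete, here is a toy simulation rather than real data. Assume every paper has some latent merit, reviewers see it through heavy noise (the noise level is my assumption, not a measurement), and the program committee accepts whatever lands above the fixed 25% bar. The accepted count tracks submissions exactly, while the link between merit and acceptance stays weak.

```python
# Toy simulation (not data): a fixed 25% acceptance rate applied to noisy review
# scores. Accepted counts scale with submissions no matter what; whether any
# individual paper gets in is close to a coin flip when review noise is large.
import numpy as np

rng = np.random.default_rng(0)
RATE = 0.25
REVIEW_NOISE = 2.0  # assumed: reviewer noise larger than the spread in true merit

for n_submissions in (100, 10_000):
    quality = rng.normal(size=n_submissions)              # latent paper merit
    scores = quality + REVIEW_NOISE * rng.normal(size=n_submissions)
    cutoff = np.quantile(scores, 1 - RATE)                # fixed-rate bar
    accepted = scores >= cutoff
    corr = np.corrcoef(quality, accepted.astype(float))[0, 1]
    print(f"{n_submissions:6d} submitted -> {accepted.sum():5d} accepted; "
          f"corr(merit, accepted) = {corr:.2f}")
```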
What’s the way out? We all know the answer! Let’s go back to that Nature article:
Ioannidis thinks that, to stem the rising tide of extremely productive authors, research institutions and funding agencies should focus on the quality of a researcher’s work instead of on the volume of papers they publish. This would prevent scientists from cutting corners. “The number of papers should not really count as positive or negative,” he says.
I mean, in many ways this is true. And it’s why this monstrous overproduction works against everyone who participates. Berkeley still asks people to upload three important papers when they apply. I read them. Though the papers are often very long, I worry that most don’t say much beyond their abstract. That’s a bad sign for all of us.
But this leads me to a piece of advice I really hope we all consider. To all the young scholars out there in AI: You should finish your PhD with three papers that you are decidedly passionate about. Three papers that you can tell a strong story about. And if your friend asks you to work on some other project that distracts from those three, it’s ok to say, “Your project is amazing, but I don’t have time to give it my all for the deadline.” These are simple things. We’d all be better off stepping back, breathing, and embracing them.
As an engineer, I'm constantly disappointed by the amount of work wasted due to misleading claims. We have a feedback loop: 1) too many papers to review carefully, 2) forcing reviewers to focus on a few hackable signals, 3) leading authors to optimize for those signals instead of usefulness.
I.e., you can get a paper published by claiming a new optimizer that beats Adam uniformly. Reviewers don't have time to try it themselves, so they let the claim through at face value. If they had tested it, they probably would have objected: an independent reproduction effort found essentially zero of these claimed improvements hold up (https://arxiv.org/pdf/2007.01547).
A recent personal example: there's a specific piece of work that several groups spent significant effort trying to reproduce. I ran into an author at a party, and he told me I should stay away because it doesn't really work (!). I couldn't tell that from reading the paper; the evals looked impressive.
As Lana put it in a recent talk she gave (with the absolutely glorious title “Non Serviam”), the problem is that Kids These Days have been conditioned to chase and optimize dense rewards instead of sparse rewards like in the olden days:
https://drive.google.com/file/d/1yjBpvvyxwHJvd99NdLk-d7io7dHtp1ZU/view?usp=drivesdk
Also, in the context of overproduction of CS papers, a couple of recent studies by Vladlen Koltun and collaborators:
1) https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0253397
2) https://arxiv.org/abs/2105.08089