17 Comments
Yuval Rabani

With 33000 submissions, they need to reject roughly 24000. If one rule rejects 800, they need just 30 disjoint rules to do it. That's a much simpler mechanism than a PC, assuming the rules and their number change annually to accommodate the authors' learning curve and the number of submissions. Random rules hitting 800 would be nearly disjoint.
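A minimal sketch of the "nearly disjoint" arithmetic, assuming each rule is drawn as a uniform random 800-paper subset of the 33000 submissions (the simulation and variable names are illustrative, not from the comment):

```python
import random

random.seed(0)
N_PAPERS, RULE_SIZE = 33_000, 800

# Two independent "random rules", each rejecting 800 of the 33000 submissions.
rule_a = set(random.sample(range(N_PAPERS), RULE_SIZE))
rule_b = set(random.sample(range(N_PAPERS), RULE_SIZE))

# Expected overlap is 800 * 800 / 33000, i.e. roughly 19 papers (~2.4% of either
# rule), which is the sense in which independently drawn rules are nearly disjoint.
print(len(rule_a & rule_b))
```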

Dhruva Kashyap

For several years now, I have enjoyed this blog and hearing from Ben. I've always enjoyed it when he does the "old man screaming from the mountain" schtick. However, as a young researcher who began working in ML during my master's in India a few years ago, I can't help but feel broken. NeurIPS/ICML are treated as high honours here, the highest bar to cross even now (in NeurIPS 2025, ~10 papers were selected from research done in Indian academia), because we do not have the luxury of making a farm of GPUs that sucks up the power of a small town go brrrr.

I have been taught by people I consider researchers of very high caliber, who say that it doesn't get better than this if you want to do rigorous machine learning research that treads the boundary between technical contribution and real-world impact (whatever that means). Sure, old men up mountains scream, "It wasn't like the old days", but that's just what they do. But when Ben says,

> I'm sorry, but it's already meaningless! ICML received over 33000 submissions. A random subset of 20-25% of these will be approved as "papers acceptable to go on one's CV." The process will churn forward. Everyone who attends the conference knows this process is impossibly bad...

I can't help but feel broken. For people who scrape by with little to no resources, and especially for people like me who just got here, I can't help but feel like the door is being slammed shut in our faces. Not by the "old men up mountains", but by what seem to be seedy bureaucrats who exploit the system to ensure that the ever-lengthening death march into the San Francisco startup economy has prompters to churn through.

If it really is a lottery, how can we ever feel like our work matters when we are constrained to a small sample size of submissions? And if "the Ministry is very scrupulous about following up and eradicating any error," then the price of entry seems to be access to incredible resources and the "correct" academic 'network'.

I can't help but feel like the party was ruined before we got there.

Emails are now written by LLMs, responses are written by LLMs, papers are written by LLMs, reviews are written by LLMs, and decisions are made by LLMs.

[This comment was not written by an LLM. My apologies for this rather long comment that may sound like a childish attempt at venting frustration. Do feel free to ignore it. If you have got this far, thanks?]

Lior Fox

> ICML received over 33000 submissions.

Every single problem is a direct consequence of this, and it is not fixable.

[also: "This plot uses default matplotlib colors, so it must be science" is gold]

Robert Wu

I have asked Professor Recht about fixes in a comment below. I hope he replies with good ideas.

Oliver Hinder

I've been an area chair for NeurIPS since 2023. It has been really depressing to see the fall in quality of both reviews and papers, even over such a short time span. As an AC, I now feel I have to intervene extraordinarily frequently on papers that all reviewers voted to accept: when I go to write my meta review, it is clear from reading the reviews that the reviewers do not have the expertise to assess the paper (not infrequently all of them are undergrads or junior PhD students), and when I then look at the paper carefully, there are major issues which mean it clearly shouldn't be accepted.

I went to the town hall last year to try to get on a microphone and have a discussion with the committee, only to find out that everyone now submits their questions through an ad hoc social media platform, which lets the PC dismiss questions about how to slow the out-of-control growth in the number of submissions with a terse "we are working on it." Of course, then I got that AC email today, and it says we are going to make ACs write an initial meta review. Supposedly this is to make discussions more focused, but I suspect it is really because of author complaints about ACs rejecting papers that reviewers supported accepting. It is crazy that the conferences focus so much on making the reviewing system "better," as if that were the core issue.

Ben Recht

Oof. I hear you. And I'm sorry.

One small clarification question. Last year, I grabbed 100 random names from the NeurIPS reviewer list and looked up their affiliations. The vast majority had PhDs. I found the percentage of PhD students was very low. I found no masters-level or undergraduate students. I'm curious how often you encounter very junior researchers as an AC, because I imagine their distribution across papers isn't uniform.
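If anyone wants to repeat that exercise, here is a rough sketch of the sampling-and-tallying step, assuming a hypothetical `neurips_reviewers.csv` with one row per reviewer and a hand-labeled `status` column (the file name, columns, and labels are illustrative, not an actual NeurIPS export):

```python
import csv
import random
from collections import Counter

random.seed(0)

# Hypothetical file: one row per reviewer, with a manually looked-up
# "status" column (e.g. "phd_holder", "phd_student", "masters", "undergrad").
with open("neurips_reviewers.csv", newline="") as f:
    reviewers = list(csv.DictReader(f))

sample = random.sample(reviewers, 100)          # 100 reviewers drawn without replacement
counts = Counter(row["status"] for row in sample)

for status, count in counts.most_common():
    # With a sample of 100, each count is also a percentage.
    print(f"{status}: {count}")
```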

Oliver Hinder

I checked on OpenReview (I can still see everything from NeurIPS last year). About 40% did not yet have a PhD, and about half of those were in their first to third year. There were no undergrads. Sorry, I clearly misremembered the number of junior PhDs/undergrads on my papers; I distinctly remember undergrad(s) being allocated to some of my papers, but maybe I removed them. It was also a little unfair of me to single out PhD students (who often at least put a lot of effort into their reviews even if they lack experience). For example, some of the very senior folks who were roped into reviewing through the new mandatory review policies clearly skimmed the paper and wrote a generic review.

Maxim Raginsky

The problem is ducts, man. Ducts.

Maxim Raginsky

Now we have the whole country sectioned off. Can't make a move without a form!

Robert Wu

Hello! I find it hard to take away any concrete, constructive proposal from the post. Are you saying we should use only LLM reviews for ICML? Or that we should not have ICML at all? Then what should we do? I'd appreciate a response. Thanks.

Mark Johnson

Many readers are quite reasonably asking what we can do about this. I don't have a full answer, but I think decoupling publication at top conferences from career advancement is a necessary step.

Recruitment and promotion committees shouldn't just look at the h-index; they should actually read the candidate's papers and decide whether they think the candidate's research is any good. Yes, different committees at different institutions would have different ideas about what counts as good research, but if this leads to more diversity in the field, I think that would be good.

A challenge is that at many non-top institutions, the administration doesn't trust their own faculty to make sound hiring decisions. Hiring on the basis of a statistic such as h-index is a bureaucratic way of avoiding blame.

Oliver Schulte

You hit the nail on the head. The fundamental concern is that we have conflated two things: 1) credentialing people so that institutions and companies can make hiring and promotion decisions, and 2) epistemic concerns like validating methodology and results. Since career advancement is what drives both authors and program committees, I don't see how we get out of this. More precisely, it does not help to say "this is not the way to promote scientific progress," because scientific progress is not the objective of the people running and using the system. In disciplines like accounting, professional certification happens through standardized exams.

Mark Johnson

The certification idea is worth thinking about.

The very best universities do want to hire the best people in the world, and I think they should put in the effort to read and evaluate the work of the candidates for their positions.

But the vast bulk of organisations are not the very best. If we did have some kind of professional certification, then those organisations could simply hire people with that certification if that certification actually aligned with the job requirements (a big if!).

One issue is that many second-tier institutions have hopes of becoming top institutions, so they ape the bureaucratic procedures used by the top institutions. So even if we had a credential that guaranteed the person was a good teacher and a fine PhD mentor (say) in a certain area, institutions might not use it, because relying on it would mark the institution as second-tier.

Victoria Livingstone

As an editor in another academic field, I'm interested to see this. In the humanities, we don't get the same volume of submissions, but the traditional model of peer review is completely broken. I don't think turning to LLMs is the answer, but the system does need an overhaul, because the underlying problems (e.g., the shortage of reviewers) will not be rectified anytime soon.

Misha Belkin

My first conference paper received -- how should I say it? -- sub-optimal reviews (but somehow still got in). I do not long for the good old times.

Oliver Schulte

Authors are in an arms race for publication counts. The logic of the arms race is that everyone loses, but you cannot escape it. Also, thank you for a great post, and for teaching me the word "haruspicate".

Alex Tolley

> The epistemic concerns worry that peer review doesn’t properly weed out invalid papers. At least in the sciences, peer review is supposedly meta-epistemic, judging the validity of papers that aim to get at scientific knowledge, understanding, and explanation. Many studies have found the current state of peer review unfit for this task.

I have certainly read that many reviews are cursory, barely more than spelling and grammar checking. We know from studies that about 1/3 of medical papers use the wrong statistical measures. Unsurprising, IMO.

I was once asked if I would be a reviewer, even though I have minimal knowledge of the subject.

I can certainly understand that the explosion of "peer-reviewed" journals, many of minor readership, has become a burden to the reviewers.

IDK what the solution is, but I would have thought that AI-assisted reviews would be of help.

For a wider POV, I suggest that this peer review problem, and the possible AI solution, is of the same type as "moderation is impossible at scale".