24 Comments
Tom Dietterich:

For the record, I liked your previous post. Broadening the scope of peer review to encompass all of the ways in which peers review each other was thought-provoking.

Ben Recht:

Thank you, Tom! Though now you have to change your name to Rob (I joke, I joke).

More seriously though, I do think I need to use a different term when describing the broader scope. I'll keep looking for a pithy one to use.

Nihar B. Shah:

"it doesn’t catch errors"

The description in Adam Mastroianni’s blog and the papers cited within may be misleading and may underestimate reviewer performance. The experiments in those papers (and others) insert *multiple* major errors into each paper, and the papers and the blog then report the *fraction of errors* caught across all reviewers.

However, it is conceivable that when a reviewer reads a paper they find badly flawed (e.g., the paper says it is an RCT but in reality it is not), the reviewer may simply report it as a bad paper and stop reading further (to save their own time), thereby missing the subsequent errors. The fraction-of-errors-found metric would thus be low.

An alternative metric is to check what fraction of reviews detected at least one error. I was able to get the dataset of the Schroter et al. 2008 paper from the very helpful Sara Schroter. It turns out that 90.94% of the reviews detected at least one of the major errors. That isn't too bad.
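
A minimal sketch of the difference between the two metrics, with made-up numbers rather than the actual Schroter data (the detection probabilities and the early-stopping rule below are assumptions, purely for illustration):

```python
# Hypothetical detection matrix: caught[i, j] = 1 if review i flagged
# inserted error j. The rates below are assumptions, not estimates.
import numpy as np

rng = np.random.default_rng(0)
n_reviews, n_errors = 200, 8

caught = np.zeros((n_reviews, n_errors), dtype=int)
for i in range(n_reviews):
    for j in range(n_errors):
        if rng.random() < 0.35:      # reviewer spots this error
            caught[i, j] = 1
            if rng.random() < 0.7:   # ...and stops reading after a major flaw
                break

# Metric reported in the papers: fraction of all inserted errors caught.
frac_errors_caught = caught.mean()
# Alternative metric: fraction of reviews that caught at least one error.
frac_reviews_with_hit = (caught.sum(axis=1) > 0).mean()

print(f"fraction of errors caught:   {frac_errors_caught:.2f}")
print(f"reviews catching >= 1 error: {frac_reviews_with_hit:.2f}")
```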

PS: I too am not in favor of the ML/AI conference review approach (https://researchonresearch.blog/2024/06/21/building-walls-in-academia-and-making-researchers-pay-for-it/), but my "positive" comments above are meant to add some clarity in interpreting the peer-review literature.

Ben Recht:

You can rebut Mastroianni, but that's not my point. Applying quantitative social science methods to meta-analyze metascience gets us absolutely nowhere. Your nitpicking his citations is actually proving my point. No study proves anything. The point isn't that he's found dispositive studies. It's that no one has a good proof of peer review doing something uniquely valuable. I don't care how many papers on this topic with tortured statistics make it through peer review themselves.

And it's because you can't fix a social system by tabulating statistics and solving mechanism design problems. This technocratic railroading leads only to dead ends.

I'd encourage you to read Kevin Munger's post about scientific expertise more broadly. But I'd also like to share Kevin's comment on this blog, which is relevant:

https://open.substack.com/pub/argmin/p/the-good-the-bad-and-the-science?r=p7ed6&utm_campaign=comment-list-share-cta&utm_medium=web&comments=true&commentId=121366259

Turning his argument toward participatory decision making about peer review: all academics are affected by it, everyone has a direct experience of it, and thus throwing around statistics doesn't negate everyone else's experience with the system.

Mark Johnson:

Do you have a proposal for a better system?

As far as organising conferences goes, just handling tens of thousands of submissions is going to be an administrative nightmare, no matter what system we use. I think there are some conferences or workshops that accept pretty much every submission; maybe that's what we should do?

As for using publications as evidence of research quality: I've heard suggestions to move away from conferences to old-fashioned journals (which would be overwhelmed by the volume of submissions), or to use citation counts / Google Scholar statistics, etc. I do think there's value in citation counts, but of course they are even easier to game than peer review.

I suspect a big factor is the enormous amount of money sloshing around the field right now; it's like a bright light driving the moths insane. The fact that a degree from a top US institution is life-changing for students from certain countries just amplifies this.

Ambarish Chandra:

Glad to see you return to bashing peer review. One point about preprint servers, though: they can be just as prone to authoritarian control and groupthink. During COVID, the preprint servers energetically cracked down on 'misinformation'. I had a paper that both SSRN and medRxiv refused to post, citing misinformation. It was later published in the (peer-reviewed) Annals of Internal Medicine.

Tom Dietterich:

In any moderation or review process, there are false positives and false negatives. In any noisy review process, you can't drive the false positives to zero without also creating false negatives. The risk of propagating non-scientific COVID information led arXiv to hire a postdoc in infectious disease to check COVID-related submissions more carefully than usual. You may recall that there were papers claiming to apply computer vision to diagnose COVID from chest x-rays based on ridiculous data sets where the COVID and non-COVID cases were drawn from entirely different populations. One must balance the unfortunate false negatives, such as your paper, against the many bogus papers that were correctly rejected.
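
A toy illustration of that tradeoff (all distributions and numbers invented): treat moderation as thresholding a noisy score, and watch false positives and false negatives move in opposite directions as the threshold shifts.

```python
# Toy model of a noisy screen: accept a submission if its moderation
# score clears a threshold. All distributions and numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
good = 2.0 + rng.normal(size=10_000)   # noisy scores of legitimate papers
bad = 0.0 + rng.normal(size=10_000)    # noisy scores of bogus papers

for threshold in (-1.0, 0.0, 1.0, 2.0):
    false_pos = (bad >= threshold).mean()   # bogus papers that get through
    false_neg = (good < threshold).mean()   # legitimate papers screened out
    print(f"threshold {threshold:+.1f}: FP {false_pos:.3f}, FN {false_neg:.3f}")
```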

Ben Recht:

Adding to this conversation, albeit tangentially. I don't really understand the communal norm where a pdf hosted on a preprint server is considered more legitimate than a pdf on a personal website. In CS, this was a weird convention that emerged in the last decade or so, and I don't consider it a positive development.

Daniel Spielman:

I consider pdfs on preprint servers better than preprints hosted on personal websites because they keep people honest, and establish a record of who did what when. Bad actors can change a pdf on their web page without acknowledging the change. In this way, they can claim the results of others by forging date stamps, or cover up flaws in an early version of a paper.

I was very relieved when it became possible to post to arXiv.
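
A rough sketch of the record-keeping point: publish a content hash alongside the posting date, and any silent edit to the hosted PDF becomes detectable (the file name below is hypothetical; arXiv accomplishes this through explicit versioning).

```python
# Sketch: record a content hash with the posting date so that any later,
# unacknowledged change to the PDF is detectable. File name is hypothetical.
import hashlib
from datetime import datetime, timezone

def fingerprint(path: str) -> str:
    """SHA-256 digest of the file's bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# At posting time, the server records (and displays) this alongside the paper:
record = {
    "file": "paper_v1.pdf",
    "sha256": fingerprint("paper_v1.pdf"),
    "posted": datetime.now(timezone.utc).isoformat(),
}

# Later, anyone can check that the hosted file is byte-for-byte unchanged:
assert fingerprint("paper_v1.pdf") == record["sha256"], "file was modified"
```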

Ben Recht:

Excellent point. This is definitely a tick in favor of a decentralized hosting service.

Although this time-stamping feature is definitely abused by aggressive actors. People often post half-assed stuff to arXiv just to plant a flag.

Tom Dietterich:

Flag planting violates our rules; we detect some of it and fail to detect the rest, of course. This (and the related problem of press releases masquerading as papers) tends to come more from companies than from university researchers.

Conversely, some authors mark their paper as "Work in progress", which is actually a basis for rejection from arXiv. Submissions to arXiv are supposed to be ready for peer review, not drafts: "arXiv is not GitHub."

Tom Dietterich:

Yes. In the case of COVID, the worry was that the press and general public would misinterpret those papers. arXiv moderation is only checking that the submissions are syntactically "scientific papers" (as opposed to, say, grant proposals, PPTs, etc.). One of our guidelines is "It is not our job (as moderators) to protect your reputation (as an author)."

Tom Dietterich:

Totally agree from the point of view of status or correctness or importance. The main benefit is that, for authors who submit the TeX source, arXiv is likely to ensure the paper remains accessible for a long time. Websites, especially academics' personal pages, are ephemeral.

Yuval Rabani:

I'm confused about this post. You say: "Those who say there is no better alternative don’t provide evidence when pressed." However, you don't propose even one candidate alternative after saying that. Isn't this by itself some evidence that there is no better alternative?

As you implied in your previous post, peer review has many functions. This is true even for peer review in the narrow sense that you discuss here. We use indicators of peer-reviewed papers in hiring and promotion decisions. Perhaps formal academic ranks can be abolished, and even tenure can be abolished, but what about hiring and salary scales?

Ben Recht:

Wait, I'll list many candidate alternatives (even though people refuse to engage with these):

1. Write fewer papers.

2. Allow candidates to self-nominate a set of preprints as "published" in their merit reviews.

3. A micro-step: make CS conference acceptance based on a lower bar for reviewing (is this written clearly, does it pass a plausibility check, and is it not plagiarized?).

I've written about these and related ideas before (for example, https://www.argmin.net/p/youre-gonna-run-when-you-find-out, https://www.argmin.net/p/too-much-information), but I am noting to myself to repeat these basic suggestions every time I raise this topic.

Yuval Rabani:

How do these do away with peer review?

Nihar B. Shah:

1. I agree with that. However, actually making this a reality requires a suitable incentive structure around it. When hiring/internship/fellowship committees or even governments count papers, or provide monetary rewards for each paper, it incentivizes writing more papers. Do you have thoughts on how to change the overall culture?

My personal preference is to significantly increase acceptance rates, which will reduce the value of an individual paper and hopefully lead to higher-quality but fewer papers. (https://drive.google.com/file/d/1Yw1bg1hHj3ydjAnkxhKPOYWIB0tqqBJA/view)

2. Many schools already do that by asking job/promotion applicants for their "three most meritorious papers". Do you think that has made a difference in the approach taken towards research and publishing?

Also, one may envisage that even this criterion leads to writing more papers: since it is uncertain a priori which completed project will really take off or have a large impact, writing more papers gives one a better shot at having something "impactful".

3. Do you mean a lower bar for reviewing while continuing with 20% acceptance rates?

In any case, the bar for reviewing in these conferences is already quite low!

Matt Hoffman:

For whatever reason, it seems like very few people evaluate these institutions in terms of academia's teaching mission. I find it clarifying (and comforting) to look at the peer-review process not primarily as a machine for advancing science, but as a technique for training junior researchers and practitioners.

Writing boring papers may not advance the field, but it _is_ good training. So is reviewing papers (even boring ones). So is reading and responding to anonymous reviews of your papers (at least sometimes).

Ben Recht:

We'll have to agree to disagree here. I question the pedagogical soundness of any method that creates a massive burden on people outside the classroom/lab. Most fields have figured out ways to educate PhD students without conscripting them to write dozens of papers a year.

Matt Hoffman:

By massive burden, you mean the reviewing load? I feel like when I was a grad student I learned a lot by reviewing papers (and getting my papers reviewed), and now that I'm more senior I see reviewing mostly as a form of anonymous volunteer mentorship. Reviewing is a burden, but I guess I would hope that your students get something out of it (feedback, practice critically evaluating others' work and articulating those evaluations) comparable in value to what they (and you) put into it. But what do I know, I don't have any students—maybe they don't (or don't anymore).

Ben Recht:

The problem is that we consider reviewing mandatory, not voluntary. The compulsory part, without compensation, necessarily makes it a burden.

There are lots of ways we can teach critical appraisal and facilitate volunteer mentorship that don't involve a mandatory system of anonymous pre-publication review.

Alex Tolley:

Is there any Rx that threads the needle: a proposal for a better science outcome that doesn't create new problems, yet still keeps a check on the noise of misinformation and the likely flood of AI slop?

Some problems with peer review might be partially solved by information technology, including the "pixie dust" du jour, AI, just as technology solved the 1970s problem of physically tracking down and copying papers from citation indexes. [Certainly, reviewers should never be needed just to catch spelling and grammar errors!] Also, whatever happened to the idea of publishing negative results and their associated hypotheses, if only to prevent wasteful repetition? [It should be trivial to set up a searchable archive of negative experiments and experiments with unreproducible results.]
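
On the bracketed point about a searchable archive of negative results, a minimal sketch using SQLite's built-in full-text search (the schema and the sample entry are invented, and it assumes an SQLite build with the FTS5 extension, which ships with recent Python distributions):

```python
# Minimal sketch of a searchable negative-results archive using SQLite's
# FTS5 full-text index. Schema and the sample entry are invented.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE VIRTUAL TABLE negative_results USING fts5(
        hypothesis, method, outcome, source
    )
""")
db.execute(
    "INSERT INTO negative_results VALUES (?, ?, ?, ?)",
    ("Compound X reduces inflammation in mice",
     "Randomized controlled trial, n=40",
     "No significant effect observed; study not published",
     "Lab archive entry, 2023"),
)

# Anyone about to repeat the experiment can search the archive first:
query = ("SELECT hypothesis, outcome FROM negative_results "
         "WHERE negative_results MATCH ?")
for hypothesis, outcome in db.execute(query, ("inflammation",)):
    print(hypothesis, "->", outcome)
```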

Ben Recht:

Why can't the proposal just be "everybody, write less?"

I wrote a bit about why that doesn't work and yet we can all try to be better here: https://www.argmin.net/p/youre-gonna-run-when-you-find-out

Alex Tolley:

Wasn't it in Newton's time that one could still hope to read all the scientific papers that were published? The problem isn't volume, but rather separating the wheat from the chaff. When I graduated in the early 1970s, there were two top-line science journals, Science and Nature, that supposedly published the most worthy work. The popular science media was slim. In Britain, I bought the New Scientist and Scientific American (now sadly dumbed down from those days). Today, the journal catalogs are vast and getting ever larger. The popular science media is also huge.

I don't think this is bad, if anything, it is good. More work is being done than ever before, from truly important groundbreaking science and discoveries, to popular explanations of science, both big and small. Even amateurs can find ways to get some small findings disseminated.

No, the problem is analogous to what I observed when we had to work manually through citation indexes and hunt down journal papers in the library stacks, during university library opening hours. That was the only way to find relevant prior work. As for finding new things, identifying them was the printed journals' job.

What I want is a good way to ask questions about subjects and get good information on what has been done, with what results. Google Scholar was sort of useful, and certainly better than manual library searches. Domain-specific AIs are better, but still hallucinate. To me, though, this is the path forward: AIs that construct a consensus but show the differences, like meta-analyses, and point to the papers for details. An ocean of information that can be quickly searched for the nuggets you need. So publish, publish, and publish. Just don't create garbage.
