13 Comments

I had some tangential info about the competition as it was going on, as I know an anthropologist who was doing some work with Netflix at the time. What I will note is that the contextless recommendation setups (i.e., setups where the content itself doesn't matter) didn't really do much for users. And this basically leads us to where we are with streaming today. Oversimplifying to keep it short: "finding something people will watch" is very different from "helping people find something they want" - but the latter only matters when there is real competition (which Netflix, at the time, did not face).

author

Very interesting. But does Netflix help people find what they want now? If anything, they seem to be leaning into content generation and neglecting the rest of the catalog.


Yes, sorry, I wasn't clear - Netflix does not help people find what they want now, and neither do any of the other big services. People are beginning to notice that Amazon, Disney and Paramount have big catalogs, with interesting things in them, that are really hard to find.

Aug 24, 2023 · Liked by Ben Recht

Interesting rant! A relevant dimension to the Netflix Prize aftermath is the quirky fact that movie rental/viewing records are one of the few things that actually have a privacy law in the United States. See the Wikipedia page for the so-called Video Privacy Protection Act (VPPA) and the bits about Robert Bork. https://en.wikipedia.org/wiki/Video_Privacy_Protection_Act

I am not a lawyer, but I've come to understand that the VPPA was the main legal hook for many of the lawsuits against Facebook Beacon as well as many other US privacy lawsuits in the modern era, most of which have settled. To be a bit simplistic, Facebook had a privacy issue and it sorta didn't matter under US law, but for the fact that it was leaking video viewing history.

author

That's a fascinating observation! The only way to get at the tech companies is through anachronistic laws set up to police other industries. Similar to how FB got in trouble for violating the Fair Housing Act.

Aug 24, 2023 · Liked by Ben Recht

Excellent rant. Data is more valuable than ML circus tricks, and closed data is better business than open data. No one should be surprised here. The idea of the recommender system is likely more powerful for business than the actual recommender system. Similar statements can be made involving big pharma and their products. So close to getting red-pilled proper! Good luck to us all.

author

I'm doing my part to red pill us all.

I do often wonder what the value of recommender systems is. Similar to how we value advertising...

Aug 23, 2023 · Liked by Ben Recht

The Netflix Prize is such a fascinating case study! To the list of lessons, I would add that it taught us to view recommendation and collaborative filtering as a machine learning problem, and specifically as a prediction problem. Folks working on information retrieval knew about the power of user behavior as far back as the 80s; as far back as 2005 collaborative filtering was considered pervasive: https://www.economist.com/technology-quarterly/2005/03/12/united-we-find.

A few years ago I spent some time reading the early collaborative filtering papers, and one of my main impressions is that they were distinctly *not* machine learning papers. While they often included evaluations of predictive accuracy, they also spent a lot of time discussing the broader system and why accuracy matters to users. Many of the early papers envisioned users with agency, actively looking for new content or even hand-designing their own "collaborative filters." (One exception is a 2003 paper from Amazon that cites "impulse buys" as a motivation.) Maybe it all has to do with commercialization? MovieLens remains as a non-commercial recommendation platform and provides open datasets, but there isn't much out there like it.

author

You should write a survey about those early times in recommender systems! Definitely a history worth revisiting.


Re: healthcare data sharing: I don't know how the Nightingale project has been going: https://www.nature.com/articles/s41591-022-01804-4

Some healthcare datasets from federally funded clinical trials are fairly requestable by researchers (in contrast to corporate data in general); on the other hand, it sounds like hospital data is not so much.

author

I agree that sometimes you can get this for clinical trials, but it's not true for all of them. I personally have had the experience of being completely denied access to any data from a trial that was for the FDA approval of a particular medical device.

When I say data sharing, I mean every paper published in a medical journal should come with the case reports, tabulated data, and statistics code.


Good point, the ecosystem is still far from sharing data by default! I think the Nightingale project is more about opportunistically releasing datasets as well, rather than universally.


There haven't been more competitions?

https://www.kaggle.com/competitions
