The Fall of BIG DATA

I’m still in total shock from the decision my country made last Tuesday. We elected a hateful, bigoted, misogynistic, incompetent demagogue to lead us into a dark and foreboding future. While the internet has been flooded with hot takes about why this happened, I’d like to reflect a bit on why I am so crushed by this outcome.

I have been a machine learning researcher for nearly 15 years. I have been enthralled by the promise of data-driven methods to enrich our lives and make the impossible possible. This election is a resounding indictment of the information infrastructures we’ve built to inform ourselves. And I am reaching out to the machine learning community to come to terms with this fact and to do better.

There are three major failures of this cycle that are mostly the fault of our infatuation with data. The first is polling. The science of polling was shown to be beyond fallible, with completely incorrect voter screens and projections. Gelman and others argue that we can learn from a mistake much like we learn from the black box of a crashed plane. But we currently fly tens of thousands of flights per day in our domestic airspace and have had zero fatalities in 2016. This was achieved by rigorous scientific analysis, careful engineering, extensive regulatory oversight, and long training, not simply by reverse-engineering crashed planes, one after another. Statisticians arguing that rare events occur does not provide a way forward to robustify our methods of devoting resources to voter turnout or persuasion. Moreover, we treat polling like BIG DATA with sophisticated polling manipulation and averaging, even though we have fewer than 10 relevant elections to use to fit our models. There is simply no way to analyze the polls without overfitting.

The second major failure is in targeted news on social media – virality is proving fatal to truth in political discourse. Here, the success of BIG DATA led to a major failure in the democratic process. I’m disheartened to hear that Mark Zuckerberg won’t acknowledge the role Facebook played in spreading disinformation in the 2016 campaign. More than half of the country gets its news from social media, and when that news is targeted it simply feeds into confirmation bias. Our community has developed remarkably effective tools to microtarget advertisements. But if you use ad models to deliver news, that’s propaganda. And just because we didn’t intend to spread rampant misinformation doesn’t mean we are not responsible.

And the third major failure has been a general apathy about politics amongst my colleagues here in the Bay Area. When many of the best minds in machine learning have decided that the greatest existential threat to civilization is the rise of Skynet, we have had a major failure of groupthink. Many ML researchers are more concerned with trying to bring about The Singularity than with solving real problems. People are suffering all around us, and many of them are suffering precisely because of our advances in automation. On top of this, 2016 is going to be the warmest year on record. If we devote the majority of our talents and resources to sci-fi navel-gazing, then we are gravely failing the world with our neglect.

But I think we can be better. I think machine learning can be a powerful tool for social good. I think scientific minds are crucial to moving the world in a more positive direction. But we must now make this decision as a community. I am heartened by Sam Altman’s call to action. But now is the time to put your money and talents where your mouths are.

Kevin Drum made some very important points that I want to reiterate and expand upon here:

  • “We have elected a loudmouth, race-baiting game show host president of the United States.” This man has appointed an openly racist, antisemitic misogynist as senior advisor. This man has appointed a climate change denier to head his EPA transition. This man has said on national television that he wants to repeal Roe v. Wade.

  • This election was very close. This is not a universal condemnation of progressive values. A few thousand votes in a few places would have resulted in a completely different outcome. And if we want to strive to achieve that outcome, it is time to become more active and vigilant. We have to be better. And we have to be better now.

  • Regardless of our mobilization, there are many people threatened by this new regime in America. Our Muslim and Latino friends have been openly targeted. The president-elect has called for nationalizing “stop-and-frisk.” We have a lot of resources in the machine learning community. Some of us are very wealthy. Others hold positions of influence. We need to use this power to protect those who are threatened by this new regime.

We must act, and we must act now. I am hoping that we can put our heads together to work for good in spite of this tremendous setback. I want to write a blog post about the rise of BIG DATA. About how we used our technical acumen to help each other, protect those endangered, and save our fragile environment. That requires action and mobilization. And it has to happen now.

If you are up for a constructive conversation on how to move forward, please leave a comment. I don’t think any of us have a concrete plan yet on how to act, but I hope we can work towards something positive together.