Aligning the Aligners

Jessica Dai tries to get to the bottom of what AI Alignment research is up to.

Aug 22, 2023

My college roommate was obsessed with this 1920s Czech play called “R.U.R.” Ben (yeah, his name was also Ben) loved sci-fi, loved theater, and was particularly into this story. We even considered naming our experimental post-rock band “RUR or RU?” In R.U.R. (Rossum’s Universal Robots by Karel Čapek), humans make worker androids called robots. The androids eventually rebel and kill all of the humans except for the head engineer at the robot factory, who they identify with.

What I hadn’t realized at the time was that this play invented the word robot. In the origin story of robotics, the robots rise up and kill us all.

That this narrative has persisted for 100 years isn’t that surprising, I guess. Humans love to envision creating artificial humans, beings made in their image that kill their creators. It’s the ultimate narcissistic fantasy. It is the story of Frankenstein, the Golem, and even the Garden of Eden.

Over at Reboot, Jessica Dai has a compelling longread on the modern manifestation of this creator narcissism: “AI Alignment (TM).” Alignment (TM) is a research area that wants to make sure AI doesn’t become killer robots. That the machine you design to build paper clips doesn’t decide to harvest humans to make more paper clips. But we don’t have robots that do anything intentional at this point. So what exactly are these AI Alignment (TM) researchers up to?

Jessica has been thinking about this topic for a while because she’s more thoughtful and far less cynical than me. But Jessica still finds an odd bait-and-switch in the Alignment (TM) world. The Alignment (TM) crew believes that AI is going to kill us and that only they can save us. But they are also all trying to get rich by making more AI. The messiah complex combined with arrogant greed only leads to an incoherent story about what AI Alignment (TM) means. It’s about saving the world, and making themselves extraordinarily wealthy at the same time.

Jessica surveys how this contradiction manifests itself in the research papers put out by the Alignment (TM) crowd. People motivate their research as preventing horrifying superintelligent robots from exterminating all of humanity. But then this research just amounts to preventing chatbots from hallucinating text that could be considered racist. Jessica makes a rather convincing argument:

“Rather than asking, “how do we create a chatbot that is good?”, these techniques merely ask, “how do we create a chatbot that sounds good”?”

How did “ZOMG THE AI WILL KILL US” get conflated with “Let’s force the chatbot to spew out corporate HR speak”? Alignment (TM) research talks a lot about “safety” but at the end of the day is just targeting how to make AI products more marketable. Because everyone wants their SEO-spammed web blog and automatically generated emails to align with ESG pillars.

Go read the whole thing!

arg min
