Can we stop AI outsmarting humanity?

The spectre of superintelligent machines doing us harm is not just science fiction, technologists say – so how can we ensure AI remains ‘friendly’ to its makers?

It began three and a half billion years ago in a pool of muck, when a molecule made a copy of itself and so became the ultimate ancestor of all earthly life. It began four million years ago, when brain volumes began climbing rapidly in the hominid line.

Fifty thousand years ago with the rise of Homo sapiens sapiens.

Ten thousand years ago with the invention of civilization.

Five hundred years ago with the invention of the printing press.

Fifty years ago with the invention of the computer.

In less than thirty years, it will end.

Jaan Tallinn stumbled across these words in 2007, in an online essay called Staring into the Singularity. The “it” was human civilisation. Humanity would cease to exist, predicted the essay’s author, with the emergence of superintelligence, or AI, that surpasses human-level intelligence in a broad array of areas.

Tallinn, an Estonia-born computer programmer, has a background in physics and a propensity to approach life like one big programming problem. In 2003, he co-founded Skype, developing the backend for the app. He cashed in his shares after eBay bought it two years later, and now he was casting about for something to do. Staring into the Singularity mashed up computer code, quantum physics and Calvin and Hobbes quotes. He was hooked.

Tallinn soon discovered that the author, Eliezer Yudkowsky, a self-taught theorist, had written more than 1,000 essays and blogposts, many of them devoted to superintelligence. He wrote a program to scrape Yudkowsky’s writings from the internet, order them chronologically and format them for his iPhone. Then he spent the better part of a year reading them.

The term artificial intelligence, or the simulation of intelligence in computers or machines, was coined back in 1956, only a decade after the creation of the first electronic digital computers. Hope for the field was initially high, but by the 1970s, when early predictions did not pan out, an “AI winter” set in. When Tallinn found Yudkowsky’s essays, AI was undergoing a renaissance. Scientists were developing AIs that excelled in specific areas, such as winning at chess, cleaning the kitchen floor and recognising human speech. Such “narrow” AIs, as they are called, have superhuman capabilities, but only in their specific areas of dominance. A chess-playing AI cannot clean the floor or take you from point A to point B. Superintelligent AI, Tallinn came to believe, will combine a wide range of skills in one entity. More darkly, it might also use data generated by smartphone-toting humans to excel at social manipulation.

Reading Yudkowsky’s articles, Tallinn became convinced that superintelligence could lead to an explosion or breakout of AI that could threaten human existence – that ultrasmart AIs will take our place on the evolutionary ladder and dominate us the way we now dominate apes. Or, worse yet, exterminate us.

After finishing the last of the essays, Tallinn shot off an email to Yudkowsky – all lowercase, as is his style. “i’m jaan, one of the founding engineers of skype,” he wrote. Eventually he got to the point: “i do agree that … preparing for the event of general AI surpassing human intelligence is one of the top tasks for humanity.” He wanted to help.

When Tallinn flew to the Bay Area for other meetings a week later, he met Yudkowsky, who lived nearby, at a cafe in Millbrae, California. Their get-together stretched to four hours. “He actually, genuinely understood the underlying concepts and the details,” Yudkowsky told me recently. “This is very rare.” Afterward, Tallinn wrote a check for $5,000 (£3,700) to the Singularity Institute for Artificial Intelligence, the nonprofit where Yudkowsky was a research fellow. (The organisation changed its name to Machine Intelligence Research Institute, or Miri, in 2013.) Tallinn has since given the institute more than $600,000.

The encounter with Yudkowsky brought Tallinn purpose, sending him on a mission to save us from our own creations. He embarked on a life of travel, giving talks around the world on the threat posed by superintelligence. Mostly, though, he began funding research into methods that might give humanity a way out: so-called friendly AI. That doesn’t mean a machine or agent is particularly skilled at chatting about the weather, or that it remembers the names of your kids – although superintelligent AI might be able to do both of those things. It doesn’t mean it is motivated by altruism or love. A common fallacy is assuming that AI has human urges and values. “Friendly” means something much more fundamental: that the machines of tomorrow will not wipe us out in their quest to attain their goals.

Last spring, I joined Tallinn for a meal in the dining hall of Cambridge University’s Jesus College. The churchlike space is bedecked with stained-glass windows, gold moulding, and oil paintings of men in wigs. Tallinn sat at a heavy mahogany table, wearing the casual garb of Silicon Valley: black jeans, T-shirt and canvas sneakers. A vaulted timber ceiling extended high above his shock of grey-blond hair.

At 47, Tallinn is in some ways your textbook tech entrepreneur. He thinks that thanks to advances in science (and provided AI doesn’t destroy us), he will live for “many, many years”. When out clubbing with researchers, he outlasts even the young graduate students. His concern about superintelligence is common among his cohort. PayPal co-founder Peter Thiel’s foundation has given $1.6m to Miri and, in 2015, Tesla founder Elon Musk donated $10m to the Future of Life Institute, a technology safety organisation in Cambridge, Massachusetts. But Tallinn’s entrance to this rarefied world came behind the iron curtain in the 1980s, when a classmate’s father with a government job gave a few bright kids access to mainframe computers. After Estonia became independent, he founded a video-game company. Today, Tallinn still lives in its capital city – also called Tallinn – with his wife and the youngest of his six kids. When he wants to meet with researchers, he often just flies them to the Baltic region.

His giving strategy is methodical, like almost everything else he does. He spreads his money among 11 organisations, each working on different approaches to AI safety, in the hope that one might stick. In 2012, he cofounded the Cambridge Centre for the Study of Existential Risk (CSER) with an initial outlay of close to $200,000.

Jaan Tallinn at Futurefest in London in 2013.

Existential risks – or X-risks, as Tallinn calls them – are threats to humanity’s survival. In addition to AI, the 20-odd researchers at CSER study climate change, nuclear war and bioweapons. But, to Tallinn, those other disciplines “are really just gateway drugs”. Concern about more widely accepted threats, such as climate change, might draw people in. The horror of superintelligent machines taking over the world, he hopes, will convince them to stay. He was visiting Cambridge for a conference because he wants the academic community to take AI safety more seriously.

At Jesus College, our dining companions were a random assortment of conference-goers, including a woman from Hong Kong who was studying robotics and a British man who graduated from Cambridge in the 1960s. The older man asked everybody at the table where they attended university. (Tallinn’s answer, Estonia’s University of Tartu, did not impress him.) He then tried to steer the conversation toward the news. Tallinn looked at him blankly. “I am not interested in near-term risks,” he said.

Tallinn changed the topic to the threat of superintelligence. When not talking to other programmers, he defaults to metaphors, and he ran through his suite of them: advanced AI can dispose of us as swiftly as humans chop down trees. Superintelligence is to us what we are to gorillas.

An AI would need a body to take over, the older man said. Without some kind of physical casing, how could it possibly gain physical control?

Tallinn had another metaphor ready: “Put me in a basement with an internet connection, and I could do a lot of damage,” he said. Then he took a bite of risotto.

Every AI, whether it’s a Roomba or one of its potential world-dominating descendants, is driven by outcomes. Programmers assign these goals, along with a series of rules on how to pursue them. Advanced AI wouldn’t necessarily need to be given the goal of world domination in order to achieve it – it could just be accidental. And the history of computer programming is rife with small errors that sparked catastrophes. In 2010, for example, when a trader with the mutual-fund company Waddell & Reed sold thousands of futures contracts, the firm’s software left out a key variable from the algorithm that helped execute the trade. The result was the trillion-dollar US “flash crash”.

The researchers Tallinn funds believe that if the reward structure of a superhuman AI is not properly programmed, even benign objectives could have insidious ends. One well-known example, laid out by the Oxford University philosopher Nick Bostrom in his book Superintelligence, is a fictional agent directed to make as many paperclips as possible. The AI might decide that the atoms in human bodies would be better put to use as raw material.

A man plays chess with a robot designed by Taiwan’s Industrial Technology Research Institute (ITRI) in Taipei in 2017.

Tallinn’s views have their share of detractors, even among the community of people concerned with AI safety. Some object that it is too early to worry about restricting superintelligent AI when we don’t yet understand it. Others say that focusing on rogue technological actors diverts attention from the most urgent problems facing the field, like the fact that the majority of algorithms are designed by white men, or based on data biased toward them. “We’re in danger of building a world that we don’t want to live in if we don’t address those challenges in the near term,” said Terah Lyons, executive director of the Partnership on AI, a technology industry consortium focused on AI safety and other issues. (Several of the institutes Tallinn backs are members.) But, she added, some of the near-term challenges facing researchers, such as weeding out algorithmic bias, are precursors to ones that humanity might see with super-intelligent AI.

Tallinn isn’t so convinced. He counters that superintelligent AI brings unique threats. Ultimately, he hopes that the AI community might follow the lead of the anti-nuclear movement in the 1940s. In the wake of the bombings of Hiroshima and Nagasaki, scientists banded together to try to limit further nuclear testing. “The Manhattan Project scientists could have said: ‘Look, we are doing innovation here, and innovation is always good, so let’s just plunge ahead,’” he told me. “But they were more responsible than that.”

Tallinn warns that any approach to AI safety will be hard to get right. If an AI is sufficiently smart, it might have a better understanding of the constraints than its creators do. Imagine, he said, “waking up in a prison built by a bunch of blind five-year-olds.” That is what it might be like for a super-intelligent AI that is confined by humans.

The theorist Yudkowsky found evidence this might be true when, starting in 2002, he conducted chat sessions in which he played the role of an AI enclosed in a box, while a rotation of other people played the gatekeeper tasked with keeping the AI in. Three out of five times, Yudkowsky – a mere mortal – says he convinced the gatekeeper to release him. His experiments have not discouraged researchers from trying to design a better box, however.

The researchers that Tallinn funds are pursuing a broad variety of strategies, from the practical to the seemingly far-fetched. Some theorise about boxing AI, either physically, by building an actual structure to contain it, or by programming in limits to what it can do. Others are trying to teach AI to adhere to human values. A few are working on a last-ditch off-switch. One researcher who is delving into all three is mathematician and philosopher Stuart Armstrong at Oxford University’s Future of Humanity Institute, which Tallinn calls “the most interesting place in the universe.” (Tallinn has given FHI more than $310,000.)

Armstrong is one of the few researchers in the world who focuses full-time on AI safety. When I met him for coffee in Oxford, he wore an unbuttoned rugby shirt and had the look of someone who spends his life behind a screen, with a pale face framed by a mess of sandy hair. He peppered his explanations with a disorienting mixture of popular-culture references and math. When I asked him what it might look like to succeed at AI safety, he said: “Have you seen the Lego movie? Everything is awesome.”

The philosopher Nick Bostrom.

One strain of Armstrong’s research looks at a specific approach to boxing called an “oracle” AI. In a 2012 paper with Nick Bostrom, who co-founded FHI, he proposed not only walling off superintelligence in a holding tank – a physical structure – but also restricting it to answering questions, like a really smart Ouija board. Even with these boundaries, an AI would have immense power to reshape the fate of humanity by subtly manipulating its interrogators. To reduce the possibility of this happening, Armstrong proposes time limits on conversations, or banning questions that might upend the current world order. He also has suggested giving the oracle proxy measures of human survival, like the Dow Jones industrial average or the number of people crossing the street in Tokyo, and telling it to keep these steady.

Ultimately, Armstrong believes, it could be necessary to create, as he calls it in one paper, a “big red off button”: either a physical switch, or a mechanism programmed into an AI to automatically turn itself off in the event of a breakout. But designing such a switch is far from easy. It is not just that an advanced AI interested in self-preservation could prevent the button from being pressed. It could also become curious about why humans devised the button, activate it to see what happens, and render itself useless. In 2013, a programmer named Tom Murphy VII designed an AI that could teach itself to play Nintendo Entertainment System games. Determined not to lose at Tetris, the AI simply pressed pause – and kept the game frozen. “Truly, the only winning move is not to play,” Murphy observed wryly, in a paper on his creation.

For the strategy to succeed, an AI has to be uninterested in the button, or, as Tallinn put it: “It has to assign equal value to the world where it’s not existing and the world where it’s existing.” But even if researchers can achieve that, there are other challenges. What if the AI has copied itself several thousand times across the internet?

The approach that most excites researchers is finding a way to make AI adhere to human values– not by programming them in, but by teaching AIs to learn them. In a world dominated by partisan politics, people often dwell on the ways in which our principles differ. But, Tallinn told me, humans have a lot in common: “Almost everyone values their right leg. We just don’t think about it.” The hope is that an AI might be taught to discern such immutable rules.

In the process, an AI would need to learn and appreciate humans’ less-than-logical side: that we often say one thing and mean another, that some of our preferences conflict with others, and that people are less reliable when drunk. Despite the challenges, Tallinn believes, it is worth trying because the stakes are so high. “We have to think a few steps ahead,” he said. “Creating an AI that doesn’t share our interests would be a horrible mistake.”

On his last night in Cambridge, I joined Tallinn and two researchers for dinner at a steakhouse. A waiter seated our group in a white-washed cellar with a cave-like atmosphere. He handed us a one-page menu that offered three different kinds of mash. A couple sat down at the table next to us, and then a few minutes later asked to move elsewhere. “It’s too claustrophobic,” the woman complained. I thought of Tallinn’s comment about the damage he could wreak if locked in a basement with nothing but an internet connection. Here we were, in the box. As if on cue, the men contemplated ways to get out.

Tallinn’s guests included former genomics researcher Seán Ó hÉigeartaigh, who is CSER’s executive director, and Matthijs Maas, an AI researcher at the University of Copenhagen. They joked about an idea for a nerdy action flick titled Superintelligence v Blockchain!, and discussed an online game called Universal Paperclips, which riffs on the scenario in Bostrom’s book. The exercise involves repeatedly clicking your mouse to make paperclips. It’s not exactly flashy, but it does give a sense for why a machine might look for more expedient ways to produce office supplies.

Eventually, talk shifted toward bigger questions, as it often does when Tallinn is present. The ultimate goal of AI-safety research is to create machines that are, as Cambridge philosopher and CSER co-founder Huw Price once put it, “ethically as well as cognitively superhuman”. Others have raised the question: if we don’t want AI to dominate us, do we want to dominate AI? In other words, does AI have rights? Tallinn believes this is needless anthropomorphising. It assumes that intelligence equals consciousness – a misconception that annoys many AI researchers. Earlier in the day, CSER researcher José Hernández-Orallo joked that when speaking with AI researchers, consciousness is “the C-word”. (“And ‘free will’ is the F-word,” he added.)

In the cellar, Tallinn said that consciousness is beside the point: “Take the example of a thermostat. No one would say it is conscious. But it’s really inconvenient to face up against that agent if you’re in a room that is set to negative 30 degrees.”

Ó hÉigeartaigh chimed in. “It would be nice to worry about consciousness,” he said, “but we won’t have the luxury to worry about consciousness if we haven’t first solved the technical safety challenges.”

People get overly preoccupied with what superintelligent AI is, Tallinn said. What form will it take? Should we worry about a single AI taking over, or an army of them? “From our perspective, the important thing is what AI does,” he stressed. And that, he believes, may still be up to humans – for now.

Via The Guardian