Anca Dragan has a cool name, an impressive CV and an important job. While many roboticists focus on making AI better, faster and smarter, Dragan is also concerned about robot quality control. In anticipation of robots moving into every area of our lives, she wants to ensure our interactions with robots are positive ones. The computer scientist and robotics engineer is a principal investigator with UC Berkeley’s Center for Human-Compatible AI. “One particular area of interest is the problem of value alignment,” says Dragan. “How do you ensure that an artificially intelligent agent–be it a robot a few years from now or a much more capable agent in the future–how do you make sure that these agents optimize the right objectives? How do we teach them to optimize what we actually want optimized?”
Preventing undesirable robot behavior is becoming a priority as robots get more intelligent, more nimble and increasingly autonomous. It’s something a lot of people feel uneasy about, even if most of us don’t know enough about AI to put our concerns into quite the right words.
Finding the right words and dealing with communication breakdowns are at the very heart of a problem Dragan’s trying to tackle. She’s developing a new paradigm for machine learning to teach robots to do what we want–even if we can’t articulate what we really want.
That’s literally not what I want.
With traditional AI learning models, the robot designer programs a piece of code that specifies a utility function. The robot moves through the world guided by this code, its directive. But capturing desirable behavior in a piece of code leaves considerable room for error. We humans skip over a lot of important information when we specify what we want. For instance, you may want a windfall of cash with zero effort. Lacking the moral framework encoded in our DNA and reinforced by our social norms, the robot could determine the best way to get you what you want is to kill your parents for the inheritance. Endless possible sci-fi horror movie scenarios play out without the benefit of context, guidance and real-time stopgaps to interrupt a baby robot’s perversion of logic.
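The failure mode here can be sketched in a few lines. The following is a hypothetical toy example, not code from any real system: the designer hard-codes a utility function ("maximize cash, minimize effort"), and the agent greedily picks whatever scores highest, including actions the designer never anticipated, because nothing in the code rules them out.

```python
# Toy illustration of the traditional approach: a hard-coded utility
# function, optimized literally. All actions and numbers are invented.

def utility(outcome):
    # The designer's specification: "maximize cash, minimize effort."
    return outcome["cash"] - outcome["effort"]

candidate_actions = [
    {"name": "work overtime",    "cash": 500, "effort": 400},
    {"name": "do nothing",       "cash": 0,   "effort": 0},
    {"name": "exploit loophole", "cash": 900, "effort": 50},  # unintended!
]

# The agent picks the highest-scoring action -- the literal optimum,
# which is not necessarily the intended one.
best = max(candidate_actions, key=utility)
print(best["name"])
```

The "exploit loophole" action wins (900 − 50 = 850) because the specification says nothing about which means of getting cash are acceptable; that omission is exactly the gap the article is describing.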
With a little imagination, Dragan and her fellow computer scientists can easily forecast and prevent more obvious penny dreadful outcomes. It’s trickier to account for undesirable robot outcomes hiding in our blind spots.
We’ve already felt the fallout from an AI learning curve.
The threat isn’t some far-flung future concern, either. When it comes to AI, Dragan is mostly concerned about unintended consequences, some of which we’ve already experienced. “Social media feeds optimize for keeping you hooked, but that ends up biasing your political views even further,” says Dragan. “Or we use algorithms we believe are more accurate to make important decisions, but they turn out to be biased. Or, we train learning systems to care about accuracy, treating every mistake as just as important when, in fact, some mistakes have much bigger implications in practice than others. A lot of these issues stem from us not being able to specify the objective we really want, and it’s not our fault–it’s really hard sometimes. That’s why I believe as we build tools that can better optimize any objective, we need to also build tools that work with people to figure out what the right objectives are.”
Teaching machines to second-guess us.
In Dragan’s new learning paradigm, a robot collects clues from people and its environment about what we might really want. “We have these internal states that robots can’t directly observe and they have to estimate,” says Dragan. Especially as directives become more complex.
Now, when a designer specifies a utility function, the robot doesn’t get obsessive about its objective. Instead, it considers the utility function a crucial data point about the desired outcome. “We work with probability distributions, instead of just taking the utility function that was specified and sticking to it,” says Dragan. The robot determines that the specified goal is likely desired but it also considers various other options the designer may have meant. At this point, the robot can either act conservatively, or it can try something new that Dragan and her team are working on: getting the robot to go back to the designer and ask questions like, “hey, I just want to double-check if you meant what you said, or maybe it’s this other thing that you want?”
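The probabilistic reasoning Dragan describes can be sketched roughly as a Bayesian update over candidate objectives. This is a hypothetical illustration, not her team's actual system: all candidate objectives, probabilities, and the confidence threshold are invented. The specified utility function seeds a prior; observations of the designer shift the belief; and if no candidate is clearly the winner, the robot asks for clarification.

```python
# Hypothetical sketch: treat the specified objective as one noisy clue,
# keep a probability distribution over what the designer might really
# want, and ask when uncertain. All names and numbers are made up.

def update(prior, likelihood):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    unnorm = {obj: prior[obj] * likelihood[obj] for obj in prior}
    total = sum(unnorm.values())
    return {obj: p / total for obj, p in unnorm.items()}

# Prior belief after reading the designer's specification:
belief = {
    "fetch coffee quickly":          0.5,  # the literal specification
    "fetch coffee without spilling": 0.3,  # a plausible intended meaning
    "fetch any hot drink":           0.2,
}

# New evidence: the designer winces when the robot speeds up. That
# reaction is unlikely if speed is the goal, likely if spilling is the worry.
likelihood = {
    "fetch coffee quickly":          0.1,
    "fetch coffee without spilling": 0.8,
    "fetch any hot drink":           0.3,
}
belief = update(belief, likelihood)

best = max(belief, key=belief.get)
if belief[best] < 0.8:
    # The "go back and ask" behavior Dragan describes:
    print(f"Just double-checking: did you mean '{best}', or something else?")
else:
    print(f"Confident enough -- optimizing '{best}'.")
```

After the update, "fetch coffee without spilling" becomes the most likely objective, but not confidently enough to act on, so the sketch falls back to asking the designer, mirroring the conservative-or-clarify choice described above.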
Dragan says it’s still a work in progress, but when her team began using the new learning system, they were surprised to see robots quickly develop a fairly good understanding of what humans really want.
A more intelligent “species”?
“Part of the concern is we’re going to have these very capable agents out there,” says Dragan. “And what does it mean to be a human in a world where there’s, in essence, a more intelligent species–but it’s not a more intelligent species, per se, because it’s an intelligent species that we humans control.”
If controlling a more intelligent species sounds like pure hubris, Dragan understands, though it may simply boil down to yet another example of our inability to find the right words. Robotics engineers frequently experience a breakdown in metaphor when they describe their work to the public. We have always ascribed behavior to agents with biological drives, and robots don’t have biological drives. So we may need a new vernacular for non-biological superintelligent agents that display highly skilled behaviors. “Their drives are a utility function or objective that we specify,” says Dragan. And for her and her colleagues at the Center for Human-Compatible AI, the big challenge is how to get robot objectives right.