AI can generate interactive virtual worlds based on simple videos

Nvidia’s new AI represents a major leap forward in graphics generation based on neural networks.

Crafting an interactive virtual world of the kind found in many modern video games is a labor-intensive process that can require years of work, hundreds of people, and millions of dollars. Soon, some of that work may be done by machines.

Computer hardware company Nvidia, which specializes in graphics cards, announced on Monday that it had developed a new AI model that can take video of the real world and use it to generate a realistic and interactive virtual world. According to Nvidia, its new AI could drastically lower the cost of generating virtual environments, which would be particularly useful in the video game and film industries.

But that’s not all: the AI can also be used to do things like map simple sketch drawings of facial expressions onto videos of actual people, making them appear to make a face they never made in real life. The same neural net can also be used to predict and generate the future of a video sequence based on the current frame. This ability is crucial for the future of self-driving car technology, because it can help predict the behavior of, say, pedestrians at a crosswalk.

“This is the first time we’ve combined machine learning and computer graphics to do image generation using deep networks,” Ming-Yu Liu, a researcher at Nvidia, said in a video produced by the company to promote the technology.

The “deep networks” referred to by Liu are generative neural networks, a type of computing architecture loosely based on the human brain that “learns” to produce new things after being tuned to recognize patterns in a large amount of input data.

The Nvidia researchers trained their AI using videos of cars driving through cities in the real world. The researchers then extracted the high-level features of these videos—such as whether a scene is depicting other cars, trees, or a building—and used this information to teach the AI how to recognize objects.

Once the neural network had a good idea of how these things look in the real world, it used this data to generate interactive worlds of its own, based on its “learned” assumptions about things in reality. So far, the Nvidia team has used this approach to create a “simple driving game” in one of these AI-created environments, which is currently on display at the NeurIPS conference in Montreal, Canada.

The model used to generate this driving game was first described by Nvidia researchers in a paper posted to the arXiv preprint server in August. In that paper, the researchers described how their neural net could be used to predict the future of a video. Such prediction matters because robots could use it to anticipate how humans are going to act in a given situation. That would be incredibly valuable for things like self-driving cars or factory robots, but it is a notoriously hard problem in machine learning. Nvidia’s progress in this respect is remarkable, given that the first AI-generated predictive videos were made only about two years ago.

To get an idea of Nvidia’s progress, consider the image above, where the upper left quadrant depicts a still from an actual video of the world. The images on the upper right and lower left show two different neural networks trying to predict the future of a video, and the bottom right is Nvidia’s AI. It’s almost perfect, except for small glitches in details such as the street signs and road paint.

Nvidia’s neural net isn’t solely used to generate videos of cars navigating cities, or interactive driving games, however. The researchers also trained the neural network on videos of people’s faces and then used simple sketches to transfer expressions onto the people speaking in the video.

Researchers were able to use simple sketches to transfer facial expressions onto videos of people speaking. Image: Nvidia/arXiv

Sophisticated machine learning techniques to generate fake videos are rapidly becoming accessible to people who don’t have access to massive corporate research budgets. This is perhaps best exemplified by the rise of so-called deepfakes, AI fakery that was originally used to swap celebrities’ faces onto the bodies of adult film actresses.

Deepfakes raise a troubling question about how we will be able to trust video evidence when it’s relatively easy for anyone to create realistic fake video. Nvidia’s new research raises these stakes even higher, since its AI can generate fake video based on simple sketches, whereas deepfakes require large libraries of images of the same person.

Fortunately, there’s not much reason to worry about Nvidia’s new AI being used for nefarious purposes just yet. This mainly has to do with the limitations on consumer hardware. Nvidia’s new neural network requires a lot of computing power to generate these videos, and most people don’t have access to that many GPUs.

For now, Nvidia’s new neural network will probably just be a way for companies to create realistic virtual worlds without as much expense or human effort.

Via Motherboard