Researchers at North Carolina State University have developed a technique that lets artificial intelligence (AI) programs map three-dimensional (3D) spaces more accurately from two-dimensional (2D) images captured by multiple cameras. The advance could significantly improve navigation for autonomous vehicles while requiring little additional computing power.

“Most autonomous vehicles use powerful AI programs called vision transformers to take 2D images from multiple cameras and create a representation of the 3D space around the vehicle,” explained Tianfu Wu, Ph.D., Associate Professor of Electrical and Computer Engineering at North Carolina State University. “However, while each of these AI programs takes a different approach, there is still substantial room for improvement.”

The team’s new technique, known as Multi-View Attentive Contextualization (MvACon), is designed to be a plug-and-play supplement for existing vision transformer AIs, enhancing their ability to map 3D spaces. “The vision transformers aren’t getting any additional data from their cameras,” said Wu. “They’re just able to make better use of the data.”
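Structurally, a plug-and-play module of this kind slots into an existing detector's feature pipeline without changing what the detector consumes or produces. The sketch below is only a minimal illustration of that pattern, not the published MvACon code; the names `PlugInContextualizer`, `detector.backbone`, and `detector.head` are hypothetical stand-ins assumed for the example.

```python
import torch
import torch.nn as nn

class PlugInContextualizer(nn.Module):
    """Wraps an existing multi-camera detector with a drop-in feature
    contextualization block. The detector's inputs and outputs are
    unchanged; only an extra feature-refinement step is inserted."""

    def __init__(self, detector: nn.Module, context_block: nn.Module):
        super().__init__()
        self.detector = detector            # existing vision-transformer detector (assumed API)
        self.context_block = context_block  # the plug-and-play enhancement

    def forward(self, multi_view_images: torch.Tensor) -> torch.Tensor:
        # multi_view_images: (batch, num_cameras, channels, height, width)
        feats = self.detector.backbone(multi_view_images)  # per-view 2D features
        feats = self.context_block(feats)                  # the only new computation
        return self.detector.head(feats)                   # unchanged 3D detection head
```

Because the module leaves the surrounding detector untouched, it can in principle be added to different architectures without retraining them from scratch, which is what "plug-and-play" implies here.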

MvACon builds on a previously developed method called Patch-to-Cluster attention (PaCa), which allows transformer AIs to more efficiently and effectively identify objects in an image. “The key advance here is applying what we demonstrated with PaCa to the challenge of mapping 3D space using multiple cameras,” Wu noted.
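The core idea of patch-to-cluster attention is that, instead of every image patch attending to every other patch (a cost that grows quadratically with the number of patches), patches attend to a much smaller set of learned cluster tokens. The following is a minimal sketch of that idea, assuming hypothetical dimensions and layer names; it is not the authors' implementation of PaCa, which differs in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchToClusterAttention(nn.Module):
    """Attend from N patch tokens to M << N learned cluster tokens,
    reducing attention cost from O(N^2) to O(N*M).
    Assumes dim is divisible by num_heads."""

    def __init__(self, dim: int, num_clusters: int = 49, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        # Soft assignment: each cluster is a weighted mixture of patches.
        self.to_assign = nn.Linear(dim, num_clusters)
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape                                   # (batch, patches, channels)
        H = self.num_heads
        # Build M cluster tokens as assignment-weighted sums of patches.
        assign = F.softmax(self.to_assign(x), dim=1)        # (B, N, M), normalized over patches
        clusters = assign.transpose(1, 2) @ x               # (B, M, C)
        # Queries come from patches; keys/values come from the clusters.
        q = self.q(x).view(B, N, H, C // H).transpose(1, 2)         # (B, H, N, C/H)
        k, v = self.kv(clusters).chunk(2, dim=-1)
        k = k.view(B, -1, H, C // H).transpose(1, 2)                # (B, H, M, C/H)
        v = v.view(B, -1, H, C // H).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) * self.scale               # (B, H, N, M)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

Because the output shape matches the input, a block like this can stand in for standard quadratic self-attention inside a transformer layer, which is what makes the approach attractive as a drop-in component.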

To evaluate MvACon’s performance, the researchers integrated it with three leading vision transformers: BEVFormer, the DFA3D variant of BEVFormer, and PETR. In tests using 2D images from six different cameras, MvACon significantly improved each transformer’s performance, particularly at locating objects and determining their speed and orientation. Importantly, adding MvACon introduced only a minimal increase in computational demand.

“Our next steps include testing MvACon against additional benchmark datasets and actual video input from autonomous vehicles. If MvACon continues to outperform existing vision transformers, we’re optimistic that it will be adopted for widespread use,” Wu said.

By Impact Lab