Fig. 1. Overview of the proposed method. An image of the current landscape is acquired by the mobile terminal and sent to the server PC. The server detects the target building and generates a mask. The area to be complemented is set from the mask image, and the input image is automatically altered based on the features around the target area. The digitally completed output image is sent back to the mobile terminal as the future landscape after demolition and shown on the DR display. Credit: Takuya Kikuchi et al.
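For a concrete picture of the round trip the caption describes, the sketch below shows a stand-in client that streams frames to a server and displays the returned, digitally completed frames. The server address, endpoint, and JPEG-over-HTTP transport are illustrative assumptions, not details from the paper, whose actual client runs as a mobile app.

```python
# Minimal sketch of the mobile-terminal-to-server round trip in Fig. 1.
# SERVER_URL and the JPEG-over-HTTP transport are assumptions for illustration.
import cv2
import numpy as np
import requests

SERVER_URL = "http://192.168.0.10:8000/demolish"  # hypothetical endpoint

cap = cv2.VideoCapture(0)  # webcam standing in for the mobile camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Encode the current landscape view and send it to the server PC.
    _, buf = cv2.imencode(".jpg", frame)
    resp = requests.post(SERVER_URL, data=buf.tobytes(),
                         headers={"Content-Type": "image/jpeg"})
    # The server returns the completed frame: the view after demolition.
    out = cv2.imdecode(np.frombuffer(resp.content, np.uint8), cv2.IMREAD_COLOR)
    cv2.imshow("DR display", out)
    if cv2.waitKey(1) == 27:  # press Esc to stop streaming
        break
cap.release()
cv2.destroyAllWindows()
```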
Scientists at Osaka University have created a machine learning system capable of virtually removing buildings from a live camera view. Using generative adversarial network (GAN) algorithms running on a remote server, the team was able to stream the modified video to a mobile device in real time. This work can help accelerate the process of urban renewal based on community agreement.
Some necessary urban renewal tasks, such as demolishing old buildings, are delayed by the difficulty of convincing stakeholders to commit resources to a project. For instance, differences in understanding of the plan among building owners and nearby residents may lead to conflict and delays. This can result in a paradox in which a task would be feasible to start only after it has already been accomplished. Without access to a time machine, such situations may seem intractable for civil planning.
Now, a team of researchers at Osaka University has helped to address this concern with a new algorithm based on machine learning that provides real-time augmented reality video demonstrating the view after a building is removed. "Our method enables users to intuitively understand what the future landscape will look like, which can contribute to reducing the time and cost for forming a consensus," says first author Takuya Kikuchi.

Because the mobile device communicates with a server, all of the processing can be done remotely, so any smartphone or tablet can be used at the location of the building. To make the algorithm fast enough for real-time augmented video, the team applied semantic segmentation to the input image. This lets the deep learning model classify the image pixel by pixel, in contrast to conventional methods that attempt 3D object detection.
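As a rough illustration of that per-pixel approach, the sketch below runs an off-the-shelf segmentation network to produce a building mask. The DeepLabV3 architecture, the two-class setup, and the BUILDING_CLASS index are stand-ins rather than the authors' own trained network, and this toy model would need trained weights before its mask meant anything.

```python
# Minimal sketch of per-pixel building detection via semantic segmentation.
# The model choice and class indices are assumptions for illustration only.
import torch
import torchvision.transforms as T
from torchvision.models.segmentation import deeplabv3_resnet50
from PIL import Image

BUILDING_CLASS = 1  # assumed label index: 0 = background, 1 = building

model = deeplabv3_resnet50(num_classes=2)  # untrained stand-in network
model.eval()

preprocess = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

image = Image.open("street_view.jpg").convert("RGB")
x = preprocess(image).unsqueeze(0)      # shape: (1, 3, H, W)

with torch.no_grad():
    logits = model(x)["out"]            # shape: (1, num_classes, H, W)

# Every pixel gets the class with the highest score; the resulting boolean
# mask marks the building to be virtually demolished.
mask = logits.argmax(dim=1) == BUILDING_CLASS   # shape: (1, H, W)
```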
GAN algorithms use two competing neural networks, a generator and a discriminator. The generator is trained to create increasingly realistic images, while the discriminator is tasked with distinguishing whether an image is real or artificially generated. "By learning in this way, the GAN algorithm can produce images that do not actually exist but are plausible," says corresponding author Tomohiro Fukuda. In this case, high-accuracy processing was possible as long as the building to be removed did not take up more than 15% of the screen. In field tests, the team was able to stream virtual demolition video at an average rate of 5.71 frames per second, which may greatly assist on-site consensus building in the community.
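To make the generator-versus-discriminator idea concrete, here is a minimal adversarial training step for mask-based image completion. The toy architectures, losses, and the image-plus-mask input convention are illustrative simplifications, not the authors' networks.

```python
# Minimal GAN training step: a generator fills the masked building region
# while a discriminator learns to tell real images from generated ones.
import torch
import torch.nn as nn

# Toy generator: masked image (3ch) + mask (1ch) in, completed image out.
G = nn.Sequential(
    nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())

# Toy discriminator: outputs a grid of real/fake scores (PatchGAN-style).
D = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real, mask):
    """real: (N,3,H,W) landscape photos in [0,1]; mask: (N,1,H,W) building mask."""
    masked_input = torch.cat([real * (1 - mask), mask], dim=1)
    fake = G(masked_input)

    # Discriminator step: push scores for real images up, generated ones down.
    opt_d.zero_grad()
    real_scores = D(real)
    fake_scores = D(fake.detach())
    loss_d = bce(real_scores, torch.ones_like(real_scores)) + \
             bce(fake_scores, torch.zeros_like(fake_scores))
    loss_d.backward()
    opt_d.step()

    # Generator step: try to fool the discriminator into scoring fakes as real.
    opt_g.zero_grad()
    fake_scores = D(fake)
    loss_g = bce(fake_scores, torch.ones_like(fake_scores))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```

Trained this way, the generator never sees an explicit "correct" completion; it simply learns to produce fills the discriminator cannot distinguish from real landscapes, which is what makes the output plausible rather than an exact record.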