To be effectively used in real-world settings, robots must reliably perform a variety of everyday tasks, from household chores to industrial processes. Tasks such as manipulating fabrics, folding clothes, or helping people with mobility impairments knot a tie are particularly challenging for robotic systems. Training robots to handle these tasks often relies on imitation learning, which uses videos, motion-capture footage, and other recordings of humans completing them. However, this method requires substantial amounts of human demonstration data, which can be costly and difficult to obtain, and existing open-source robotics datasets remain far smaller than those used to train other computational systems such as computer vision or generative AI models.

Researchers at the National University of Singapore, Shanghai Jiao Tong University, and Nanjing University have recently proposed an alternative approach to enhance and simplify the training of robotics algorithms using human demonstrations. This approach, detailed in a paper pre-published on arXiv, utilizes the vast number of videos posted online daily as sources of human demonstrations for various tasks.

“This work begins with a simple idea, that of building a system that allows robots to utilize the countless human demonstration videos online to learn complex manipulation skills,” explained Weikun Peng, co-author of the paper, in an interview with Tech Xplore. “In other words, given an arbitrary human demonstration video, we wanted the robot to complete the same task shown in the video.”

Previous studies on imitation learning techniques often relied on domain-specific videos—footage of humans completing specific tasks in environments identical to those where the robot would later operate. In contrast, the framework developed by Peng and his colleagues is designed to enable robot imitation learning from arbitrary demonstration videos found online, regardless of the environment.

The team’s approach comprises three main components: Real2Sim, Learn@Sim, and Sim2Real. Of these, Real2Sim is central to the framework’s operation.

“Real2Sim tracks the object’s motion in the demonstration video and replicates the same motion on a mesh model in a simulation,” Peng explained. “In other words, we try to replicate the human demonstration in the simulation. Finally, we get a sequence of object meshes, representing the ground truth object trajectory.”
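
In code, the core of this step can be pictured as replaying a per-frame object pose, estimated from the video, onto a mesh model inside a simulator. The sketch below is a deliberately simplified, rigid-body version of that idea; the paper’s system handles deformable objects such as ties, and `estimate_object_pose` stands in for whatever video-based tracker is used. Both the function and its details are assumptions, not the authors’ code.

```python
# Simplified Real2Sim-style replay: estimate the object's pose in each
# video frame and apply it to a mesh model, yielding a ground-truth
# sequence of object meshes. This is a rigid-body simplification;
# `estimate_object_pose` is a hypothetical placeholder for a tracker.
import numpy as np
import trimesh

def estimate_object_pose(frame) -> np.ndarray:
    """Hypothetical tracker: return the object's 4x4 homogeneous
    transform in this frame (identity used here as a placeholder)."""
    return np.eye(4)

def real2sim_replay(video_frames, mesh_path):
    base_mesh = trimesh.load_mesh(mesh_path)
    mesh_trajectory = []
    for frame in video_frames:
        pose = estimate_object_pose(frame)   # object pose in this frame
        mesh_t = base_mesh.copy()
        mesh_t.apply_transform(pose)         # move mesh to the tracked pose
        mesh_trajectory.append(mesh_t)       # accumulate the mesh sequence
    return mesh_trajectory
```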

This approach uses meshes, digital representations of an object’s geometry and dynamics, as the intermediate representation. After the Real2Sim component replicates a human demonstration in a simulated environment, the Learn@Sim component uses reinforcement learning to identify the grasping and placing points the robot needs to perform the same actions. The final component, Sim2Real, deploys the learned policy on a real dual-arm robot, adding a residual policy to close the gap between simulation and reality.
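
The residual-policy idea mentioned above has a simple shape: the action sent to the real robot is the simulation-trained policy’s output plus a small learned correction. The following is a minimal PyTorch sketch of that general pattern; the class name, network sizes, and scaling factor are illustrative assumptions, not details from the paper.

```python
# Sketch of a residual policy for bridging the Sim2Real gap: execute the
# sim-trained base action plus a small correction learned on the real robot.
import torch
import torch.nn as nn

class ResidualPolicy(nn.Module):
    def __init__(self, base_policy: nn.Module, obs_dim: int, act_dim: int):
        super().__init__()
        self.base_policy = base_policy            # frozen, trained in simulation
        for p in self.base_policy.parameters():
            p.requires_grad = False
        self.residual = nn.Sequential(            # small correction network,
            nn.Linear(obs_dim + act_dim, 64),     # trained on the real system
            nn.Tanh(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        base_action = self.base_policy(obs)
        correction = self.residual(torch.cat([obs, base_action], dim=-1))
        return base_action + 0.1 * correction     # keep corrections small
```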

The researchers tested their approach on knotting a tie, a manipulation task that is notoriously difficult for robots, and their method enabled a robotic manipulator to complete it successfully.

“Notably, many previous works require ‘in domain’ demonstration videos, which means the setting of demonstration videos should be the same as the setting of the robot execution environment,” Peng said. “Our method, on the other hand, can learn from ‘out of domain’ demonstration videos since we extract the object’s motion in 3D space from the demonstration video.”

In the future, Peng and his colleagues’ approach could be applied to other complex robot manipulation tasks. Because it does not depend on costly in-domain demonstrations, it could make imitation learning a far more scalable way to train robots.

“My plan for future work would be to expand the Real-Sim-Real idea to other tasks,” Peng added. “If we can replicate an object’s motion in simulation, could we replicate the real world in simulation? The robotics community is facing a data scarcity problem, and in my opinion, if we can replicate the real world in simulation, we can collect data more efficiently and better transfer learned policies to real robots.”

By Impact Lab