Robots are steadily advancing from simple tasks like cleaning spills to more complex household duties. Many of these robotic helpers are trained through imitation, mimicking motions that a human physically guides them through. But without the ability to adapt to nudges, bumps, and other disruptions, these robots often fail at the first unexpected obstacle and must restart the task from the beginning.

Addressing this limitation, MIT engineers have devised a groundbreaking method to imbue robots with a degree of common sense when confronted with deviations from their trained paths. Their approach integrates robot motion data with the expansive “common sense knowledge” stored in large language models (LLMs).

This novel methodology empowers robots to logically break down household tasks into manageable subtasks and dynamically adjust to disruptions within these subtasks. Consequently, robots can seamlessly progress through tasks without the need for explicit programming to rectify every potential failure scenario.

Yanwei Wang, a graduate student in MIT’s Department of Electrical Engineering and Computer Science (EECS), explains the significance of their approach: “Imitation learning is a primary method for training household robots. However, blindly mimicking human motion trajectories can lead to cumulative errors, derailing task execution. With our method, robots can autonomously correct execution errors, enhancing overall task success.”

Presenting their approach at the International Conference on Learning Representations (ICLR) in May, Wang and his colleagues detail its application to a variety of household tasks. They demonstrate it with a simple chore: scooping marbles from one bowl and pouring them into another.

Traditionally, engineers train a robot by guiding it through an entire task as one fluid trajectory, which the robot then imitates. The team recognized, however, that such a task is really a sequence of subtasks, each crucial for successful completion. They therefore developed an algorithm that automatically connects an LLM’s natural-language label for each subtask with the robot’s position in physical space or an image of its state, a process known as “grounding.”
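To make the idea concrete, here is a minimal sketch of one way such grounding could work, assuming demonstrations have already been segmented and labeled by subtask: a nearest-neighbor classifier maps any robot state to the subtask whose demonstrated states it most resembles. The `GroundingModel` class, subtask names, and one-dimensional state are illustrative placeholders, not the team’s published algorithm.

```python
import numpy as np

class GroundingModel:
    """Grounds subtask labels in demonstration data: given a robot
    state, return the subtask whose demonstrated states are nearest."""

    def __init__(self):
        self.states = []   # demonstrated states, one row per timestep
        self.labels = []   # subtask label for each demonstrated state

    def add_demonstration(self, trajectory, subtask_labels):
        """trajectory: (T, D) array of states; subtask_labels: length-T list."""
        self.states.extend(trajectory)
        self.labels.extend(subtask_labels)

    def classify(self, state):
        """Return the subtask label of the nearest demonstrated state."""
        states = np.asarray(self.states)
        distances = np.linalg.norm(states - state, axis=1)
        return self.labels[int(np.argmin(distances))]

# Toy demonstration: a 1-D "progress" coordinate for the marble task.
model = GroundingModel()
demo = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
labels = ["reach"] * 10 + ["scoop"] * 10 + ["transport"] * 10 + ["pour"] * 10
model.add_demonstration(demo, labels)

print(model.classify(np.array([0.3])))  # -> "scoop"
```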

Through experiments with a robotic arm trained on the marble-scooping task, the team demonstrated the efficacy of their approach. After initial training demonstrations, the robot relied on a pretrained LLM to list the task’s steps, and their algorithm seamlessly linked these subtasks with the robot’s motion trajectory data.
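A minimal sketch of that first step might look like the following, where `query_llm` is a stand-in for a call to whatever pretrained model is used; the prompt wording, the canned reply, and the parsing are assumptions for illustration, not the paper’s actual pipeline.

```python
def query_llm(prompt: str) -> str:
    """Placeholder for a call to any pretrained LLM API.
    Returns a canned answer here so the sketch is runnable."""
    return ("1. reach toward the first bowl\n"
            "2. scoop up the marbles\n"
            "3. transport the spoon to the second bowl\n"
            "4. pour the marbles out")

def list_subtasks(task: str) -> list[str]:
    """Ask the LLM to decompose a household task into ordered subtasks."""
    prompt = (f"Break the task '{task}' into a short numbered list of "
              f"physical subtasks a robot arm must perform, one per line.")
    reply = query_llm(prompt)
    # Strip the "N." prefixes and ignore lines that aren't list items.
    return [line.split(".", 1)[1].strip()
            for line in reply.splitlines() if "." in line]

subtasks = list_subtasks("scoop marbles from one bowl and pour them into another")
print(subtasks)  # ['reach toward the first bowl', 'scoop up the marbles', ...]
```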

During testing, the robot encountered disturbances such as nudges and marble spills. Remarkably, it autonomously corrected its course, completing each subtask before advancing to the next, without the need for human intervention or additional demonstrations.
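One way to picture that recovery behavior is the loop below: after each low-level action, the robot re-grounds its current state and resumes from whichever subtask that state actually matches, rather than restarting the task. The progress counter, the three-steps-per-subtask grounding, and the simulated nudge are all assumptions for illustration, not the team’s controller.

```python
def run_task(num_subtasks, ground_state, step_subtask, max_steps=100):
    """Run subtasks in order, re-grounding after every step so that a
    disturbance sends the robot back to the subtask its state actually
    matches, instead of forcing a restart of the whole task."""
    index = 0
    for _ in range(max_steps):
        if index >= num_subtasks:
            return True                      # all subtasks completed
        step_subtask(index)                  # one low-level policy action
        index = ground_state()               # re-ground the current state
    return False                             # ran out of steps

# Toy simulation: state is a progress counter (3 steps per subtask);
# a one-time "nudge" knocks progress back into an earlier subtask.
progress, nudged = 0, False

def step_subtask(i):
    global progress, nudged
    progress += 1
    if progress == 5 and not nudged:         # simulated disturbance
        progress, nudged = 2, True

def ground_state():
    return progress // 3                     # subtask the state belongs to

print(run_task(4, ground_state, step_subtask))  # True, despite the nudge
```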

Wang emphasizes the transformative potential of their method: “Our algorithm enables robust robot behavior capable of complex tasks, despite external perturbations. This is particularly exciting as it streamlines the training process for household robots, converting teleoperation data into adaptive behaviors.”

By Impact Lab