In a groundbreaking endeavor, the Toyota Research Institute (TRI) has harnessed generative AI to teach robots the art of preparing breakfast, breaking away from the traditionally arduous process of hand-coding and debugging behaviors. The key to their success lies in giving robots a sense of touch and connecting them to an AI model, allowing them to learn much like humans do.
This tactile sense is identified as a pivotal element in the robots’ learning process. By equipping them with a soft, pliable “thumb,” as seen in the accompanying video, the AI model gains the ability to “feel” its actions, providing invaluable sensory input that surpasses the limitations of sight alone. Consequently, intricate tasks become more manageable for the robots.
Ben Burchfiel, the lab’s manager overseeing dexterous manipulation, expresses his excitement at witnessing these robots engage with their surroundings. The teaching process involves a human “instructor” initially demonstrating a set of skills, following which the AI model quietly assimilates knowledge in the background. Burchfiel notes that it is not uncommon for them to instruct a robot in the afternoon, allow it to learn autonomously overnight, and arrive the next morning to observe a freshly acquired behavior.
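The demonstrate-then-train-overnight loop described above is, at its core, learning from demonstration: a human operator produces (observation, action) pairs, and a model is fit to imitate them. Below is a minimal, illustrative sketch of that idea using a linear least-squares policy in place of the neural network TRI would actually train; the observation layout, dimensions, and data here are all hypothetical, not TRI's real system.

```python
import numpy as np

def train_policy(observations, actions):
    """Fit a linear policy mapping sensor observations to actions via
    least squares -- an illustrative stand-in for training a neural
    network on logged human demonstrations."""
    # Append a constant column so the policy can learn a bias term.
    X = np.hstack([observations, np.ones((len(observations), 1))])
    W, *_ = np.linalg.lstsq(X, actions, rcond=None)
    return W

def act(W, observation):
    """Predict an action for a new observation with the trained policy."""
    return np.append(observation, 1.0) @ W

# Synthetic "demonstrations": each observation combines hypothetical
# vision features (4 dims) with tactile features (2 dims), and each
# action is e.g. a 2-D gripper command recorded from the instructor.
rng = np.random.default_rng(0)
obs = rng.normal(size=(200, 6))
expert_W = rng.normal(size=(6, 2))
acts = obs @ expert_W  # actions the human demonstrator would take

W = train_policy(obs, acts)          # "overnight" training step
predicted = act(W, obs[0])           # robot imitates the demonstration
```

The tactile columns matter here just as in the article: they are extra inputs the policy can condition on, beyond what vision alone provides. A real system would use far richer observations and a nonlinear model, but the train-on-demonstrations structure is the same.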
The researchers are actively striving to develop what they term “Large Behavior Models” (LBMs), which could be humorously thought of as “Large Breakfast Models.” Drawing a parallel with Large Language Models (LLMs), which learn by recognizing patterns in human text, Toyota’s LBMs would primarily rely on observation, enabling robots to “generalize” and execute entirely new tasks they have never been explicitly taught. Russ Tedrake, an MIT robotics professor and Vice President of robotics research at TRI, explains the concept in those terms.
Employing this methodology, the researchers have successfully trained robots in more than 60 intricate skills, including tasks such as pouring liquids, utilizing tools, and manipulating deformable objects. Their ambitious goal is to expand this repertoire to 1,000 skills by the close of 2024.
Notably, other tech giants such as Google and Tesla have embarked on similar research initiatives with their own robots, such as Google’s Robotic Transformer 2 (RT-2). These robots, much like Toyota’s, rely on experiential learning to deduce new actions. The ultimate vision is that AI-trained robots could potentially execute tasks with minimal instruction, akin to the guidance one might provide to a human (“clean up that spill,” for instance).
However, as The New York Times pointed out in its coverage of Google’s research, this type of work is typically “slow and labor-intensive.” Ensuring sufficient training data is a considerable challenge, far more complex than merely inundating an AI model with copious amounts of internet-derived data, as illustrated by an example involving a robot incorrectly identifying a banana’s color as white.
By Impact Lab