Google’s DeepMind robotics team has unveiled three innovations aimed at enhancing the decision-making abilities of robots operating in real-world settings. These advances promise to enable robots to make quicker, smarter, and safer decisions when working alongside humans. Among the key developments is a new system for gathering training data, accompanied by what Google refers to as a “Robot Constitution.”

AutoRT: Revolutionizing Data Collection

DeepMind’s AutoRT is a data collection system designed to give robots the capacity to comprehend their surroundings, adapt to new situations, and select suitable tasks. It harnesses the capabilities of a Visual Language Model (VLM) and a Large Language Model (LLM), working in tandem.

The VLM’s role is to perceive the robot’s environment and identify visible objects, while the LLM serves as the decision-maker, suggesting suitable tasks for the robot to perform. This dual-model approach equips robots with the ability to make informed choices and navigate their surroundings with greater efficiency.
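The two-model loop described above can be sketched as follows. This is an illustrative outline only, not DeepMind's implementation: the function names, object list, and filtering rule are invented stand-ins for real VLM and LLM calls.

```python
# Hypothetical sketch of AutoRT's dual-model loop: a VLM describes the scene,
# then an LLM proposes candidate tasks grounded in what the robot can see.

def describe_scene(camera_image):
    """Stand-in for a VLM call: returns the objects visible to the robot."""
    # A real system would query a vision-language model on the camera image.
    return ["sponge", "countertop", "mug"]

def propose_tasks(objects):
    """Stand-in for an LLM call: suggests tasks involving visible objects."""
    # A real system would prompt a large language model with the object list.
    return [f"pick up the {obj}" for obj in objects if obj != "countertop"]

objects = describe_scene(camera_image=None)
tasks = propose_tasks(objects)
print(tasks)  # candidate tasks the robot could attempt next
```

Keeping perception (VLM) and task selection (LLM) as separate stages means each model does only the job it is suited for, which is the division of labor the article describes.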

Robot Constitution: Safety Guidelines

Inspired by Isaac Asimov’s “Three Laws of Robotics,” DeepMind’s Robot Constitution serves as a set of safety guidelines for AI-powered robots. This constitution instructs the LLM to avoid tasks involving humans, animals, sharp objects, and electrical appliances, ensuring the safety of both robots and their human counterparts.
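A constitution of this kind can be thought of as a screening step between task proposal and execution. The sketch below is a simplified illustration, not DeepMind's mechanism: in practice the safety prompts guide the LLM itself, whereas here an invented keyword filter stands in for that behavior.

```python
# Illustrative sketch: screening LLM-proposed tasks against constitution-style
# safety rules before execution. The keyword list is invented for the example.

FORBIDDEN_KEYWORDS = ["human", "animal", "knife", "scissors", "outlet", "toaster"]

def violates_constitution(task: str) -> bool:
    """Reject any task that mentions a forbidden category."""
    task_lower = task.lower()
    return any(word in task_lower for word in FORBIDDEN_KEYWORDS)

proposed = ["pick up the sponge", "hand the knife to the human"]
safe_tasks = [t for t in proposed if not violates_constitution(t)]
print(safe_tasks)  # only the sponge task survives the filter
```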

DeepMind has taken additional precautions to enhance robot safety. Robots will come to a halt automatically if the force on their joints exceeds a predefined limit. Moreover, a physical kill switch is available to human operators, providing an additional layer of security.
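The two hardware-level safeguards above can be sketched as a single stop condition. The threshold value and force readings below are invented for the example; only the logic (halt on over-limit joint force or operator kill switch) comes from the article.

```python
# Illustrative sketch of the safety guard described above: the robot halts
# automatically if any joint force exceeds a preset limit, and a human
# operator can always force a stop via a physical kill switch.

FORCE_LIMIT_NEWTONS = 40.0  # hypothetical per-joint threshold

def should_halt(joint_forces, kill_switch_pressed=False):
    """Stop if the kill switch is pressed or any joint force is too high."""
    return kill_switch_pressed or any(f > FORCE_LIMIT_NEWTONS for f in joint_forces)

print(should_halt([12.3, 18.9, 25.1]))                    # within limits
print(should_halt([12.3, 55.0, 25.1]))                    # joint overloaded
print(should_halt([5.0, 5.0], kill_switch_pressed=True))  # operator stop
```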

Real-World Deployment and Testing

In just seven months, Google deployed 53 AutoRT robots across four office buildings. During this period, over 77,000 trials were conducted to assess the robots’ capabilities. Some of these robots were remotely controlled by human operators, while others followed predefined scripts or operated autonomously using Google’s Robotic Transformer (RT-2) AI learning model.

These robots feature a camera, a robot arm, and a mobile base. The VLM helps them understand their surroundings and identify objects, while the LLM guides effective and safe decision-making.

SARA-RT: A Game-Changer for Robotics

One of the most significant breakthroughs in this endeavor is the introduction of Self-Adaptive Robust Attention for Robotics Transformers (SARA-RT). This system optimizes Robotics Transformer (RT) models, increasing their efficiency in real-world applications.

The RT neural network architecture, particularly the advanced RT-2 model, has seen marked improvements. The best-performing SARA-RT-2 models were 10.6 percent more accurate and 14 percent faster than their RT-2 counterparts, all while using a short history of images as input.
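The speedup comes from making the transformer's attention cheaper. The sketch below illustrates the general linearized-attention idea that this class of technique builds on, using NumPy: by reassociating the matrix product, the quadratic-size attention matrix is never formed. The feature map and shapes here are illustrative, not SARA-RT's actual design.

```python
import numpy as np

# Sketch of linearized attention: computing phi(Q) @ (phi(K)^T @ V) instead of
# softmax(Q K^T) @ V avoids materializing the n x n attention matrix, so cost
# grows linearly rather than quadratically with sequence length n.

def linear_attention(Q, K, V):
    """Linearized attention with a simple positive feature map (illustrative)."""
    phi = lambda x: np.maximum(x, 0) + 1e-6  # keeps weights positive
    KV = phi(K).T @ V             # (d, d_v): independent of n once summed
    Z = phi(Q) @ phi(K).sum(0)    # per-query normalization term, shape (n,)
    return (phi(Q) @ KV) / Z[:, None]

n, d = 8, 4
Q, K, V = (np.random.rand(n, d) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # one output vector per query: (8, 4)
```

The key design point is associativity: grouping the product as phi(K)^T @ V first yields a small d-by-d summary, so doubling the number of input tokens roughly doubles, rather than quadruples, the work.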

The introduction of SARA-RT marks a significant milestone in robotics, representing the first scalable attention mechanism that substantially enhances computational efficiency without compromising decision-making quality. This development is poised to revolutionize how robots operate in real-world scenarios, making them more capable, efficient, and safe.

Google’s DeepMind robotics team’s recent innovations in AI-driven robotics, including the “Robot Constitution” and the SARA-RT system, mark a significant leap forward in the realm of robotics. These advances are set to enhance robots’ decision-making capabilities in real-world environments, ushering in a future where robots and humans can collaborate seamlessly and safely.

By Impact Lab