
Collaborating Authors

 Iscen, Atil


Gemini Robotics: Bringing AI into the Physical World

arXiv.org Artificial Intelligence

Recent advancements in large multimodal models have led to the emergence of remarkable generalist capabilities in digital domains, yet their translation to physical agents such as robots remains a significant challenge. This report introduces a new family of AI models purposefully designed for robotics and built upon the foundation of Gemini 2.0. We present Gemini Robotics, an advanced Vision-Language-Action (VLA) generalist model capable of directly controlling robots. Gemini Robotics executes smooth and reactive movements to tackle a wide range of complex manipulation tasks while also being robust to variations in object types and positions, handling unseen environments, and following diverse, open-vocabulary instructions. We show that with additional fine-tuning, Gemini Robotics can be specialized to new capabilities, including solving long-horizon, highly dexterous tasks, learning new short-horizon tasks from as few as 100 demonstrations, and adapting to completely novel robot embodiments. This is made possible because Gemini Robotics builds on top of Gemini Robotics-ER, the second model we introduce in this work. Gemini Robotics-ER (Embodied Reasoning) extends Gemini's multimodal reasoning capabilities into the physical world, with enhanced spatial and temporal understanding. This enables capabilities relevant to robotics, including object detection, pointing, trajectory and grasp prediction, multi-view correspondence, and 3D bounding-box prediction. We show how this novel combination can support a variety of robotics applications. We also discuss and address important safety considerations related to this new class of robotics foundation models. The Gemini Robotics family marks a substantial step towards developing general-purpose robots that realize AI's potential in the physical world.
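The pointing and detection outputs described above lend themselves to a simple consumer-side sketch. The snippet below shows one plausible way to parse 2D points returned by an embodied-reasoning model and map them to pixel coordinates; the JSON schema, the (y, x) ordering, and the 0-1000 normalized range are assumptions for illustration, not the documented Gemini Robotics-ER interface.

```python
# Hypothetical sketch: consuming 2D pointing output from an embodied-reasoning
# model. The JSON schema, the (y, x) ordering, and the 0-1000 normalized
# coordinate range are assumptions for illustration, not the documented API.
import json

def parse_points(raw: str, img_w: int, img_h: int):
    """Convert normalized point coordinates in a model response to pixels."""
    parsed = []
    for item in json.loads(raw):
        y, x = item["point"]  # assumed (y, x), normalized to [0, 1000]
        parsed.append({
            "label": item["label"],
            "pixel": (int(x / 1000 * img_w), int(y / 1000 * img_h)),
        })
    return parsed

# Example: a response pointing at two graspable objects in a 640x480 image.
raw = '[{"label": "mug handle", "point": [312, 705]},' \
      ' {"label": "spoon", "point": [540, 210]}]'
for obj in parse_points(raw, img_w=640, img_h=480):
    print(obj["label"], "->", obj["pixel"])
```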


Prosody for Intuitive Robotic Interface Design: It's Not What You Said, It's How You Said It

arXiv.org Artificial Intelligence

In this paper, we investigate the use of 'prosody' (the musical elements of speech) as a communicative signal for intuitive human-robot interaction interfaces. Our approach, rooted in Research through Design (RtD), examines the application of prosody in directing the navigation of a quadruped robot. We involved ten team members in an experiment to command a robot through an obstacle course using natural interaction. A human operator, serving as the robot's sensory and processing proxy, translated human communication into a basic set of navigation commands, effectively simulating an intuitive interface. During our analysis of the interaction videos, when lexical and visual cues proved insufficient for accurate command interpretation, we turned to non-verbal auditory cues. Qualitative evidence suggests that participants intuitively relied on prosody to control robot navigation. We highlight specific, distinct prosodic constructs that emerged from this preliminary exploration and discuss their pragmatic functions. This work contributes a discussion on the broader potential of prosody as a multifunctional communicative signal for designing future intuitive robotic interfaces, enabling lifelong learning and personalization in human-robot interaction.


Robotic Table Tennis: A Case Study into a High Speed Learning System

arXiv.org Artificial Intelligence

We present a deep dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real-world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, sensitivity to policy hyper-parameters, and choice of action space. A video demonstrating the components of the system and details of experimental results can be found at https://youtu.be/uFcnWjB42I0.
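Latency mitigation is one of the design themes the studies above examine. As a rough illustration of the general idea, the sketch below wraps a gym-style environment (the `step`/`reset` interface is an assumption) so that a policy trained in simulation sees observations, and has its actions applied, with the same fixed delays measured on the real system. This is a generic technique, not the paper's actual implementation.

```python
# Generic sketch (not the paper's code): train with the same observation and
# action delays measured on the real system by buffering both in simulation.
# Assumes a gym-style env exposing reset() and step(action).
from collections import deque

class LatencyWrapper:
    def __init__(self, env, obs_delay: int, act_delay: int):
        self.env = env
        self.obs_buf = deque(maxlen=obs_delay + 1)  # stale observations
        self.act_buf = deque(maxlen=act_delay + 1)  # queued actions

    def reset(self):
        obs = self.env.reset()
        self.obs_buf.clear()
        self.act_buf.clear()
        for _ in range(self.obs_buf.maxlen):
            self.obs_buf.append(obs)
        return self.obs_buf[0]  # oldest entry = delayed view of the world

    def step(self, action):
        self.act_buf.append(action)
        # Apply the action issued act_delay steps ago (once the queue fills;
        # during warm-up the effective delay is shorter).
        obs, reward, done, info = self.env.step(self.act_buf[0])
        self.obs_buf.append(obs)
        return self.obs_buf[0], reward, done, info
```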


Barkour: Benchmarking Animal-level Agility with Quadruped Robots

arXiv.org Artificial Intelligence

Abstract: Animals have evolved various agile locomotion strategies, such as sprinting, leaping, and jumping. There is a growing interest in developing legged robots that move like their biological counterparts and show various agile skills to navigate complex environments quickly. Despite the interest, the field lacks systematic benchmarks to measure the performance of control policies and hardware in agility. We introduce the Barkour benchmark, an obstacle course to quantify agility for legged robots. Inspired by dog agility competitions, it consists of diverse obstacles and a time-based scoring mechanism. This encourages researchers to develop controllers that not only move fast, but do so in a controllable and versatile way. To set strong baselines, we present two methods for tackling the benchmark. In the first approach, we train specialist locomotion skills using on-policy reinforcement learning methods and combine them with a high-level navigation controller. In the second approach, we distill the specialist skills into a Transformer-based generalist locomotion policy, named Locomotion-Transformer, that can handle various terrains and adjust the robot's gait based on the perceived environment.

There has been a proliferation of legged robot development inspired by animal mobility. An important research question in this field is how to develop a controller that enables legged robots to exhibit animal-level agility while also being able to generalize across various obstacles and terrains. Through the exploration of both learning and traditional control-based methods, there has been significant progress in enabling robots to walk across a wide range of terrains [10, 21, 20, 1, 27]. These robots are now capable of walking in a variety of indoor and outdoor environments, such as up and down stairs, through bushes, and over unpaved roads and rocky or even sandy beaches. Despite advances in robot hardware and control, a major challenge in the field is the lack of standardized and intuitive methods for evaluating the effectiveness of locomotion controllers.
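To make the time-based scoring idea concrete, here is a toy scoring function in the spirit of the description above. The penalty values and the exact form are assumptions for illustration; the benchmark's actual metric may weight obstacles and time differently.

```python
# Toy sketch of a time-based obstacle-course score (penalty values are
# assumptions for illustration; the benchmark's actual metric may differ).
def agility_score(finish_time_s, target_time_s, failed_obstacles,
                  obstacle_penalty=0.1, time_penalty_per_s=0.01):
    score = 1.0
    score -= obstacle_penalty * failed_obstacles  # skipped/failed obstacles
    score -= time_penalty_per_s * max(0.0, finish_time_s - target_time_s)
    return max(0.0, score)

# One failed obstacle (-0.1) and 2 s over the target time (-0.02) -> 0.88
print(agility_score(finish_time_s=12.0, target_time_s=10.0, failed_obstacles=1))
```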


Disentangled Planning and Control in Vision Based Robotics via Reward Machines

arXiv.org Artificial Intelligence

In this work we augment a Deep Q-Learning agent with a Reward Machine (DQRM) to increase the speed of learning vision-based policies for robot tasks, and to overcome some of the limitations of DQN that prevent it from converging to good-quality policies. A reward machine (RM) is a finite state machine that decomposes a task into a discrete planning graph and equips the agent with a reward function to guide it toward task completion. The reward machine can be used both for reward shaping and for informing the policy of the abstract state it is currently in. An abstract state is a high-level simplification of the current state, defined in terms of task-relevant features. These two supervisory signals, reward shaping and knowledge of the current abstract state, complement each other, and both can be used to improve policy performance, as demonstrated on several vision-based robotic pick-and-place tasks. Particularly for vision-based robotics applications, it is often easier to build a reward machine than to try to get a policy to learn the task without this structure.
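A minimal sketch may help make the reward-machine idea concrete. The toy finite state machine below, for a generic pick-and-place task, fires transitions on task-relevant events and emits shaping rewards, while exposing its current abstract state for the policy to consume. The states, events, and reward values are illustrative assumptions, not the paper's machines.

```python
# Minimal reward-machine sketch for a pick-and-place task (illustrative
# structure, not the paper's implementation): a finite state machine whose
# transitions fire on task-relevant events and emit shaping rewards. The
# current abstract state is also exposed so it can be fed to the policy.
class RewardMachine:
    def __init__(self):
        # (state, event) -> (next_state, reward)
        self.transitions = {
            ("start",   "grasped"): ("holding", 0.5),
            ("holding", "dropped"): ("start",  -0.5),
            ("holding", "placed"):  ("done",    1.0),
        }
        self.state = "start"

    def step(self, event):
        """Advance on a detected event; return the shaping reward."""
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward

rm = RewardMachine()
print(rm.step("grasped"), rm.state)  # 0.5 holding
print(rm.step("placed"), rm.state)   # 1.0 done
```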


Policies Modulating Trajectory Generators

arXiv.org Artificial Intelligence

Abstract: We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has an insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot by using Deep Reinforcement Learning and Evolutionary Strategies. We demonstrate that a simple linear policy, when paired with a parametric Trajectory Generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in under 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity.

Keywords: Reinforcement Learning, Control, Locomotion

The recent success of Deep Learning (DL) on simulated robotic tasks has opened an exciting research direction. Nevertheless, many robotic tasks such as locomotion still remain an open problem for learning-based methods due to their complexity or dynamics. From a DL perspective, one way to tackle these complex problems is by using more and more complex policies (such as recurrent networks). Unfortunately, more complex policies are harder to train and require even more training data, which is often problematic for robotics.
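As a rough sketch of the control flow this architecture implies, the snippet below has a policy output both trajectory-generator modulation parameters (frequency, amplitude) and a residual added to the generator's output. The sinusoidal generator, the four-leg phase offsets, and the toy linear policy are illustrative assumptions rather than the paper's exact parameterization.

```python
# PMTG control-flow sketch under simplifying assumptions: the policy outputs
# modulation parameters for a periodic trajectory generator plus a residual
# correction added to the generator's output. The sinusoidal generator and
# the toy linear policy are illustrative, not the paper's parameterization.
import numpy as np

class TrajectoryGenerator:
    def __init__(self):
        self.phase = 0.0

    def step(self, frequency, amplitude, dt=0.01):
        self.phase = (self.phase + 2 * np.pi * frequency * dt) % (2 * np.pi)
        # One target per leg, offset by a quarter phase (toy gait pattern).
        return amplitude * np.sin(self.phase + np.arange(4) * np.pi / 2)

def pmtg_action(policy, observation, tg):
    frequency, amplitude, *residual = policy(observation)
    return tg.step(frequency, amplitude) + np.asarray(residual)

# Toy linear policy over a 4-D IMU observation, as in the paper's setup.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 4))   # 2 TG params + 4 residual outputs
policy = lambda obs: W @ obs
tg = TrajectoryGenerator()
print(pmtg_action(policy, np.zeros(4), tg))
```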


Data Efficient Reinforcement Learning for Legged Robots

arXiv.org Artificial Intelligence

We present a model-based framework for robot locomotion that achieves walking based on only 4.5 minutes (45,000 control steps) of data collected on a quadruped robot. To accurately model the robot's dynamics over a long horizon, we introduce a loss function that tracks the model's prediction over multiple timesteps. We adapt model predictive control to account for planning latency, which allows the learned model to be used for real time control. Additionally, to ensure safe exploration during model learning, we embed prior knowledge of leg trajectories into the action space. The resulting system achieves fast and robust locomotion. Unlike model-free methods, which optimize for a particular task, our planner can use the same learned dynamics for various tasks, simply by changing the reward function. To the best of our knowledge, our approach is more than an order of magnitude more sample efficient than current model-free methods.
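The multi-timestep model loss mentioned above is easy to sketch. In the illustrative version below (an assumption about the general form, not the authors' code), the learned dynamics model is rolled forward through its own predictions for a fixed horizon and penalized on the accumulated squared error.

```python
# Sketch of a multi-step dynamics-model loss in the spirit described above
# (an illustration, not the authors' code): instead of penalizing only
# one-step prediction error, roll the model forward through its own
# predictions for `horizon` steps and penalize the accumulated deviation.
import numpy as np

def multi_step_loss(model, states, actions, horizon):
    """states: length T+1 sequence; actions: length T; model(s, a) -> next s."""
    total, count = 0.0, 0
    T = len(actions)
    for t in range(T - horizon + 1):
        s_pred = states[t]
        for h in range(horizon):
            s_pred = model(s_pred, actions[t + h])  # feed predictions back in
            total += np.sum((s_pred - states[t + h + 1]) ** 2)
            count += 1
    return total / max(count, 1)

# Toy check with a linear model and a rollout it fits exactly:
model = lambda s, a: s + a
states = np.array([0.0, 1.0, 3.0, 6.0])
actions = np.array([1.0, 2.0, 3.0])
print(multi_step_loss(model, states, actions, horizon=2))  # 0.0
```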


Hierarchical Reinforcement Learning for Quadruped Locomotion

arXiv.org Artificial Intelligence

Legged locomotion is a challenging task for learning algorithms, especially when the task requires a diverse set of primitive behaviors. To solve these problems, we introduce a hierarchical framework to automatically decompose complex locomotion tasks. A high-level policy issues commands in a latent space and also selects for how long the low-level policy will execute the latent command. Concurrently, the low-level policy uses the latent command and only the robot's on-board sensors to control the robot's actuators. Our approach allows the high-level policy to run at a lower frequency than the low-level one. We test our framework on a path-following task for a dynamic quadruped robot and we show that steering behaviors automatically emerge in the latent command space as low-level skills are needed for this task. We then show efficient adaptation of the trained policy to a different task by transfer of the trained low-level policy. Finally, we validate the policies on a real quadruped robot. To the best of our knowledge, this is the first application of end-to-end hierarchical learning to a real robotic locomotion task.
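The division of labor described above can be summarized in a short control-loop sketch. Below, a stand-in high-level policy emits a latent command and a duration, and a stand-in low-level policy consumes that command plus onboard sensor readings at every step; all names and the toy environment are illustrative assumptions, not the paper's implementation.

```python
# Two-level control loop sketch (structure only; the policies and environment
# are stand-ins): the high-level policy emits a latent command and a duration,
# and the low-level policy uses that command plus onboard sensors every step.
import numpy as np

def run_episode(high_level, low_level, env, max_steps=1000):
    obs = env.reset()
    steps = 0
    while steps < max_steps:
        latent, duration = high_level(obs)   # runs at a lower frequency
        for _ in range(duration):            # low level runs every step
            action = low_level(latent, env.onboard_sensors())
            obs, done = env.step(action)
            steps += 1
            if done or steps >= max_steps:
                return obs
    return obs

class ToyEnv:
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return np.zeros(3)
    def onboard_sensors(self):
        return np.random.randn(4)            # proprioception stand-in
    def step(self, action):
        self.t += 1
        return np.zeros(3), self.t >= 20

high_level = lambda obs: (np.random.randn(8), 5)   # latent command, duration
low_level = lambda z, s: 0.1 * np.tanh(z[:4] + s)  # joint targets stand-in
run_episode(high_level, low_level, ToyEnv(), max_steps=60)
```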


When random search is not enough: Sample-Efficient and Noise-Robust Blackbox Optimization of RL Policies

arXiv.org Machine Learning

Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they match state-of-the-art methods for policy optimization tasks. However, blackbox DFO methods suffer from high sampling complexity, since they require a substantial number of policy rollouts for reliable updates. They can also be very sensitive to noise in the rewards, actuators, or the dynamics of the environment. In this paper we propose to replace the standard ES derivative-free paradigm for RL, which is based on simple reward-weighted averaging of random perturbations for policy updates and has recently become the subject of voluminous research, with an algorithm in which gradients of blackbox RL functions are estimated via regularized regression methods. In particular, we propose to use L1/L2-regularized regression-based gradient estimation to exploit sparsity and smoothness, as well as LP decoding techniques for handling adversarial stochastic and deterministic noise. Our methods can be naturally combined with sliding trust-region techniques for efficient sample reuse, further reducing sampling complexity; this is not the case for standard ES methods, which require independent sampling in each epoch. We show that our algorithms can be applied to locomotion tasks where training is conducted in the presence of substantial noise, e.g. learning stable walking behaviors for quadruped robots that transfer from simulation, or training quadrupeds to follow a path. We further demonstrate our methods on several OpenAI Gym MuJoCo RL tasks. We manage to train effective policies even when up to 25% of all measurements are arbitrarily corrupted, where standard ES methods produce sub-optimal policies or fail to learn at all. Our empirical results are backed by theoretical guarantees.
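The regression-based gradient estimator is straightforward to sketch. In the minimal version below (an illustration of the general technique, not the authors' implementation), random parameter perturbations and their reward differences are fit with ridge regression, whose coefficients approximate the local gradient; swapping the ridge penalty for an L1 (lasso) penalty would instead exploit gradient sparsity.

```python
# Gradient estimation via regularized regression (minimal illustration, not
# the authors' implementation): sample random perturbations of the policy
# parameters, record the blackbox reward difference for each, and fit a
# ridge regression from perturbations to differences. The fitted
# coefficients estimate the local gradient.
import numpy as np

def estimate_gradient(f, theta, num_samples=50, sigma=0.1, ridge=1e-3):
    d = theta.shape[0]
    P = np.random.randn(num_samples, d) * sigma          # perturbations
    y = np.array([f(theta + p) - f(theta) for p in P])   # reward deltas
    # Ridge solution: g = (P^T P + ridge * I)^-1 P^T y
    return np.linalg.solve(P.T @ P + ridge * np.eye(d), P.T @ y)

# Toy check on a quadratic: the true gradient at theta is -2 * theta.
theta = np.array([1.0, -2.0, 0.5])
f = lambda x: -np.sum(x ** 2)
print(estimate_gradient(f, theta))   # approximately [-2.0, 4.0, -1.0]
```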


NoRML: No-Reward Meta Learning

arXiv.org Artificial Intelligence

Efficiently adapting to new environments and changes in dynamics is critical for agents to successfully operate in the real world. Reinforcement learning (RL) based approaches typically rely on external reward feedback for adaptation. However, in many scenarios this reward signal might not be readily available for the target task, or the difference between the environments can be implicit and only observable from the dynamics. To this end, we introduce a method that allows for self-adaptation of learned policies: No-Reward Meta Learning (NoRML). NoRML extends Model-Agnostic Meta-Learning (MAML) for RL and uses the observable dynamics of the environment, instead of an explicit reward function, in MAML's fine-tuning step. Our method has a more expressive update step than MAML, while maintaining MAML's gradient-based foundation. Additionally, to allow more targeted exploration, we implement an extension to MAML that effectively disconnects the meta-policy parameters from the fine-tuned policies' parameters. We first study our method on a number of synthetic control problems and then validate it on common benchmark environments, showing that NoRML outperforms MAML when the dynamics change between tasks.
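Conceptually, the reward-free inner update can be sketched as a policy-gradient-style step scored by a meta-learned function of observed transitions rather than by rewards. The snippet below is a simplified illustration under that assumption; the function names, the form of the learned advantage, and the toy usage are all stand-ins, and the actual NoRML update is more involved.

```python
# Conceptual sketch of a reward-free adaptation step (illustrative; the
# actual NoRML update is more involved): in place of MAML's reward-based
# inner loss, adaptation is scored by a meta-learned advantage function
# over observed transitions, so the policy can fine-tune from dynamics alone.
import numpy as np

def norml_inner_step(policy_params, learned_advantage, transitions,
                     grad_log_prob, lr=0.1):
    """One reward-free policy-gradient-style inner update.

    transitions: list of (state, action, next_state) tuples.
    learned_advantage: meta-learned, scores transitions without any reward.
    grad_log_prob: gradient of log pi(a|s) w.r.t. policy_params.
    """
    grad = np.zeros_like(policy_params)
    for s, a, s_next in transitions:
        grad += learned_advantage(s, a, s_next) * grad_log_prob(policy_params, s, a)
    return policy_params + lr * grad / max(len(transitions), 1)

# Toy usage with stand-in functions:
adv = lambda s, a, s2: float(s2 - s)      # prefer forward progress
glp = lambda th, s, a: np.ones_like(th)   # placeholder score function
params = norml_inner_step(np.zeros(3), adv, [(0.0, 0, 1.0)], glp)
print(params)  # [0.1 0.1 0.1]
```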