Rao, Dushyant
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Liang, Jacky, Xia, Fei, Yu, Wenhao, Zeng, Andy, Arenas, Montserrat Gonzalez, Attarian, Maria, Bauza, Maria, Bennice, Matthew, Bewley, Alex, Dostmohamed, Adil, Fu, Chuyuan Kelly, Gileadi, Nimrod, Giustina, Marissa, Gopalakrishnan, Keerthana, Hasenclever, Leonard, Humplik, Jan, Hsu, Jasmine, Joshi, Nikhil, Jyenis, Ben, Kew, Chase, Kirmani, Sean, Lee, Tsang-Wei Edward, Lee, Kuang-Huei, Michaely, Assaf Hurwitz, Moore, Joss, Oslund, Ken, Rao, Dushyant, Ren, Allen, Tabanpour, Baruch, Vuong, Quan, Wahid, Ayzaan, Xiao, Ted, Xu, Ying, Zhuang, Vincent, Xu, Peng, Frey, Erik, Caluwaerts, Ken, Zhang, Tingnan, Ichter, Brian, Tompson, Jonathan, Takayama, Leila, Vanhoucke, Vincent, Shafran, Izhak, Mataric, Maja, Sadigh, Dorsa, Heess, Nicolas, Rao, Kanishka, Stewart, Nik, Tan, Jie, Parada, Carolina
Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for only as long as it fits within the context size of the LLM, and can be forgotten over longer interactions. In this work, we investigate fine-tuning robot code-writing LLMs to remember their in-context interactions and improve their teachability, i.e., how efficiently they adapt to human inputs (measured by the average number of corrections before the user considers the task successful). Our key observation is that when human-robot interactions are viewed as a partially observable Markov decision process (in which human language inputs are observations, and robot code outputs are actions), then training an LLM to complete previous interactions is training a transition dynamics model -- one that can be combined with classic robotics techniques such as model predictive control (MPC) to discover shorter paths to success. This gives rise to Language Model Predictive Control (LMPC), a framework that fine-tunes PaLM 2 to improve its teachability on 78 tasks across 5 robot embodiments -- improving non-expert teaching success rates of unseen tasks by 26.9% while reducing the average number of human corrections from 2.4 to 1.9. Experiments show that LMPC also produces strong meta-learners, improving the success rate of in-context learning new tasks on unseen robot embodiments and APIs by 31.5%. See videos, code, and demos at: https://robot-teaching.github.io/.
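The receding-horizon selection described in the abstract can be sketched as a small loop: the fine-tuned LLM plays the role of a dynamics model over (human input, robot code) interactions, several candidate continuations are sampled, and the one predicted to reach success with the fewest corrections is executed before re-planning. The sketch below is only illustrative; llm_sample_rollout and predicted_corrections are hypothetical stubs, not the paper's models or APIs.

    import random

    def llm_sample_rollout(history, horizon):
        """Hypothetical stub: sample a continuation of the interaction (robot-code actions)."""
        return ["robot_code_v%d" % random.randint(0, 9) for _ in range(horizon)]

    def predicted_corrections(rollout):
        """Hypothetical stub: score a rollout by the predicted number of human corrections."""
        return len(set(rollout))  # placeholder scoring

    def lmpc_step(history, num_candidates=8, horizon=3):
        """Pick the next robot-code action from the best of several sampled rollouts."""
        rollouts = [llm_sample_rollout(history, horizon) for _ in range(num_candidates)]
        best = min(rollouts, key=predicted_corrections)
        return best[0]  # execute only the first action, then re-plan (receding horizon)

    print(lmpc_step(history=["user: pick up the sponge"]))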
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Bousmalis, Konstantinos, Vezzani, Giulia, Rao, Dushyant, Devin, Coline, Lee, Alex X., Bauza, Maria, Davchev, Todor, Zhou, Yuxiang, Gupta, Agrim, Raju, Akhil, Laurens, Antoine, Fantacci, Claudio, Dalibard, Valentin, Zambelli, Martina, Martins, Murilo, Pevceviciute, Rugile, Blokzijl, Michiel, Denil, Misha, Batchelor, Nathan, Lampe, Thomas, Parisotto, Emilio, Żołna, Konrad, Reed, Scott, Colmenarejo, Sergio Gómez, Scholz, Jon, Abdolmaleki, Abbas, Groth, Oliver, Regli, Jean-Baptiste, Sushkov, Oleg, Rothörl, Tom, Chen, José Enrique, Aytar, Yusuf, Barker, Dave, Ortiz, Joy, Riedmiller, Martin, Springenberg, Jost Tobias, Hadsell, Raia, Nori, Francesco, Heess, Nicolas
The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.
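The autonomous improvement loop mentioned in the abstract can be summarized in a few lines: the current agent generates episodes on the target tasks, successful episodes are added to the training set, and the agent is fine-tuned on the grown dataset. This is a minimal sketch with placeholder classes, not RoboCat's training code.

    import random

    class StubAgent:
        """Placeholder standing in for a trained generalist policy."""
        def __init__(self, skill=0.5):
            self.skill = skill
        def rollout(self, task):
            return task, random.random() < self.skill            # (task, success)
        def finetune(self, dataset):
            return StubAgent(min(1.0, self.skill + 0.05))        # pretend more data helps a little

    def self_improvement_loop(agent, dataset, tasks, iterations=3):
        for _ in range(iterations):
            episodes = [agent.rollout(t) for t in tasks]         # self-generated experience
            dataset += [e for e in episodes if e[1]]             # keep the successful episodes
            agent = agent.finetune(dataset)                      # next training iteration
        return agent, dataset

    agent, data = self_improvement_loop(StubAgent(), [], tasks=list(range(20)))
    print(round(agent.skill, 2), len(data))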
Towards Compute-Optimal Transfer Learning
Caccia, Massimo, Galashov, Alexandre, Douillard, Arthur, Rannen-Triki, Amal, Rao, Dushyant, Paganini, Michela, Charlin, Laurent, Ranzato, Marc'Aurelio, Pascanu, Razvan
The field of transfer learning is undergoing a significant shift with the introduction of large pretrained models, which have demonstrated strong adaptability to a variety of downstream tasks. However, the high computational and memory requirements to finetune or use these models can be a hindrance to their widespread use. In this study, we present a solution to this issue by proposing a simple yet effective way to trade computational efficiency for asymptotic performance, which we define as the performance a learning algorithm achieves as compute tends to infinity. Specifically, we argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance. We evaluate our method on the Nevis'22 continual learning benchmark that offers a diverse set of transfer scenarios. Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
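As a concrete illustration of zero-shot structured pruning, the sketch below ranks the filters of a pretrained convolutional layer by a data-free importance score (L1 norm of the weights, an assumption here; the paper's exact criterion may differ) and keeps only the top fraction. In a full network, downstream layers would also need their input channels adjusted.

    import torch
    import torch.nn as nn

    def prune_conv_filters(conv: nn.Conv2d, keep_fraction: float = 0.8) -> nn.Conv2d:
        """Return a smaller Conv2d that keeps the filters with the largest L1 norm."""
        with torch.no_grad():
            scores = conv.weight.abs().sum(dim=(1, 2, 3))        # one score per output filter
            k = max(1, int(keep_fraction * conv.out_channels))
            keep = torch.topk(scores, k).indices.sort().values   # indices of retained filters
            pruned = nn.Conv2d(conv.in_channels, k, conv.kernel_size,
                               stride=conv.stride, padding=conv.padding,
                               bias=conv.bias is not None)
            pruned.weight.copy_(conv.weight[keep])
            if conv.bias is not None:
                pruned.bias.copy_(conv.bias[keep])
        return pruned

    layer = nn.Conv2d(64, 128, kernel_size=3, padding=1)
    print(prune_conv_filters(layer, keep_fraction=0.5))          # Conv2d with 64 filters left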
Leveraging Jumpy Models for Planning and Fast Learning in Robotic Domains
Zhang, Jingwei, Springenberg, Jost Tobias, Byravan, Arunkumar, Hasenclever, Leonard, Abdolmaleki, Abbas, Rao, Dushyant, Heess, Nicolas, Riedmiller, Martin
From daily interactions with the world, humans gradually develop an internal understanding of which series of events would be triggered when a certain sequence of actions is taken (Hogendoorn and Burkitt, 2018; Maus et al., 2013; Nortmann et al., 2015). This mental model of the world can serve as a compact proxy of our previous experiences and help us plan out routes to desired goals before taking action (Ha and Schmidhuber, 2018). Studies have further implied that these mental predictive models might not be restricted to the level of primitive actions (Botvinick, 2008; Consul et al., 2022), but rather consider predictions over larger timescales that abstract away detailed behavior consequences, which can enable efficient long-horizon planning to guide our daily decision making. When developing intelligent artificial agents it is therefore natural to imagine a similar process being useful for learning and transferring abstract models of the world across streams of experiences and tasks. We expect such a temporally abstract model of actions and dynamics to be significantly more useful than a simple one-step prediction model (together with primitive policies) when transferred to a target task. This is because it should allow us to rapidly plan over long trajectories (to find some states with high rewards) while alleviating the common problem of error accumulation that occurs when chaining one-step prediction models, which limits the effective planning horizon in most existing methods.
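The argument above can be made concrete with a toy rollout: a jumpy model that predicts k steps per call needs horizon/k compositions to reach a given planning horizon, versus horizon compositions for a one-step model, so prediction error has far fewer opportunities to accumulate. The models below are trivial stubs, not the paper's learned models.

    def rollout(model, state, horizon, step_size):
        """Compose a learned model to predict `horizon` environment steps ahead."""
        states = [state]
        for _ in range(horizon // step_size):
            state = model(state)               # each call predicts `step_size` steps in one go
            states.append(state)
        return states

    one_step_model = lambda s: s + 1           # stub: predicts 1 step ahead
    jumpy_model = lambda s: s + 10             # stub: predicts 10 steps ahead

    print(len(rollout(one_step_model, 0, horizon=100, step_size=1)) - 1)    # 100 model compositions
    print(len(rollout(jumpy_model, 0, horizon=100, step_size=10)) - 1)      # 10 model compositions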
SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration
Vezzani, Giulia, Tirumala, Dhruva, Wulfmeier, Markus, Rao, Dushyant, Abdolmaleki, Abbas, Moran, Ben, Haarnoja, Tuomas, Humplik, Jan, Hafner, Roland, Neunert, Michael, Fantacci, Claudio, Hertweck, Tim, Lampe, Thomas, Sadeghi, Fereshteh, Heess, Nicolas, Riedmiller, Martin
The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents. Skill reuse is one of the most common approaches, but current methods have considerable limitations. For example, fine-tuning an existing policy frequently fails, as the policy can degrade rapidly early in training. In a similar vein, distillation of expert behavior can lead to poor results when given sub-optimal experts. We compare several common approaches for skill transfer on multiple domains including changes in task and system dynamics. We identify how existing methods can fail and introduce an alternative approach to mitigate these problems. Our approach learns to sequence existing temporally-extended skills for exploration but learns the final policy directly from the raw experience. This conceptual split enables rapid adaptation and thus efficient data collection, but without constraining the final solution. It significantly outperforms many classical methods across a suite of evaluation tasks and we use a broad set of ablations to highlight the importance of different components of our method.
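The conceptual split described above can be sketched as follows: an exploration controller picks which existing skill to execute for the next few steps (temporally-extended exploration), while the raw transitions collected along the way go into a buffer from which the final policy is trained off-policy. Skills and environment here are trivial stubs.

    import random

    skills = [lambda s: s + 1, lambda s: s - 1, lambda s: 0]   # stubs for pre-trained skills

    def collect_with_skills(steps=30, skill_len=5):
        buffer, state = [], 0
        skill = skills[0]
        for t in range(steps):
            if t % skill_len == 0:
                skill = random.choice(skills)      # high-level choice: which skill to run next
            next_state = skill(state)
            buffer.append((state, next_state))     # raw experience kept for the final policy
            state = next_state
        return buffer

    replay = collect_with_skills()
    print(len(replay))   # the final policy would be trained off-policy from this buffer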
Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies
Rao, Dushyant, Sadeghi, Fereshteh, Hasenclever, Leonard, Wulfmeier, Markus, Zambelli, Martina, Vezzani, Giulia, Tirumala, Dhruva, Aytar, Yusuf, Merel, Josh, Heess, Nicolas, Hadsell, Raia
For robots operating in the real world, it is desirable to learn reusable behaviours that can effectively be transferred and adapted to numerous tasks and scenarios. We propose an approach to learn abstract motor skills from data using a hierarchical mixture latent variable model. In contrast to existing work, our method exploits a three-level hierarchy of both discrete and continuous latent variables, to capture a set of high-level behaviours while allowing for variance in how they are executed. We demonstrate in manipulation domains that the method can effectively cluster offline data into distinct, executable behaviours, while retaining the flexibility of a continuous latent variable model. The resulting skills can be transferred and fine-tuned on new tasks, unseen objects, and from state to vision-based policies, yielding better sample efficiency and asymptotic performance compared to existing skill- and imitation-based methods. We further analyse how and when the skills are most beneficial: they encourage directed exploration to cover large regions of the state space relevant to the task, making them most effective in challenging sparse-reward settings.
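A minimal sketch of the three-level structure described above: a discrete latent selects a skill component, a continuous latent captures variation in how that skill is executed, and a decoder maps the latent and state to an action. The distributions and decoder below are illustrative placeholders, not the learned model.

    import numpy as np

    rng = np.random.default_rng(0)
    num_skills, latent_dim, state_dim, action_dim = 4, 3, 5, 2

    mixture_logits = np.zeros(num_skills)                         # prior over the discrete skill variable
    component_means = rng.normal(size=(num_skills, latent_dim))   # per-skill continuous latent distribution
    decoder_weights = rng.normal(size=(latent_dim + state_dim, action_dim))

    def sample_action(state):
        probs = np.exp(mixture_logits) / np.exp(mixture_logits).sum()
        k = rng.choice(num_skills, p=probs)                       # level 1: which skill to use
        z = rng.normal(component_means[k], 0.1)                   # level 2: how to execute it
        return np.concatenate([z, state]) @ decoder_weights       # level 3: decode to a low-level action

    print(sample_action(np.zeros(state_dim)))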
Task-agnostic Continual Learning with Hybrid Probabilistic Models
Kirichenko, Polina, Farajtabar, Mehrdad, Rao, Dushyant, Lakshminarayanan, Balaji, Levine, Nir, Li, Ang, Hu, Huiyi, Wilson, Andrew Gordon, Pascanu, Razvan
Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning. In this work we propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification. We model the distribution of each task and each class with a normalizing flow. The flow is used to learn the data distribution, perform classification, identify task changes, and avoid forgetting, all leveraging the invertibility and exact likelihood which are uniquely enabled by the normalizing flow model. We use the generative capabilities of the flow to avoid catastrophic forgetting through generative replay and a novel functional regularization technique. For task identification, we use state-of-the-art anomaly detection techniques based on measuring the typicality of the model's statistics. We demonstrate the strong performance of HCL on a range of continual learning benchmarks such as split-MNIST, split-CIFAR, and SVHN-MNIST.
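The generative-classification mechanism described above can be illustrated with the simplest possible flow: one affine normalizing flow per class maps data to a standard-normal base distribution, and a test point is assigned to the class whose flow gives it the highest exact log-likelihood. Real flows are far more expressive; this only shows the mechanism, not HCL itself.

    import numpy as np

    def affine_flow_logprob(x, shift, log_scale):
        """log p(x) for the flow z = (x - shift) * exp(-log_scale) with base z ~ N(0, I)."""
        z = (x - shift) * np.exp(-log_scale)
        base = -0.5 * (z ** 2 + np.log(2 * np.pi)).sum()
        log_det = -log_scale.sum()                     # log |dz/dx| for the diagonal affine map
        return base + log_det

    # One flow per class, fitted here by simple moment matching purely for illustration.
    rng = np.random.default_rng(0)
    class_data = {0: rng.normal(0.0, 1.0, size=(500, 2)),
                  1: rng.normal(3.0, 0.5, size=(500, 2))}
    flows = {c: (d.mean(axis=0), np.log(d.std(axis=0))) for c, d in class_data.items()}

    def classify(x):
        return max(flows, key=lambda c: affine_flow_logprob(x, *flows[c]))

    print(classify(np.array([2.9, 3.1])))              # assigned to the class with the highest likelihood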
Data-efficient Hindsight Off-policy Option Learning
Wulfmeier, Markus, Rao, Dushyant, Hafner, Roland, Lampe, Thomas, Abdolmaleki, Abbas, Hertweck, Tim, Neunert, Michael, Tirumala, Dhruva, Siegel, Noah, Heess, Nicolas, Riedmiller, Martin
Solutions to most complex tasks can be decomposed into simpler, intermediate skills, reusable across wider ranges of problems. We follow this concept and introduce Hindsight Off-policy Options (HO2), a new algorithm for efficient and robust option learning. The algorithm relies on critic-weighted maximum likelihood estimation and an efficient dynamic programming inference procedure over off-policy trajectories. We can backpropagate through the inference procedure through time and the policy components for every time-step, making it possible to train all components' parameters off-policy, independently of the data-generating behavior policy. Experimentally, we demonstrate that HO2 outperforms competitive baselines and solves demanding robot stacking and ball-in-cup tasks from raw pixel inputs in simulation. We further compare autoregressive option policies with simple mixture policies, providing insights into the relative impact of two types of abstractions common in the options framework: action abstraction and temporal abstraction. Finally, we illustrate challenges caused by stale data in off-policy options learning and provide effective solutions.
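The dynamic-programming inference mentioned above can be sketched as an HMM-style forward pass: the active option is treated as a latent variable with Markov switching, and a forward recursion marginalizes over option assignments along an off-policy trajectory to obtain the likelihood of the observed actions. The switching prior and per-option action likelihoods below are illustrative stubs, not the paper's learned components.

    import numpy as np

    num_options, T = 3, 5
    rng = np.random.default_rng(0)
    switch = np.full((num_options, num_options), 0.1 / (num_options - 1))
    np.fill_diagonal(switch, 0.9)                               # options tend to persist
    action_lik = rng.uniform(0.1, 1.0, size=(T, num_options))   # stub for pi_o(a_t | s_t), one column per option

    alpha = np.full(num_options, 1.0 / num_options) * action_lik[0]
    log_marginal = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for t in range(1, T):
        alpha = (alpha @ switch) * action_lik[t]                # forward recursion over option assignments
        log_marginal += np.log(alpha.sum())
        alpha = alpha / alpha.sum()                             # normalise for numerical stability

    print(log_marginal)   # log-likelihood of the action sequence with options marginalised out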
Meta-Learning with Latent Embedding Optimization
Rusu, Andrei A., Rao, Dushyant, Sygnowski, Jakub, Vinyals, Oriol, Pascanu, Razvan, Osindero, Simon, Hadsell, Raia
Gradient-based meta-learning techniques are both widely applicable and proficient at solving challenging few-shot learning and fast adaptation problems. However, they have practical difficulties when operating on high-dimensional parameter spaces in extreme low-data regimes. We show that it is possible to bypass these limitations by learning a data-dependent latent generative representation of model parameters, and performing gradient-based meta-learning in this low-dimensional latent space. The resulting approach, latent embedding optimization (LEO), decouples the gradient-based adaptation procedure from the underlying high-dimensional space of model parameters. Our evaluation shows that LEO can achieve state-of-the-art performance on the competitive miniImageNet and tieredImageNet few-shot classification tasks. Further analysis indicates LEO is able to capture uncertainty in the data, and can perform adaptation more effectively by optimizing in latent space.
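A minimal sketch of adaptation in latent space as described above: a decoder maps a low-dimensional latent code to the parameters of a task model (here a linear regressor), and the inner loop takes gradient steps on the latent code rather than on the high-dimensional parameters. The decoder and task are illustrative placeholders, not the paper's learned networks.

    import numpy as np

    rng = np.random.default_rng(0)
    latent_dim, param_dim, n = 2, 20, 50
    W_dec = rng.normal(size=(param_dim, latent_dim))   # decoder: latent code -> model parameters
    X = rng.normal(size=(n, param_dim))                # few-shot task inputs
    y = X @ rng.normal(size=param_dim)                 # few-shot task targets

    z = np.zeros(latent_dim)                           # latent code (an encoder would initialise this)
    lr = 1e-4
    for _ in range(100):                               # inner-loop adaptation in latent space
        residual = X @ (W_dec @ z) - y
        z -= lr * (2 * W_dec.T @ X.T @ residual)       # analytic gradient of the squared loss

    theta = W_dec @ z                                  # adapted parameters, decoded from the latent
    print(np.mean((X @ theta - y) ** 2))               # task loss after adaptation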
Deep Tracking on the Move: Learning to Track the World from a Moving Vehicle using Recurrent Neural Networks
Dequaire, Julie, Rao, Dushyant, Ondruska, Peter, Wang, Dominic, Posner, Ingmar
This paper presents an end-to-end approach for tracking static and dynamic objects for an autonomous vehicle driving through crowded urban environments. Unlike traditional approaches to tracking, this method is learned end-to-end, and is able to directly predict a full unoccluded occupancy grid map from raw laser input data. Inspired by the recently presented DeepTracking approach [Ondruska, 2016], we employ a recurrent neural network (RNN) to capture the temporal evolution of the state of the environment, and propose to use Spatial Transformer modules to exploit estimates of the egomotion of the vehicle. Our results demonstrate the ability to track a range of objects, including cars, buses, pedestrians, and cyclists through occlusion, from both moving and stationary platforms, using a single learned model. Experimental results demonstrate that the model can also predict the future states of objects from current inputs, with greater accuracy than previous work.
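The ego-motion handling described above can be illustrated with a crude stand-in: before the recurrent state is updated, the previous occupancy grid is re-sampled into the current vehicle frame. The integer-cell shift below is only a placeholder for the differentiable Spatial Transformer warp used in the paper.

    import numpy as np

    def shift_grid(grid, dx_cells, dy_cells):
        """Warp the previous occupancy grid into the current ego-frame (integer-cell shift)."""
        return np.roll(grid, shift=(-dy_cells, -dx_cells), axis=(0, 1))

    grid = np.zeros((100, 100))
    grid[50, 50] = 1.0                                            # one occupied cell
    print(np.argwhere(shift_grid(grid, dx_cells=3, dy_cells=0)))  # cell moves to [50, 47]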