Vakil, Jay
Towards Open-World Mobile Manipulation in Homes: Lessons from the NeurIPS 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge
Yenamandra, Sriram, Ramachandran, Arun, Khanna, Mukul, Yadav, Karmesh, Vakil, Jay, Melnik, Andrew, Büttner, Michael, Harz, Leon, Brown, Lyon, Nandi, Gora Chand, PS, Arjun, Yadav, Gaurav Kumar, Kala, Rahul, Haschke, Robert, Luo, Yang, Zhu, Jinxin, Han, Yansen, Lu, Bingyi, Gu, Xuan, Liu, Qinyuan, Zhao, Yaping, Ye, Qiting, Dou, Chenxiao, Chua, Yansong, Kuzma, Volodymyr, Humennyy, Vladyslav, Partsey, Ruslan, Francis, Jonathan, Chaplot, Devendra Singh, Chhablani, Gunjan, Clegg, Alexander, Gervet, Theophile, Jain, Vidhi, Ramrakhya, Ram, Szot, Andrew, Wang, Austin, Yang, Tsung-Yen, Edsinger, Aaron, Kemp, Charlie, Shah, Binit, Kira, Zsolt, Batra, Dhruv, Mottaghi, Roozbeh, Bisk, Yonatan, Paxton, Chris
In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed Open Vocabulary Mobile Manipulation as a key benchmark task for robotics: finding any object in a novel environment and placing it on any receptacle surface within that environment. We organized a NeurIPS 2023 competition featuring both simulation and real-world components to evaluate solutions to this task. Our baselines on the most challenging version of this task, using real perception in simulation, achieved only a 0.8% success rate; by the end of the competition, the best participants achieved a 10.8% success rate, a more than 13x improvement. We observed that the most successful teams employed a variety of methods, yet two common threads emerged among the best solutions: enhancing error detection and recovery, and improving the integration of perception with decision-making processes. In this paper, we detail the results and methodologies used, both in simulation and real-world settings. We discuss the lessons learned and their implications for future research. Additionally, we compare performance in real and simulated environments, emphasizing the necessity for robust generalization to novel settings.
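The challenge code is not reproduced here, but the episode structure described above can be illustrated with a short sketch: an OVMM attempt decomposes into navigate-pick-navigate-place skills, and the top entries wrapped each skill with explicit error detection and retry, one of the two common threads noted above. All names below (SkillResult, run_ovmm_episode, etc.) are hypothetical and do not correspond to the actual HomeRobot challenge API.

```python
# Hypothetical sketch of an OVMM episode as a skill sequence with error
# detection and recovery; names and interfaces are illustrative only and
# are not part of the HomeRobot challenge API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SkillResult:
    success: bool
    reason: str = ""


def run_ovmm_episode(skills: List[Callable[[], SkillResult]],
                     max_retries: int = 2) -> bool:
    """Execute navigate -> pick -> navigate -> place, retrying failed skills.

    Top-performing entries paired each skill with a verification step
    (e.g., re-checking the grasp via perception) before moving on.
    """
    for skill in skills:
        for attempt in range(max_retries + 1):
            result = skill()
            if result.success:
                break
            print(f"{skill.__name__} failed ({result.reason}); retrying ({attempt + 1})")
        else:
            return False  # retries exhausted: the episode fails
    return True
```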
Spatial-Language Attention Policies for Efficient Robot Learning
Parashar, Priyam, Jain, Vidhi, Zhang, Xiaohan, Vakil, Jay, Powers, Sam, Bisk, Yonatan, Paxton, Chris
Despite great strides in language-guided manipulation, existing work has been constrained to table-top settings. Table-tops allow for perfect and consistent camera angles, properties that do not hold in mobile manipulation. Task plans that involve moving around the environment must be robust to egocentric views and to changes in the plane and angle of grasp. A further challenge is ensuring all of this while still learning skills efficiently from limited data. We propose Spatial-Language Attention Policies (SLAP) as a solution. SLAP uses three-dimensional tokens as the input representation to train a single multi-task, language-conditioned action prediction policy. Our method shows an 80% success rate in the real world across eight tasks with a single model, and a 47.5% success rate when unseen clutter and unseen object configurations are introduced, even with only a handful of examples per task. This represents an improvement of 30% over prior work (20% given unseen distractors and configurations). We also see a 4x improvement over the baseline in the mobile manipulation setting. In addition, we show how SLAP's robustness allows us to execute task plans from open-vocabulary instructions using a large language model for multi-step mobile manipulation. For videos, see the website: https://robotslap.github.io
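As a hedged illustration of the representation described above, the sketch below renders the core idea, a single transformer policy consuming 3D point tokens alongside language tokens, in PyTorch. It is not the authors' SLAP implementation; all module names, dimensions, and the pooling choice are assumptions.

```python
# Minimal sketch of a spatial-language token policy (not the SLAP codebase):
# 3D point tokens and language tokens share one transformer encoder, and a
# pooled embedding is decoded into an action.
import torch
import torch.nn as nn


class SpatialLanguagePolicy(nn.Module):
    def __init__(self, point_dim=6, lang_dim=512, d_model=256, action_dim=8):
        super().__init__()
        self.point_proj = nn.Linear(point_dim, d_model)    # xyz + rgb -> token
        self.lang_proj = nn.Linear(lang_dim, d_model)      # language embedding -> token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.action_head = nn.Linear(d_model, action_dim)  # e.g., gripper pose + open/close

    def forward(self, points, lang):
        # points: (B, N, point_dim) scene point cloud; lang: (B, L, lang_dim) instruction tokens
        tokens = torch.cat([self.point_proj(points), self.lang_proj(lang)], dim=1)
        encoded = self.encoder(tokens)
        return self.action_head(encoded.mean(dim=1))       # pooled tokens -> action


policy = SpatialLanguagePolicy()
action = policy(torch.randn(1, 1024, 6), torch.randn(1, 12, 512))  # dummy inputs
```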
RoboHive: A Unified Framework for Robot Learning
Kumar, Vikash, Shah, Rutav, Zhou, Gaoyue, Moens, Vincent, Caggiano, Vittorio, Vakil, Jay, Gupta, Abhishek, Rajeswaran, Aravind
RoboHive, our unified platform for robot learning, encompasses a diverse range of pre-existing and novel environments, including dexterous manipulation with the Shadow Hand, whole-arm manipulation tasks with Franka and Fetch robots, and quadruped locomotion, among others. The included environments are organized into multiple domains, such as hand manipulation, locomotion, multi-task, multi-agent, and musculoskeletal control. In comparison to prior works, RoboHive offers a streamlined and unified task interface that depends on only a minimal set of well-maintained packages, features tasks with high physics fidelity and rich visual diversity, and supports common hardware drivers for real-world deployment. The unified interface of RoboHive offers a convenient and accessible abstraction for algorithmic research in imitation, reinforcement, multi-task, and hierarchical learning. Furthermore, RoboHive includes expert demonstrations and baseline results for most environments, providing a standard for benchmarking and comparisons.
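Because RoboHive exposes its environments through a Gym-style interface, the standard reset/step loop is the intended entry point. The snippet below is a usage sketch under that assumption; the environment id is illustrative and the older 4-tuple Gym step API is assumed, so exact names and return signatures should be checked against the repository.

```python
# Usage sketch assuming RoboHive's Gym-style registration; the environment id
# is illustrative and the classic 4-tuple step API is assumed.
import gym
import robohive  # importing robohive registers its environments with gym

env = gym.make("FrankaReachRandom-v0")    # illustrative id; check the registered task list
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()    # random policy as a placeholder
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```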
What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?
Silwal, Sneha, Yadav, Karmesh, Wu, Tingfan, Vakil, Jay, Majumdar, Arjun, Arnaud, Sergio, Chen, Claire, Berges, Vincent-Pierre, Batra, Dhruv, Rajeswaran, Aravind, Kalakrishnan, Mrinal, Meier, Franziska, Maksymets, Oleksandr
We present a large empirical investigation on the use of pre-trained visual representations (PVRs) for training downstream policies that execute real-world tasks. Our study spans five different PVRs, two different policy-learning paradigms (imitation and reinforcement learning), and three different robots across five distinct manipulation and indoor-navigation tasks. From this effort, we arrive at three insights: 1) the performance trends of PVRs in simulation are generally indicative of their trends in the real world, 2) the use of PVRs enables a first-of-its-kind result on indoor ImageNav (zero-shot transfer to a held-out scene in the real world), and 3) the benefits from variations in PVRs, primarily data augmentation and fine-tuning, also transfer to real-world performance. See the project website for additional details and visuals.
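The study's core recipe, training a downstream policy on top of a frozen pre-trained visual representation, can be sketched generically as below. This is an assumption-laden illustration using a torchvision ResNet-50 as the PVR and behavior cloning as the policy-learning paradigm; the PVRs, policy heads, and training details in the paper differ.

```python
# Generic sketch of the frozen-PVR recipe (not the paper's exact models):
# a pre-trained vision encoder is frozen and a small policy head is trained
# on top of its features with behavior cloning.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

encoder = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
encoder.fc = nn.Identity()           # expose 2048-d features instead of class logits
encoder.requires_grad_(False)        # the PVR stays frozen
encoder.eval()

action_dim = 7                       # illustrative, e.g., end-effector deltas + gripper
policy_head = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, action_dim))
optimizer = torch.optim.Adam(policy_head.parameters(), lr=1e-4)


def bc_step(images, expert_actions):
    """One behavior-cloning update: regress expert actions from frozen PVR features."""
    with torch.no_grad():
        features = encoder(images)                    # (B, 2048)
    loss = nn.functional.mse_loss(policy_head(features), expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```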
RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking
Bharadhwaj, Homanga, Vakil, Jay, Sharma, Mohit, Gupta, Abhinav, Tulsiani, Shubham, Kumar, Vikash
The grand aim of having a single robot that can manipulate arbitrary objects in diverse settings is at odds with the paucity of robotics datasets. Acquiring and growing such datasets is strenuous due to manual efforts, operational costs, and safety challenges. A path toward such a universal agent requires a structured framework capable of wide generalization but trained within a reasonable data budget. In this paper, we develop an efficient system (RoboAgent) for training universal agents capable of multi-task manipulation skills using (a) semantic augmentations that can rapidly multiply existing datasets and (b) action representations that can extract performant policies from small yet diverse multi-modal datasets without overfitting. In addition, reliable task conditioning and an expressive policy architecture enable our agent to exhibit a diverse repertoire of skills in novel situations specified using language commands. Using merely 7,500 demonstrations, we are able to train a single agent capable of 12 unique skills, and we demonstrate its generalization over 38 tasks spread across common daily activities in diverse kitchen scenes. On average, RoboAgent outperforms prior methods by over 40% in unseen situations while being more sample-efficient and amenable to capability improvements and extensions through fine-tuning. Videos at https://robopen.github.io/
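Action chunking, one of the two ingredients named above, means the policy predicts a short horizon of future actions per query and the controller executes the chunk before querying again (optionally with temporal ensembling). The sketch below illustrates only the execution side, with hypothetical names; it is not the RoboAgent implementation.

```python
# Illustrative sketch of action chunking at execution time (not RoboAgent code):
# the policy emits a chunk of H future actions, and the controller replays the
# chunk before querying the policy again.
import numpy as np

CHUNK_SIZE = 8        # H: number of future actions predicted per policy query
ACTION_DIM = 7        # illustrative action dimensionality


def dummy_policy(obs: np.ndarray) -> np.ndarray:
    """Stand-in for a learned chunking policy: returns a (H, ACTION_DIM) chunk."""
    return np.zeros((CHUNK_SIZE, ACTION_DIM))


def rollout(env_step, obs, horizon=64):
    """Run a rollout, re-querying the policy only once per chunk."""
    chunk, idx = None, CHUNK_SIZE
    for _ in range(horizon):
        if idx >= CHUNK_SIZE:                  # chunk exhausted -> request a new prediction
            chunk, idx = dummy_policy(obs), 0
        obs = env_step(chunk[idx])             # execute one action from the chunk
        idx += 1
    return obs
```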