
Collaborating Authors

 Stone, Peter


MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention

arXiv.org Artificial Intelligence

Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy's execution and provides interventions as feedback. However, existing methods often fail to utilize the prior policy efficiently to facilitate learning, thus hindering sample efficiency. In this work, we introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention. Instead of inferring the human's complete behavioral characteristics, MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions. It then employs Residual Q-Learning (RQL) to align the policy with human preferences using this residual reward function. Extensive evaluations on simulated and real-world tasks demonstrate that MEReQ achieves sample-efficient policy alignment from human intervention.
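At a high level, the aligned policy acts with respect to the prior policy's value plus a learned residual value that is trained only on the inferred residual reward. The tabular sketch below illustrates that structure in the maximum-entropy setting; the sizes, names, and update rule are simplifications for illustration, not the authors' implementation.

```python
import numpy as np

# Minimal tabular sketch of the residual-Q idea in the max-entropy setting.
# The deployed policy is soft-greedy w.r.t. q_prior + q_res, and q_res is trained
# only from the inferred residual reward r_res, using the prior's soft values.
n_states, n_actions = 10, 4          # hypothetical sizes
temp, gamma, lr = 1.0, 0.99, 0.1
q_prior = np.random.randn(n_states, n_actions)   # fixed, from the prior policy
q_res = np.zeros((n_states, n_actions))          # learned from residual reward

def soft_value(q_row):
    """Max-entropy state value: temp * logsumexp(Q / temp)."""
    return temp * np.log(np.exp(q_row / temp).sum())

def residual_td_update(s, a, r_res, s_next):
    """One TD step on the residual table; r_res is the inferred residual reward."""
    target = r_res + gamma * (soft_value(q_prior[s_next] + q_res[s_next])
                              - soft_value(q_prior[s_next]))
    q_res[s, a] += lr * (target - q_res[s, a])

def aligned_policy(s):
    """Sample from the softmax of the combined value q_prior + q_res."""
    logits = (q_prior[s] + q_res[s]) / temp
    p = np.exp(logits - logits.max())
    return int(np.random.choice(n_actions, p=p / p.sum()))
```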


A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo

arXiv.org Artificial Intelligence

Racing autonomous cars faster than the best human drivers has been a longstanding grand challenge for the fields of Artificial Intelligence and robotics. Recently, an end-to-end deep reinforcement learning agent met this challenge in a high-fidelity racing simulator, Gran Turismo. However, this agent relied on global features that require instrumentation external to the car. This paper introduces, to the best of our knowledge, the first super-human car racing agent whose sensor input is purely local to the car, namely pixels from an ego-centric camera view and quantities that can be sensed on board the car, such as the car's velocity. By leveraging global features only at training time, the learned agent is able to outperform the best human drivers in time trial races (one car on the track at a time) using only local input features. The resulting agent is evaluated in Gran Turismo 7 on multiple tracks and cars. Detailed ablation experiments demonstrate the agent's strong reliance on visual inputs, making it the first vision-based super-human car racing agent.
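One standard way to use global features only at training time is an asymmetric actor-critic, where the critic consumes privileged global state while the policy receives only local observations. The sketch below shows that pattern with placeholder shapes and module names; it is not necessarily the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LocalPolicy(nn.Module):
    """Policy sees only local inputs: ego-centric pixels plus on-board measurements."""
    def __init__(self, n_actions=2):
        super().__init__()
        self.encoder = nn.Sequential(                      # ego-centric image encoder
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten())
        self.head = nn.LazyLinear(n_actions)               # e.g., steering / throttle

    def forward(self, pixels, onboard):
        feat = torch.cat([self.encoder(pixels), onboard], dim=-1)
        return torch.tanh(self.head(feat))

class PrivilegedCritic(nn.Module):
    """Critic additionally sees global features available only during training."""
    def __init__(self, global_dim=64, onboard_dim=8, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(global_dim + onboard_dim + n_actions, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, global_feat, onboard, action):
        return self.net(torch.cat([global_feat, onboard, action], dim=-1))
```

Only LocalPolicy is needed at deployment; the privileged critic exists purely to shape training.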


Vision-based Manipulation from Single Human Video with Open-World Object Graphs

arXiv.org Artificial Intelligence

We present an object-centric approach to empower robots to learn vision-based manipulation skills from human videos. We investigate the problem of imitating robot manipulation from a single human video in the open-world setting, where a robot must learn to manipulate novel objects from one video demonstration. We introduce ORION, an algorithm that tackles the problem by extracting an object-centric manipulation plan from a single RGB-D video and deriving a policy that conditions on the extracted plan. Our method enables the robot to learn from videos captured by everyday mobile devices, such as an iPad, and to generalize the policies to deployment environments with varying visual backgrounds, camera angles, spatial layouts, and novel object instances. We systematically evaluate our method on both short-horizon and long-horizon tasks, demonstrating the efficacy of ORION in learning from a single human video in the open world. Videos can be found on the project website: https://ut-austin-rpl.github.io/ORION-release.
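To make the "plan extracted from video, policy conditioned on the plan" pipeline concrete, here is an illustrative data layout for an object-centric plan; the field names and types are assumptions, not ORION's actual schema.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class ObjectKeyframe:
    """One step of the plan, tied to a specific open-world object in the video."""
    object_name: str                 # e.g., "mug" (hypothetical label)
    point_cloud: np.ndarray          # (N, 3) points segmented from the RGB-D frame
    target_pose: np.ndarray          # (4, 4) desired SE(3) pose at this step

@dataclass
class ManipulationPlan:
    keyframes: List[ObjectKeyframe] = field(default_factory=list)

    def next_subgoal(self, step: int) -> ObjectKeyframe:
        """The policy conditions on the current sub-goal from the extracted plan."""
        return self.keyframes[min(step, len(self.keyframes) - 1)]
```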


Towards Imitation Learning in Real World Unstructured Social Mini-Games in Pedestrian Crowds

arXiv.org Artificial Intelligence

Imitation Learning (IL) strategies are used to generate policies for robot motion planning and navigation by learning from human trajectories. Recently, there has been a lot of excitement in applying IL to social interactions arising in urban environments such as university campuses, restaurants, grocery stores, and hospitals. However, obtaining numerous expert demonstrations in social settings might be expensive, risky, or even impossible. Current approaches therefore focus only on simulated social interaction scenarios. This raises the question: how can a robot learn to imitate an expert demonstrator from real-world multi-agent social interaction scenarios? It remains unknown which, if any, IL methods perform well and what assumptions they require. We benchmark representative IL methods in real-world social interaction scenarios on a motion planning task, using a novel pedestrian intersection dataset collected at the University of Texas at Austin campus. Our evaluation reveals two key findings: first, learning multi-agent cost functions is required for capturing the diverse behavior modes of agents in tightly coupled interactions; and second, conditioning the training of IL methods on partial state information, or providing global information in simulation, can improve imitation learning, especially in real-world social interaction scenarios.
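The first finding, that multi-agent cost functions are needed in tightly coupled interactions, can be made concrete with a toy cost that couples every agent's trajectory to the others; this is purely illustrative and not one of the benchmarked methods.

```python
import numpy as np

def multi_agent_cost(trajectories, goals, w_goal=1.0, w_social=5.0, safe_dist=0.5):
    """Toy joint cost: per-agent goal-reaching plus pairwise proximity penalties.
    trajectories: (n_agents, T, 2) planar positions; goals: (n_agents, 2)."""
    n_agents, T, _ = trajectories.shape
    goal_cost = np.linalg.norm(trajectories[:, -1] - goals, axis=-1).sum()
    social_cost = 0.0
    for t in range(T):
        for i in range(n_agents):
            for j in range(i + 1, n_agents):
                d = np.linalg.norm(trajectories[i, t] - trajectories[j, t])
                social_cost += max(0.0, safe_dist - d)   # penalize near-collisions
    return w_goal * goal_cost + w_social * social_cost
```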


Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement Learning is a promising tool for learning complex policies even in fast-moving, object-interactive domains where human teleoperation or hard-coded policies might fail. To effectively reflect this challenging category of tasks, we introduce a dynamic, interactive RL testbed based on robot air hockey. By augmenting air hockey with a large family of tasks, ranging from easy ones like reaching to challenging ones like pushing a block by hitting it with a puck, as well as goal-based and human-interactive tasks, our testbed allows a varied assessment of RL capabilities. The robot air hockey testbed also supports sim-to-real transfer with three domains: two simulators of increasing fidelity and a real robot system. Using a dataset of demonstrations gathered through two teleoperation systems, a virtualized control environment and human shadowing, we assess the testbed with behavior cloning, offline RL, and RL from scratch.
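As a sense of how the demonstration data can be used, here is a minimal behavior-cloning sketch with placeholder dimensions; it is not the testbed's reference implementation.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 16, 2           # e.g., puck/paddle state -> paddle command (assumed)
policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                       nn.Linear(256, 256), nn.ReLU(),
                       nn.Linear(256, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def bc_step(obs_batch, act_batch):
    """One supervised step: regress demonstrated actions from observations."""
    loss = nn.functional.mse_loss(policy(obs_batch), act_batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```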


Deep Reinforcement Learning in Parameterized Action Space

arXiv.org Artificial Intelligence

Recent work has shown that deep neural networks are capable of approximating both value functions and policies in reinforcement learning domains featuring continuous state and action spaces. However, to the best of our knowledge, no previous work has succeeded in using deep neural networks in structured (parameterized) continuous action spaces. To fill this gap, this paper focuses on learning within the domain of simulated RoboCup soccer, which features a small set of discrete action types, each of which is parameterized with continuous variables. The best learned agent can score goals more reliably than the 2012 RoboCup champion agent. As such, this paper represents a successful extension of deep reinforcement learning to the class of parameterized action space MDPs.
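A parameterized action space pairs a discrete action type with that type's own continuous parameters. The sketch below loosely follows the simulated soccer action set (Dash, Turn, Tackle, Kick); the bounds and selection logic are illustrative, not the paper's implementation.

```python
import numpy as np

# Each discrete action type carries its own continuous parameters and bounds.
ACTION_PARAMS = {
    "Dash":   [("power", 0.0, 100.0), ("direction", -180.0, 180.0)],
    "Turn":   [("direction", -180.0, 180.0)],
    "Tackle": [("direction", -180.0, 180.0)],
    "Kick":   [("power", 0.0, 100.0), ("direction", -180.0, 180.0)],
}

def select_action(discrete_logits, all_params):
    """Pick the highest-scoring action type, then read off its continuous parameters.
    discrete_logits: (4,) scores, one per action type;
    all_params: dict mapping action type -> parameter vector proposed by the policy."""
    action_type = list(ACTION_PARAMS)[int(np.argmax(discrete_logits))]
    bounds = ACTION_PARAMS[action_type]
    clipped = [float(np.clip(p, lo, hi))
               for p, (_, lo, hi) in zip(all_params[action_type], bounds)]
    return action_type, clipped
```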


Multi-Agent Synchronization Tasks

arXiv.org Artificial Intelligence

In multi-agent reinforcement learning (MARL), coordination plays a crucial role in enhancing agents' performance beyond what they could achieve through cooperation alone. The interdependence of agents' actions, coupled with the need for communication, leads to a domain where effective coordination is crucial. In this paper, we introduce and define Multi-Agent Synchronization Tasks (MSTs), a novel subset of multi-agent tasks. We describe one MST, which we call Synchronized Predator-Prey, in detail, and it serves as the basis for evaluating a selection of recent state-of-the-art (SOTA) MARL algorithms explicitly designed to address coordination challenges through the use of communication strategies. Furthermore, we present empirical evidence that reveals the limitations of the assessed algorithms on MSTs, demonstrating their inability to scale effectively beyond 2-agent coordination tasks in scenarios where communication is a requisite component. Finally, the results raise questions about the applicability of recent SOTA approaches to complex coordination tasks (i.e., MSTs) and prompt further exploration into the underlying causes of their limitations in this context.
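The synchronization requirement can be illustrated with a toy reward in the spirit of Synchronized Predator-Prey, where a capture only counts if enough predators strike the same prey in the same timestep; the function and reward values below are invented for illustration.

```python
def synchronized_capture_reward(attacks, n_prey, n_required=2):
    """Toy MST-style reward.
    attacks: list of (predator_id, prey_id) attack attempts issued this timestep;
    a prey is captured only if at least n_required predators attack it simultaneously."""
    reward = 0.0
    for prey_id in range(n_prey):
        attackers = {pid for pid, target in attacks if target == prey_id}
        if len(attackers) >= n_required:
            reward += 1.0            # synchronized capture succeeds
        elif attackers:
            reward -= 0.1            # uncoordinated attempt is penalized
    return reward
```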


N-Agent Ad Hoc Teamwork

arXiv.org Artificial Intelligence

Current approaches to learning cooperative behaviors in multi-agent settings make relatively restrictive assumptions. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algorithm usually assumes control over only a single agent in the scenario. However, many cooperative settings in the real world are much less restrictive. For example, in an autonomous driving scenario, a company might train its cars with the same learning algorithm, yet once on the road, these cars must cooperate with cars from another company. Towards generalizing the class of scenarios that cooperative learning methods can address, we introduce N-agent ad hoc teamwork (NAHT), in which a set of autonomous agents must interact and cooperate with dynamically varying numbers and types of teammates at evaluation time. This paper formalizes the problem and proposes the Policy Optimization with Agent Modelling (POAM) algorithm. POAM is a policy-gradient, multi-agent reinforcement learning approach to the NAHT problem that enables adaptation to diverse teammate behaviors by learning representations of teammate behaviors. Empirical evaluation on StarCraft II tasks shows that POAM improves cooperative task returns compared to baseline approaches and enables out-of-distribution generalization to unseen teammates.
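The agent-modelling component can be pictured as an encoder that summarizes observed teammate behavior into an embedding on which the controlled agents' policy is conditioned. The sketch below shows that structure with assumed dimensions and module names; it is not the paper's implementation.

```python
import torch
import torch.nn as nn

class TeammateEncoder(nn.Module):
    """Encode a history of teammate observations/actions into a fixed embedding."""
    def __init__(self, obs_act_dim=20, embed_dim=32):
        super().__init__()
        self.rnn = nn.GRU(obs_act_dim, embed_dim, batch_first=True)

    def forward(self, teammate_history):        # (batch, time, obs_act_dim)
        _, h = self.rnn(teammate_history)
        return h[-1]                             # (batch, embed_dim) teammate embedding

class ConditionedPolicy(nn.Module):
    """Controlled agent's policy, conditioned on the teammate embedding."""
    def __init__(self, obs_dim=30, embed_dim=32, n_actions=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + embed_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_actions))

    def forward(self, obs, teammate_embedding):
        logits = self.net(torch.cat([obs, teammate_embedding], dim=-1))
        return torch.distributions.Categorical(logits=logits)
```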


Dyna-LfLH: Learning Agile Navigation in Dynamic Environments from Learned Hallucination

arXiv.org Artificial Intelligence

This paper presents a self-supervised learning method to safely learn a motion planner for ground robots to navigate environments with dense and dynamic obstacles. When facing highly cluttered, fast-moving, hard-to-predict obstacles, classical motion planners may not be able to keep up with limited onboard computation. For learning-based planners, high-quality demonstrations are difficult to acquire for imitation learning, while reinforcement learning becomes inefficient due to the high probability of collision during exploration. To safely and efficiently provide training data, Learning from Hallucination (LfH) approaches synthesize difficult navigation environments based on past successful navigation experiences in relatively easy or completely open ones, but unfortunately they cannot address dynamic obstacles. In our new Dynamic Learning from Learned Hallucination (Dyna-LfLH), we design and learn a novel latent distribution and sample dynamic obstacles from it, so that the generated training data can be used to learn a motion planner that navigates in dynamic environments. Dyna-LfLH is evaluated on a ground robot in both simulated and physical environments and achieves a success rate up to 25% higher than the baselines.
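The generative step, sampling dynamic obstacles from a learned latent distribution, might look roughly like the following; the architecture, dimensions, and the standard-normal prior standing in for the learned distribution are all assumptions, not the Dyna-LfLH implementation.

```python
import torch
import torch.nn as nn

latent_dim, horizon = 8, 20                 # assumed sizes

decoder = nn.Sequential(                    # latent code -> (horizon, 2) obstacle path
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, horizon * 2))

def sample_dynamic_obstacles(n_obstacles=5):
    """Draw obstacle trajectories from the latent distribution; the sampled paths
    are then used to hallucinate training environments for the motion planner."""
    z = torch.randn(n_obstacles, latent_dim)
    with torch.no_grad():
        paths = decoder(z).view(n_obstacles, horizon, 2)   # (x, y) per timestep
    return paths
```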


TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation

arXiv.org Artificial Intelligence

A critical bottleneck limiting imitation learning in robotics is the lack of data. This problem is more severe in mobile manipulation, where collecting demonstrations is harder than in stationary manipulation due to the lack of available, easy-to-use teleoperation interfaces. In this work, we demonstrate TeleMoMa, a general and modular interface for whole-body teleoperation of mobile manipulators. TeleMoMa unifies multiple human interfaces, including RGB and depth cameras, virtual reality controllers, keyboards, and joysticks, as well as any combination thereof. In its more accessible version, TeleMoMa works using vision alone (e.g., an RGB-D camera), lowering the barrier to entry for humans providing mobile manipulation demonstrations. We demonstrate the versatility of TeleMoMa by teleoperating several existing mobile manipulators - PAL Tiago++, Toyota HSR, and Fetch - in simulation and the real world. We demonstrate the quality of the demonstrations collected with TeleMoMa by training imitation learning policies for mobile manipulation tasks involving synchronized whole-body motion. Finally, we show that TeleMoMa's teleoperation channel enables teleoperation on site, looking at the robot, or remotely, sending commands and observations through a computer network, and we perform user studies to evaluate how easy it is for novice users to learn to collect demonstrations with the different combinations of human interfaces enabled by our system. We hope TeleMoMa becomes a helpful tool for the community, enabling researchers to collect whole-body mobile manipulation demonstrations. For more information and video results, see https://robin-lab.cs.utexas.edu/telemoma-web.
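The modular design can be pictured as a set of interchangeable input channels, each contributing commands for part of the body, merged into one whole-body command. The sketch below illustrates that idea with invented interface and key names; it is not TeleMoMa's actual API.

```python
from typing import Callable, Dict
import numpy as np

def merge_commands(interfaces: Dict[str, Callable[[], Dict[str, np.ndarray]]]):
    """Poll every enabled interface and combine their partial whole-body commands;
    later interfaces override earlier ones on overlapping body parts."""
    command: Dict[str, np.ndarray] = {}
    for name, read_fn in interfaces.items():
        command.update(read_fn())
    return command

# Example: a vision-based tracker controls the arms while a keyboard drives the base.
interfaces = {
    "rgbd_tracker": lambda: {"left_arm": np.zeros(7), "right_arm": np.zeros(7)},
    "keyboard":     lambda: {"base": np.array([0.2, 0.0, 0.0])},   # vx, vy, yaw rate
}
whole_body_command = merge_commands(interfaces)
```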