action influence
CUPID: Curating Data your Robot Loves with Influence Functions
Agia, Christopher, Sinha, Rohan, Yang, Jingyun, Antonova, Rika, Pavone, Marco, Nishimura, Haruki, Itkina, Masha, Bohg, Jeannette
In robot imitation learning, policy performance is tightly coupled with the quality and composition of the demonstration data. Yet, developing a precise understanding of how individual demonstrations contribute to downstream outcomes - such as closed-loop task success or failure - remains a persistent challenge. We propose CUPID, a robot data curation method based on a novel influence function-theoretic formulation for imitation learning policies. Given a set of evaluation rollouts, CUPID estimates the influence of each training demonstration on the policy's expected return. This enables ranking and selection of demonstrations according to their impact on the policy's closed-loop performance. We use CUPID to curate data by 1) filtering out training demonstrations that harm policy performance and 2) subselecting newly collected trajectories that will most improve the policy. Extensive simulated and hardware experiments show that our approach consistently identifies which data drives test-time performance. For example, training with less than 33% of curated data can yield state-of-the-art diffusion policies on the simulated RoboMimic benchmark, with similar gains observed in hardware. Furthermore, hardware experiments show that our method can identify robust strategies under distribution shift, isolate spurious correlations, and even enhance the post-training of generalist robot policies. Videos and code are made available at: https://cupid-curation.github.io.
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > South Korea > Daegu > Daegu (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- (4 more...)
A Survey of Temporal Credit Assignment in Deep Reinforcement Learning
Pignatelli, Eduardo, Ferret, Johan, Geist, Matthieu, Mesnard, Thomas, van Hasselt, Hado, Toni, Laura
The Credit Assignment Problem (CAP) refers to the longstanding challenge of Reinforcement Learning (RL) agents to associate actions with their long-term consequences. Solving the CAP is a crucial step towards the successful deployment of RL in the real world since most decision problems provide feedback that is noisy, delayed, and with little or no information about the causes. These conditions make it hard to distinguish serendipitous outcomes from those caused by informed decision-making. However, the mathematical nature of credit and the CAP remains poorly understood and defined. In this survey, we review the state of the art of Temporal Credit Assignment (CA) in deep RL. We propose a unifying formalism for credit that enables equitable comparisons of state of the art algorithms and improves our understanding of the trade-offs between the various methods. We cast the CAP as the problem of learning the influence of an action over an outcome from a finite amount of experience. We discuss the challenges posed by delayed effects, transpositions, and a lack of action influence, and analyse how existing methods aim to address them. Finally, we survey the protocols to evaluate a credit assignment method, and suggest ways to diagnoses the sources of struggle for different credit assignment methods. Overall, this survey provides an overview of the field for new-entry practitioners and researchers, it offers a coherent perspective for scholars looking to expedite the starting stages of a new study on the CAP, and it suggests potential directions for future research
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (6 more...)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Education (1.00)
- Information Technology (0.92)
- Government > Military (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Multiagent Metareasoning through Organizational Design
Sleight, Jason (University of Michigan) | Durfee, Edmund H. (University of Michigan)
We formulate an approach to multiagent metareasoning that uses organizational design to focus each agent's reasoning on the aspects of its local problem that let it make the most worthwhile contributions to joint behavior. By employing the decentralized Markov decision process framework, we characterize an organizational design problem that explicitly considers the quantitative impact that a design has on both the quality of the agents' behaviors and their reasoning costs. We describe an automated organizational design process that can approximately solve our organizational design problem via incremental search, and present techniques that efficiently estimate the incremental impact of a candidate organizational influence. Our empirical evaluation confirms that our process generates organizational designs that impart a desired metareasoning regime upon the agents.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > United States > Massachusetts (0.04)