Goto

Collaborating Authors

 Mehr, Negar


Adaptive Teaching in Heterogeneous Agents: Balancing Surprise in Sparse Reward Scenarios

arXiv.org Artificial Intelligence

Learning from Demonstration (LfD) can be an efficient way to train systems with analogous agents by enabling ``Student'' agents to learn from the demonstrations of the most experienced ``Teacher'' agent, instead of training their policy in parallel. However, when there are discrepancies in agent capabilities, such as divergent actuator power or joint angle constraints, naively replicating demonstrations that are out of bounds for the Student's capability can limit efficient learning. We present a Teacher-Student learning framework specifically tailored to address the challenge of heterogeneity between the Teacher and Student agents. Our framework is based on the concept of ``surprise'', inspired by its application in exploration incentivization in sparse-reward environments. Surprise is repurposed to enable the Teacher to detect and adapt to differences between itself and the Student. By focusing on maximizing its surprise in response to the environment while concurrently minimizing the Student's surprise in response to the demonstrations, the Teacher agent can effectively tailor its demonstrations to the Student's specific capabilities and constraints. We validate our method by demonstrating improvements in the Student's learning in control tasks within sparse-reward environments.


Integrating Predictive Motion Uncertainties with Distributionally Robust Risk-Aware Control for Safe Robot Navigation in Crowds

arXiv.org Artificial Intelligence

Ensuring safe navigation in human-populated environments is crucial for autonomous mobile robots. Although recent advances in machine learning offer promising methods to predict human trajectories in crowded areas, it remains unclear how one can safely incorporate these learned models into a control loop due to the uncertain nature of human motion, which can make predictions of these models imprecise. In this work, we address this challenge and introduce a distributionally robust chance-constrained model predictive control (DRCC-MPC) which: (i) adopts a probability of collision as a pre-specified, interpretable risk metric, and (ii) offers robustness against discrepancies between actual human trajectories and their predictions. We consider the risk of collision in the form of a chance constraint, providing an interpretable measure of robot safety. To enable real-time evaluation of chance constraints, we consider conservative approximations of chance constraints in the form of distributionally robust Conditional Value at Risk constraints. The resulting formulation offers computational efficiency as well as robustness with respect to out-of-distribution human motion. With the parallelization of a sampling-based optimization technique, our method operates in real-time, demonstrating successful and safe navigation in a number of case studies with real-world pedestrian data.


Weathering Ongoing Uncertainty: Learning and Planning in a Time-Varying Partially Observable Environment

arXiv.org Artificial Intelligence

Optimal decision-making presents a significant challenge for autonomous systems operating in uncertain, stochastic and time-varying environments. Environmental variability over time can significantly impact the system's optimal decision making strategy for mission completion. To model such environments, our work combines the previous notion of Time-Varying Markov Decision Processes (TVMDP) with partial observability and introduces Time-Varying Partially Observable Markov Decision Processes (TV-POMDP). We propose a two-pronged approach to accurately estimate and plan within the TV-POMDP: 1) Memory Prioritized State Estimation (MPSE), which leverages weighted memory to provide more accurate time-varying transition estimates; and 2) an MPSE-integrated planning strategy that optimizes long-term rewards while accounting for temporal constraint. We validate the proposed framework and algorithms using simulations and hardware, with robots exploring a partially observable, time-varying environments. Our results demonstrate superior performance over standard methods, highlighting the framework's effectiveness in stochastic, uncertain, time-varying domains.


Optimal Robotic Assembly Sequence Planning: A Sequential Decision-Making Approach

arXiv.org Artificial Intelligence

The optimal robot assembly planning problem is challenging due to the necessity of finding the optimal solution amongst an exponentially vast number of possible plans, all while satisfying a selection of constraints. Traditionally, robotic assembly planning problems have been solved using heuristics, but these methods are specific to a given objective structure or set of problem parameters. In this paper, we propose a novel approach to robotic assembly planning that poses assembly sequencing as a sequential decision making problem, enabling us to harness methods that far outperform the state-of-the-art. We formulate the problem as a Markov Decision Process (MDP) and utilize Dynamic Programming (DP) to find optimal assembly policies for moderately sized strictures. We further expand our framework to exploit the deterministic nature of assembly planning and introduce a class of optimal Graph Exploration Assembly Planners (GEAPs). For larger structures, we show how Reinforcement Learning (RL) enables us to learn policies that generate high reward assembly sequences. We evaluate our approach on a variety of robotic assembly problems, such as the assembly of the Hubble Space Telescope, the International Space Station, and the James Webb Space Telescope. We further showcase how our DP, GEAP, and RL implementations are capable of finding optimal solutions under a variety of different objective functions and how our formulation allows us to translate precedence constraints to branch pruning and thus further improve performance. We have published our code at https://github.com/labicon/ORASP-Code.


Strategic Decision-Making in Multi-Agent Domains: A Weighted Potential Dynamic Game Approach

arXiv.org Artificial Intelligence

In interactive multi-agent settings, decision-making complexity arises from agents' interconnected objectives. Dynamic game theory offers a formal framework for analyzing such intricacies. Yet, solving dynamic games and determining Nash equilibria pose computational challenges due to the need of solving coupled optimal control problems. To address this, our key idea is to leverage potential games, which are games with a potential function that allows for the computation of Nash equilibria by optimizing the potential function. We argue that dynamic potential games, can effectively facilitate interactive decision-making in many multi-agent interactions. We will identify structures in realistic multi-agent interactive scenarios that can be transformed into weighted potential dynamic games. We will show that the open-loop Nash equilibria of the resulting weighted potential dynamic game can be obtained by solving a single optimal control problem. We will demonstrate the effectiveness of the proposed method through various simulation studies, showing close proximity to feedback Nash equilibria and significant improvements in solve time compared to state-of-the-art game solvers.


Active Inverse Learning in Stackelberg Trajectory Games

arXiv.org Artificial Intelligence

Game-theoretic inverse learning is the problem of inferring the players' objectives from their actions. We formulate an inverse learning problem in a Stackelberg game between a leader and a follower, where each player's action is the trajectory of a dynamical system. We propose an active inverse learning method for the leader to infer which hypothesis among a finite set of candidates describes the follower's objective function. Instead of using passively observed trajectories like existing methods, the proposed method actively maximizes the differences in the follower's trajectories under different hypotheses to accelerate the leader's inference. We demonstrate the proposed method in a receding-horizon repeated trajectory game. Compared with uniformly random inputs, the leader inputs provided by the proposed method accelerate the convergence of the probability of different hypotheses conditioned on the follower's trajectory by orders of magnitude.


Efficient Constrained Multi-Agent Trajectory Optimization using Dynamic Potential Games

arXiv.org Artificial Intelligence

Although dynamic games provide a rich paradigm for modeling agents' interactions, solving these games for real-world applications is often challenging. Many real-world interactive settings involve general nonlinear state and input constraints that couple agents' decisions with one another. In this work, we develop an efficient and fast planner for interactive trajectory optimization in constrained setups using a constrained game-theoretical framework. Our key insight is to leverage the special structure of agents' objective and constraint functions that are common in multi-agent interactions for fast and reliable planning. More precisely, we identify the structure of agents' cost and constraint functions under which the resulting dynamic game is an instance of a constrained dynamic potential game. Constrained dynamic potential games are a class of games for which instead of solving a set of coupled constrained optimal control problems, a constrained Nash equilibrium, i.e. a Generalized Nash equilibrium, can be found by solving a single constrained optimal control problem. This simplifies constrained interactive trajectory optimization significantly. We compare the performance of our method in a navigation setup involving four planar agents and show that our method is on average 20 times faster than the state-of-the-art. We further provide experimental validation of our proposed method in a navigation setup involving two quadrotors carrying a rigid object while avoiding collisions with two humans.


Distributed Potential iLQR: Scalable Game-Theoretic Trajectory Planning for Multi-Agent Interactions

arXiv.org Artificial Intelligence

In this work, we develop a scalable, local trajectory optimization algorithm that enables robots to interact with other robots. It has been shown that agents' interactions can be successfully captured in game-theoretic formulations, where the interaction outcome can be best modeled via the equilibria of the underlying dynamic game. However, it is typically challenging to compute equilibria of dynamic games as it involves simultaneously solving a set of coupled optimal control problems. Existing solvers operate in a centralized fashion and do not scale up tractably to multiple interacting agents. We enable scalable distributed game-theoretic planning by leveraging the structure inherent in multi-agent interactions, namely, interactions belonging to the class of dynamic potential games. Since equilibria of dynamic potential games can be found by minimizing a single potential function, we can apply distributed and decentralized control techniques to seek equilibria of multi-agent interactions in a scalable and distributed manner. We compare the performance of our algorithm with a centralized interactive planner in a number of simulation studies and demonstrate that our algorithm results in better efficiency and scalability. We further evaluate our method in hardware experiments involving multiple quadcopters.