Agents
Constrained Multiagent Markov Decision Processes: a Taxonomy of Problems and Algorithms
de Nijs, Frits | Walraven, Erwin (Delft University of Technology) | De Weerdt, Mathijs (Delft University of Technology) | Spaan, Matthijs (Delft University of Technology)
In domains such as electric vehicle charging, smart distribution grids and autonomous warehouses, multiple agents share the same resources. When planning the use of these resources, agents need to deal with the uncertainty in these domains. Although several models and algorithms for such constrained multiagent planning problems under uncertainty have been proposed in the literature, it remains unclear which algorithm can be applied when. In this survey we conceptualize these domains and establish a generic problem class based on Markov decision processes. We identify and compare the conditions under which algorithms from the planning literature for problems in this class can be applied: whether constraints are soft or hard, whether agents are continuously connected, whether the domain is fully observable, whether a constraint is momentary (instantaneous) or on a budget, and whether the constraint is on a single resource or on multiple. Further, we discuss the advantages and disadvantages of these algorithms. We conclude by identifying open problems that are directly related to the conceptualized domains, as well as in adjacent research areas.
Provably Efficient Cooperative Multi-Agent Reinforcement Learning with Function Approximation
Dubey, Abhimanyu, Pentland, Alex
Cooperative multi-agent reinforcement learning (MARL) systems are widely prevalent in many engineering systems, e.g., robotic systems (Ding et al., 2020), power grids (Yu et al., 2014), traffic control (Bazzan, 2009), as well as team games (Zhao et al., 2019). Increasingly, federated (Yang et al., 2019) and distributed (Peteiro-Barral & Guijarro-Berdiñas, 2013) machine learning is gaining prominence in industrial applications, and reinforcement learning in these large-scale settings is becoming of import in the research community as well (Zhuo et al., 2019; Liu et al., 2019). Recent research in the statistical learning community has focused on cooperative multi-agent decision-making algorithms with provable guarantees (Zhang et al., 2018b; Wai et al., 2018; Zhang et al., 2018a). However, prior work focuses on algorithms that, while decentralized, provide guarantees on convergence (e.g., Zhang et al. (2018b)) but no finite-sample guarantees for regret, in contrast to efficient algorithms with function approximation proposed for single-agent RL (e.g., Jin et al. (2018, 2020); Yang et al. (2020)). Moreover, optimization in the decentralized multi-agent setting is also known to be non-convergent without assumptions (Tan, 1993). Developing no-regret multi-agent algorithms is therefore an important problem in RL. For the (relatively) easier problem of multi-agent multi-armed bandits, there has been significant recent interest in decentralized algorithms involving agents communicating over a network (Landgren et al., 2016a, 2018; Martínez-Rubio et al., 2019; Dubey & Pentland, 2020b), as well as in distributed settings (Hillel et al., 2013; Wang et al., 2019). Since several application areas for distributed sequential decision-making regularly involve non-stationarity and contextual information (Polydoros & Nalpantidis, 2017), an MDP formulation can potentially provide stronger algorithms for these settings as well.
Furthermore, no-regret algorithms in the single-agent RL setting with function approximation (e.g., Jin et al. (2020)) build on analysis techniques for contextual bandits, which leads us to the question: can no-regret function approximation be extended to (decentralized) cooperative multi-agent reinforcement learning?
The AI Index 2021 Annual Report
Zhang, Daniel, Mishra, Saurabh, Brynjolfsson, Erik, Etchemendy, John, Ganguli, Deep, Grosz, Barbara, Lyons, Terah, Manyika, James, Niebles, Juan Carlos, Sellitto, Michael, Shoham, Yoav, Clark, Jack, Perrault, Raymond
Welcome to the fourth edition of the AI Index Report. This year we significantly expanded the amount of data available in the report, worked with a broader set of external organizations to calibrate our data, and deepened our connections with the Stanford Institute for Human-Centered Artificial Intelligence (HAI). The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Its mission is to provide unbiased, rigorously vetted, and globally sourced data for policymakers, researchers, executives, journalists, and the general public to develop intuitions about the complex field of AI. The report aims to be the most credible and authoritative source for data and insights about AI in the world.
Domain-Robust Visual Imitation Learning with Mutual Information Constraints
Cetin, Edoardo, Celiktutan, Oya
Human beings are able to understand objectives and learn by simply observing others perform a task. Imitation learning methods aim to replicate such capabilities; however, they generally depend on access to a full set of optimal states and actions taken with the agent's actuators and from the agent's point of view. In this paper, we introduce a new algorithm, called Disentangling Generative Adversarial Imitation Learning (DisentanGAIL), with the purpose of bypassing such constraints. Our algorithm enables autonomous agents to learn directly from high-dimensional observations of an expert performing a task, by making use of adversarial learning with a latent representation inside the discriminator network. This latent representation is regularized through mutual information constraints to incentivize learning only features that encode information about the completion levels of the task being demonstrated. This makes it possible to obtain a shared feature space in which imitation succeeds while the differences between the expert's and the agent's domains are disregarded. Empirically, our algorithm is able to efficiently imitate in a diverse range of control problems including balancing, manipulation and locomotion tasks, while being robust to various domain differences in terms of both environment appearance and agent embodiment.
Safe Multi-Agent Pathfinding with Time Uncertainty
Shahar, Tomer (Ben Gurion University of the Negev) | Shekhar, Shashank (Ben Gurion University of the Negev) | Atzmon, Dor (Ben Gurion University of the Negev) | Saffidine, Abdallah (The University of New South Wales, Sydney, Australia) | Juba, Brendan (Washington University in St. Louis, USA) | Stern, Roni
In many real-world scenarios, the time it takes for a mobile agent, e.g., a robot, to move from one location to another may vary due to exogenous events and be difficult to predict accurately. Planning in such scenarios is challenging, especially in the context of Multi-Agent Pathfinding (MAPF), where the goal is to find paths for multiple agents and temporal coordination is necessary to avoid collisions. In this work, we consider a MAPF problem with this form of time uncertainty, where we are only given upper and lower bounds on the time it takes each agent to move. The objective is to find a safe solution, which is a solution that can be executed by all agents and is guaranteed to avoid collisions. We propose two complete and optimal algorithms for finding safe solutions based on well-known MAPF algorithms, namely, A* with Operator Decomposition (A* + OD) and Conflict-Based Search (CBS). Experimentally, we observe that on several standard MAPF grids the CBS-based algorithm performs better. We also explore the option of online replanning in this context, i.e., modifying the agents' plans during execution, to reduce the overall execution cost. We consider two online settings: (a) when an agent can sense the current time and its current location, and (b) when the agents can also communicate seamlessly during execution. For each setting, we propose a replanning algorithm and analyze its behavior theoretically and empirically. Our experimental evaluation confirms that online replanning in both settings can indeed significantly reduce solution cost.
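The safety notion in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes each agent's plan has already been turned into a list of visits, each with the earliest time the agent could arrive at a location and the latest time it could still be there (both derived from the lower and upper bounds on move times). Two plans are treated as unsafe if the agents could occupy the same location during overlapping time intervals. The names `Visit`, `may_collide` and `is_safe` are hypothetical, and conflicts on edges (agents swapping places) are ignored.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass(frozen=True)
class Visit:
    """One stop in an agent's plan (hypothetical representation)."""
    location: str
    earliest_arrival: int   # earliest time the agent could be at the location
    latest_departure: int   # latest time the agent could still be there


def may_collide(a: Visit, b: Visit) -> bool:
    """True if both visits are at the same location and their
    closed time intervals overlap."""
    return (
        a.location == b.location
        and a.earliest_arrival <= b.latest_departure
        and b.earliest_arrival <= a.latest_departure
    )


def is_safe(plans: List[List[Visit]]) -> bool:
    """True if no pair of agents can ever be at the same location
    at the same time, whatever the realized move durations are."""
    for plan_a, plan_b in combinations(plans, 2):
        for visit_a in plan_a:
            for visit_b in plan_b:
                if may_collide(visit_a, visit_b):
                    return False
    return True
```

For example, an agent that may be at location B anywhere in the interval [1, 3] is safe with respect to an agent that is at B only during [4, 6], but not with respect to one that may be there during [3, 5].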
Embodied Continual Learning Across Developmental Time Via Developmental Braitenberg Vehicles
Alicea, Bradly, Chakrabarty, Rishabh, Gopi, Akshara, Lim, Anson, Parent, Jesse
There is much to learn through synthesis of Developmental Biology, Cognitive Science and Computational Modeling. One lesson we can learn from this perspective is that the initialization of intelligent programs cannot solely rely on manipulation of numerous parameters. Our path forward is to present a design for developmentally-inspired learning agents based on the Braitenberg Vehicle. Using these agents to exemplify artificial embodied intelligence, we move closer to modeling embodied experience and morphogenetic growth as components of cognitive developmental capacity. We consider various factors regarding biological and cognitive development which influence the generation of adult phenotypes and the contingency of available developmental pathways. These mechanisms produce emergent connectivity with shifting weights and adaptive network topography, thus illustrating the importance of developmental processes in training neural networks. This approach provides a blueprint for adaptive agent behavior that might result from a developmental approach: namely by exploiting critical periods or growth and acquisition, an explicitly embodied network architecture, and a distinction between the assembly of neural networks and active learning on these networks. The process of biological development provides many novel lessons for machine learning and artificial intelligence.
The RLR-Tree: A Reinforcement Learning Based R-Tree for Spatial Data
Gu, Tu, Feng, Kaiyu, Cong, Gao, Long, Cheng, Wang, Zheng, Wang, Sheng
Learned indices have been proposed to replace classic index structures like B-Tree with machine learning (ML) models. They require replacing both the indices and query processing algorithms currently deployed by the databases, and such a radical departure is likely to encounter challenges and obstacles. In contrast, we propose a fundamentally different way of using ML techniques to improve on the query performance of the classic R-Tree without the need to change its structure or query processing algorithms. Despite the success of these learned indices in improving the performance of some types of queries, they still have various limitations, e.g., they can only handle spatial point objects and limited types of spatial queries, some only return approximate query results, and they either cannot handle updates or need a periodic rebuild to retain high query efficiency (detailed discussions are in Section 2). These limitations, together with the requirement that the learned indices need a replacement of the index structures and ...
Loosely Synchronized Search for Multi-agent Path Finding with Asynchronous Actions
Ren, Zhongqiang, Rathinam, Sivakumar, Choset, Howie
Multi-agent path finding (MAPF) determines an ensemble of collision-free paths for multiple agents between their respective start and goal locations. Among the available MAPF planners for workspaces modeled as a graph, A*-based approaches have been widely investigated and have demonstrated their efficiency in numerous scenarios. However, almost all of these A*-based approaches assume that each agent executes an action concurrently in that all agents start and stop together. This article presents a natural generalization of MAPF with asynchronous actions where agents do not necessarily start and stop concurrently. The main contribution of the work is a proposed approach called Loosely Synchronized Search (LSS) that extends A*-based MAPF planners to handle asynchronous actions. We show LSS is complete and finds an optimal solution if one exists. We also combine LSS with other existing MAPF methods that aim to trade off optimality for computational efficiency. Extensive numerical results are presented to corroborate the performance of the proposed approaches. Finally, we also verify the applicability of our method in the Robotarium, a remotely accessible swarm robotics research platform.
Let's be friends! A rapport-building 3D embodied conversational agent for the Human Support Robot
Pasternak, Katarzyna, Wu, Zishi, Visser, Ubbo, Lisetti, Christine
Partial subtle mirroring of nonverbal behaviors during conversations (also known as mimicking or parallel empathy) is essential for rapport building, which in turn is essential for optimal human-human communication outcomes. Mirroring has been studied in interactions between robots and humans, and in interactions between Embodied Conversational Agents (ECAs) and humans. However, very few studies examine interactions between humans and ECAs that are integrated with robots, and none of them examine the effect of mirroring nonverbal behaviors in such interactions. Our research question is whether integrating an ECA able to mirror its interlocutor's facial expressions and head movements (continuously or intermittently) with a human-service robot will improve the user's experience with the support robot that is able to perform useful mobile manipulative tasks (e.g. at home). Our contribution is the complex integration of an expressive ECA, able to track its interlocutor's face and to mirror his/her facial expressions and head movements in real time, with a human support robot such that the robot and the agent are fully aware of each other's, and of the users', nonverbal cues. We also describe a pilot study we conducted towards answering our research question, which shows promising results for our forthcoming larger user study.
Adaptive Agent Architecture for Real-time Human-Agent Teaming
Ni, Tianwei, Li, Huao, Agrawal, Siddharth, Raja, Suhas, Jia, Fan, Gui, Yikang, Hughes, Dana, Lewis, Michael, Sycara, Katia
Teamwork is a set of interrelated reasoning, actions and behaviors of team members that facilitate common objectives. Teamwork theory and experiments have resulted in a set of states and processes for team effectiveness in both human-human and agent-agent teams. However, human-agent teaming is less well studied because it is so new and involves asymmetry in policy and intent not present in human teams. To optimize team performance in human-agent teaming, it is critical that agents infer human intent and adapt their policies for smooth coordination. Most literature in human-agent teaming builds agents referencing a learned human model. Though these agents are guaranteed to perform well with the learned model, they lay heavy assumptions on human policy such as optimality and consistency, which are unlikely to hold in many real-world scenarios. In this paper, we propose a novel adaptive agent architecture in a human-model-free setting on a two-player cooperative game, namely Team Space Fortress (TSF). Previous human-human team research has shown complementary policies in the TSF game and diversity in human players' skill, which encourages us to relax the assumptions on human policy. Therefore, we discard learning human models from human data, and instead use an adaptation strategy on a pre-trained library of exemplar policies composed of RL algorithms or rule-based methods with minimal assumptions of human behavior. The adaptation strategy relies on a novel similarity metric to infer human policy and then selects the most complementary policy in our library to maximize the team performance. The adaptive agent architecture can be deployed in real time and generalize to any off-the-shelf static agents. We conducted human-agent experiments to evaluate the proposed adaptive agent framework, and demonstrated the suboptimality, diversity, and adaptability of human policies in human-agent teams.
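The adaptation strategy described above (infer which exemplar the human resembles, then pick the most complementary partner) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: the abstract does not define its similarity metric, so a simple action-agreement rate stands in for it, and the table of team scores for (human-like exemplar, partner) pairs is assumed to have been measured offline. The names `agreement`, `select_partner`, `exemplars` and `team_score` are all hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# A policy maps a state to an action (assumed deterministic for this sketch).
Policy = Callable[[Hashable], Hashable]
Trajectory = List[Tuple[Hashable, Hashable]]  # observed (state, action) pairs


def agreement(policy: Policy, trajectory: Trajectory) -> float:
    """Stand-in similarity: fraction of observed human (state, action)
    pairs that the exemplar policy would have reproduced."""
    if not trajectory:
        return 0.0
    matches = sum(1 for state, action in trajectory if policy(state) == action)
    return matches / len(trajectory)


def select_partner(
    trajectory: Trajectory,
    exemplars: Dict[str, Policy],
    team_score: Dict[Tuple[str, str], float],
) -> Tuple[str, str]:
    """Return (closest exemplar, best partner for that exemplar).

    team_score[(h, p)] is an assumed, pre-measured team performance when a
    player behaving like exemplar h is paired with partner policy p.
    """
    closest = max(exemplars, key=lambda name: agreement(exemplars[name], trajectory))
    partners = [p for (h, p) in team_score if h == closest]
    best = max(partners, key=lambda p: team_score[(closest, p)])
    return closest, best
```

Because the selection only needs a short window of observed human behavior and a table lookup, it can be re-run repeatedly during play, which is one way to read the abstract's claim of real-time deployment.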