Agents
Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning
Loftin, Robert, Saha, Aadirupa, Devlin, Sam, Hofmann, Katja
High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under uncertainty can significantly improve the sample efficiency of RL in single agent tasks. This work seeks to understand the role of optimistic exploration in non-cooperative multi-agent settings. We will show that, in zero-sum games, optimistic exploration can cause the learner to waste time sampling parts of the state space that are irrelevant to strategic play, as they can only be reached through cooperation between both players. To address this issue, we introduce a formal notion of strategically efficient exploration in Markov games, and use this to develop two strategically efficient learning algorithms for finite Markov games. We demonstrate that these methods can be significantly more sample efficient than their optimistic counterparts.
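For readers unfamiliar with optimism under uncertainty, the single-agent principle the abstract refers to can be sketched with a UCB1-style bandit. This is only an illustration of the general idea, not the strategically efficient algorithms from the paper; the function name `ucb1` and the Bernoulli reward model are assumptions made for the example.

```python
import math
import random


def ucb1(arm_means, steps, seed=0):
    """Run UCB1 on a Bernoulli bandit and return the pull count of each arm.

    arm_means are hypothetical success probabilities, hidden from the learner.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of pulls per arm
    totals = [0.0] * n_arms  # summed reward per arm

    def optimistic_value(arm, t):
        # Empirical mean plus a bonus that shrinks as the arm is sampled more.
        mean = totals[arm] / counts[arm]
        bonus = math.sqrt(2.0 * math.log(t) / counts[arm])
        return mean + bonus

    for t in range(1, steps + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once so the bonus is defined
        else:
            arm = max(range(n_arms), key=lambda a: optimistic_value(a, t))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


counts = ucb1([0.2, 0.8], steps=2000)
```

The bonus term makes rarely tried arms look attractive, which is the behaviour the abstract argues can be wasteful in zero-sum games when the under-sampled states are only reachable through cooperation between the players.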
Refining Labelled Systems for Modal and Constructive Logics with Applications
This thesis introduces the "method of structural refinement", which serves as a means of transforming the relational semantics of a modal and/or constructive logic into an 'economical' proof system by connecting two proof-theoretic paradigms: labelled and nested sequent calculi. The formalism of labelled sequents has been successful in that cut-free calculi in possession of desirable proof-theoretic properties can be automatically generated for large classes of logics. Despite these qualities, labelled systems make use of a complicated syntax that explicitly incorporates the semantics of the associated logic, and such systems typically violate the subformula property to a high degree. By contrast, nested sequent calculi employ a simpler syntax and adhere to a strict reading of the subformula property, making such systems useful in the design of automated reasoning algorithms. However, the downside of the nested sequent paradigm is that a general theory concerning the automated construction of such calculi (as in the labelled setting) is essentially absent, meaning that the construction of nested systems and the confirmation of their properties is usually done on a case-by-case basis. The refinement method connects both paradigms in a fruitful way, by transforming labelled systems into nested (or, refined labelled) systems with the properties of the former preserved throughout the transformation process. To demonstrate the method of refinement and some of its applications, we consider grammar logics, first-order intuitionistic logics, and deontic STIT logics. The introduced refined labelled calculi will be used to provide the first proof-search algorithms for deontic STIT logics. Furthermore, we employ our refined labelled calculi for grammar logics to show that every logic in the class possesses the effective Lyndon interpolation property.
Survey of Recent Multi-Agent Reinforcement Learning Algorithms Utilizing Centralized Training
Sharma, Piyush K., Fernandez, Rolando, Zaroukian, Erin, Dorothy, Michael, Basak, Anjon, Asher, Derrik E.
Much work has been dedicated to the exploration of Multi-Agent Reinforcement Learning (MARL) paradigms implementing a centralized learning with decentralized execution (CLDE) approach to achieve human-like collaboration in cooperative tasks. Here, we discuss variations of centralized training and describe a recent survey of algorithmic approaches. The goal is to explore how different implementations of information-sharing mechanisms in centralized learning may give rise to distinct group-coordinated behaviors in multi-agent systems performing cooperative tasks.
RSO: A Novel Reinforced Swarm Optimization Algorithm for Feature Selection
Basak, Hritam, Das, Mayukhmali, Modak, Susmita
Swarm optimization algorithms are widely used for feature selection before data mining and machine learning applications. Metaheuristic, nature-inspired feature selection approaches are used for single-objective optimization tasks, but their major problem is frequent premature convergence, which weakens their contribution to data mining. In this paper, we propose a novel feature selection algorithm named Reinforced Swarm Optimization (RSO) that addresses some of the existing problems in feature selection. This algorithm embeds the widely used Bee Swarm Optimization (BSO) algorithm along with Reinforcement Learning (RL) to maximize the reward of a superior search agent and punish the inferior ones. This hybrid optimization algorithm is more adaptive and robust, with a good balance between exploitation and exploration of the search space. The proposed method is evaluated on 25 widely known UCI datasets containing a blend of balanced and imbalanced data. The obtained results are compared with several other popular and recent feature selection algorithms with similar classifier configurations. The experimental outcome shows that our proposed model outperforms BSO in 22 out of 25 instances (88%). Moreover, experimental results also show that RSO performs the best among all the methods compared in this paper in 19 out of 25 cases (76%), establishing the superiority of our proposed method.
An autonomous system to assemble reconfigurable robotic structures in space
Large space structures, such as telescopes and spacecraft, should ideally be assembled directly in space, as they are difficult or impossible to launch from Earth as a single piece. In several cases, however, assembling these technologies manually in space is either highly expensive or unfeasible. In recent years, roboticists have thus been trying to develop systems that could be used to automatically assemble structures in space. To simplify this assembly process, space structures could have a modular design, which essentially means that they are comprised of different building blocks or modules that can be shifted to create different shapes or forms. Researchers at the German Aerospace Center (DLR) and Technische Universität München (TUM) have recently developed an autonomous planner that could be used to assemble reconfigurable structures directly in space.
Learning User-Interpretable Descriptions of Black-Box AI System Capabilities
Verma, Pulkit, Marpally, Shashank Rao, Srivastava, Siddharth
Several approaches have been developed to answer specific questions that a user may have about an AI system that can plan and act. However, the problems of identifying which questions to ask and of computing a user-interpretable symbolic description of the overall capabilities of the system have remained largely unaddressed. This paper presents an approach for addressing these problems by learning user-interpretable symbolic descriptions of the limits and capabilities of a black-box AI system using low-level simulators. It uses a hierarchical active querying paradigm to generate questions and to learn a user-interpretable model of the AI system based on its responses. In contrast to prior work, we consider settings where imprecision of the user's conceptual vocabulary precludes a direct expression of the agent's capabilities. Furthermore, our approach does not require assumptions about the internal design of the target AI system or about the methods that it may use to compute or learn task solutions. Empirical evaluation on several game-based simulator domains shows that this approach can efficiently learn symbolic models of AI systems that use a deterministic black-box policy in fully observable scenarios.
The Who in Explainable AI: How AI Background Shapes Perceptions of AI Explanations
Ehsan, Upol, Passi, Samir, Liao, Q. Vera, Chan, Larry, Lee, I-Hsiang, Muller, Michael, Riedl, Mark O.
Explainability of AI systems is critical for users to take informed actions and hold systems accountable. While "opening the opaque box" is important, understanding who opens the box can govern if the Human-AI interaction is effective. In this paper, we conduct a mixed-methods study of how two different groups of whos--people with and without a background in AI--perceive different types of AI explanations. These groups were chosen to look at how disparities in AI backgrounds can exacerbate the creator-consumer gap. We quantitatively share what the perceptions are along five dimensions: confidence, intelligence, understandability, second chance, and friendliness. Qualitatively, we highlight how the AI background influences each group's interpretations and elucidate why the differences might exist through the lenses of appropriation and cognitive heuristics. We find that (1) both groups had unwarranted faith in numbers, to different extents and for different reasons, (2) each group found explanatory values in different explanations that went beyond the usage we designed them for, and (3) each group had different requirements of what counts as humanlike explanations. Using our findings, we discuss potential negative consequences such as harmful manipulation of user trust and propose design interventions to mitigate them. By bringing conscious awareness to how and why AI backgrounds shape perceptions of potential creators and consumers in XAI, our work takes a formative step in advancing a pluralistic Human-centered Explainable AI discourse.
Packet Routing with Graph Attention Multi-agent Reinforcement Learning
Mai, Xuan, Fu, Quanzhi, Chen, Yi
Packet routing is a fundamental problem in communication networks that decides how packets are directed from their source nodes to their destination nodes through intermediate nodes. With the increasing complexity of network topology and highly dynamic traffic demand, conventional model-based and rule-based routing schemes show significant limitations, due to simplified and unrealistic model assumptions and a lack of flexibility and adaptation. Adding intelligence to network control is becoming a trend and the key to achieving high-efficiency network operation. In this paper, we develop a model-free and data-driven routing strategy by leveraging reinforcement learning (RL), where routers interact with the network and learn from experience to make good routing configurations for the future. Considering the graph nature of the network topology, we design a multi-agent RL framework in combination with a Graph Neural Network (GNN), tailored to the routing problem. Three deployment paradigms, centralized, federated, and cooperated learning, are explored respectively. Simulation results demonstrate that our algorithm outperforms some existing benchmark algorithms in terms of packet transmission delay and affordable load.
Towards Emotion-Aware Agents For Negotiation Dialogues
Chawla, Kushal, Clever, Rene, Ramirez, Jaysa, Lucas, Gale, Gratch, Jonathan
Negotiation is a complex social interaction that encapsulates emotional encounters in human decision-making. Virtual agents that can negotiate with humans are useful in pedagogy and conversational AI. To advance the development of such agents, we explore the prediction of two important subjective goals in a negotiation: outcome satisfaction and partner perception. Specifically, we analyze the extent to which emotion attributes extracted from the negotiation help in the prediction, above and beyond the individual difference variables. We focus on a recent dataset in chat-based negotiations, grounded in a realistic camping scenario. We study three degrees of emotion dimensions (emoticons, lexical, and contextual) by leveraging affective lexicons and a state-of-the-art deep learning architecture. Our insights will be helpful in designing adaptive negotiation agents that interact through realistic communication interfaces.
Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent
Jing, Gangshan, Bai, He, George, Jemin, Chakrabortty, Aranya, Sharma, Piyush K.
Recently introduced distributed zeroth-order optimization (ZOO) algorithms have shown their utility in distributed reinforcement learning (RL). Unfortunately, in the gradient estimation process, almost all of them require random samples with the same dimension as the global variable and/or require evaluation of the global cost function, which may induce high estimation variance for large-scale networks. In this paper, we propose a novel distributed zeroth-order algorithm by leveraging the network structure inherent in the optimization objective, which allows each agent to estimate its local gradient by local cost evaluation independently, without use of any consensus protocol. The proposed algorithm exhibits an asynchronous update scheme, and is designed for stochastic non-convex optimization with a possibly non-convex feasible domain based on the block coordinate descent method. The algorithm is later employed as a distributed model-free RL algorithm for distributed linear quadratic regulator design, where a learning graph is designed to describe the required interaction relationship among agents in distributed learning. We provide an empirical validation of the proposed algorithm to benchmark its performance on convergence rate and variance against a centralized ZOO algorithm.
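The core mechanism in this abstract, estimating a block's gradient from cost evaluations alone and updating one block at a time, can be sketched as follows. This is a minimal sketch under strong assumptions, not the paper's algorithm: the objective is a fully separable quadratic so that each agent's local cost determines its block gradient, updates are simulated sequentially rather than asynchronously, the domain is unconstrained, and all names (`zo_block_gradient`, `zo_block_coordinate_descent`, `make_local_cost`) are hypothetical.

```python
import random


def zo_block_gradient(local_cost, x, block, radius, rng):
    """Two-point zeroth-order gradient estimate for the coordinates in `block`.

    Only `local_cost` is evaluated; no derivative information is used.
    """
    # Random Gaussian direction supported only on this agent's block.
    u = [0.0] * len(x)
    for i in block:
        u[i] = rng.gauss(0.0, 1.0)
    x_plus = [xi + radius * ui for xi, ui in zip(x, u)]
    x_minus = [xi - radius * ui for xi, ui in zip(x, u)]
    # Finite-difference slope along u, projected back onto u.
    slope = (local_cost(x_plus) - local_cost(x_minus)) / (2.0 * radius)
    return {i: slope * u[i] for i in block}


def zo_block_coordinate_descent(agents, x0, steps, lr=0.05, radius=1e-3, seed=0):
    """agents is a list of (block, local_cost) pairs; one agent updates per step."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        block, local_cost = rng.choice(agents)  # stand-in for asynchronous wake-ups
        grad = zo_block_gradient(local_cost, x, block, radius, rng)
        for i, g in grad.items():
            x[i] -= lr * g
    return x


def make_local_cost(block, target=1.0):
    """Assumed toy local cost: squared distance of the block to `target`."""
    return lambda x: sum((x[i] - target) ** 2 for i in block)


blocks = [[0, 1], [2, 3]]
agents = [(block, make_local_cost(block)) for block in blocks]
x_final = zo_block_coordinate_descent(agents, x0=[0.0] * 4, steps=400)
total_cost = sum(cost(x_final) for _, cost in agents)
```

Because each perturbation has the dimension of one block rather than of the global variable, the estimate's variance scales with the block size, which is the motivation the abstract gives for exploiting network structure.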