Agents
Dynamic Awareness
Halpern, Joseph Y., Piermont, Evan
... Modica and Rustichini 1994; Modica and Rustichini 1999; Heifetz, Meier, and Schipper 2006; Board and Chung 2009; Sillari 2008).) Most work on awareness thus far has focused on the static case, where awareness does not change. The ... Karni and Vierø's requirement does not seem appropriate for many situations of interest, especially for introspective agents, who may believe that the mere existence of φ is itself informative about the world, so becoming aware of φ changes beliefs about other propositions.
Natural Emergence of Heterogeneous Strategies in Artificially Intelligent Competitive Teams
Multi-agent strategies in mixed cooperative-competitive environments can be hard to craft by hand because each agent needs to coordinate with its teammates while competing with its opponents. Learning-based algorithms are appealing, but many scenarios require heterogeneous agent behavior for the team's success, which increases the complexity of the learning algorithm. In this work, we develop a competitive multi-agent environment called FortAttack in which two teams compete against each other. We corroborate that modeling agents with Graph Neural Networks and training them with Reinforcement Learning leads to the evolution of increasingly complex strategies for each team. We observe a natural emergence of heterogeneous behavior amongst homogeneous agents when such behavior can lead to the team's success. Such heterogeneous behavior from homogeneous agents is appealing because any agent can take over the role of another agent at test time. Finally, we propose ensemble training, in which we utilize the evolved opponent strategies to train a single policy for friendly agents.
Policy learning with partial observation and mechanical constraints for multi-person modeling
Fujii, Keisuke, Takeishi, Naoya, Kawahara, Yoshinobu, Takeda, Kazuya
Extracting the rules of real-world biological multi-agent behaviors is a current challenge in various scientific and engineering fields. Biological agents generally have limited observation and mechanical constraints; however, most of the conventional data-driven models ignore such assumptions, resulting in lack of biological plausibility and model interpretability for behavioral analyses in biological and cognitive science. Here we propose sequential generative models with partial observation and mechanical constraints, which can visualize whose information the agents utilize and can generate biologically plausible actions. We formulate this as a decentralized multi-agent imitation learning problem, leveraging binary partial observation models with a Gumbel-Softmax reparameterization and policy models based on hierarchical variational recurrent neural networks with physical and biomechanical constraints. We investigate the empirical performances using real-world multi-person motion datasets from basketball and soccer games.
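The Gumbel-Softmax reparameterization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the function name `gumbel_softmax`, the `temperature` parameter, and the two-class "observe / ignore" logits are assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw one relaxed (soft) one-hot sample from a categorical distribution.

    Hypothetical helper for illustration; not taken from the paper.
    """
    noisy = []
    for logit in logits:
        # Clip the uniform draw so both logarithms below stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))  # Gumbel(0, 1) sample
        noisy.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed binary observation model: index 0 = "observe", index 1 = "ignore".
rng = random.Random(0)
soft_mask = gumbel_softmax([2.0, 0.0], temperature=0.5, rng=rng)
```

Lower temperatures push the sample toward a hard 0/1 choice, while the softmax keeps the operation differentiable with respect to the logits in an automatic-differentiation framework.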
The Sample Complexity of Best-$k$ Items Selection from Pairwise Comparisons
Ren, Wenbo, Liu, Jia, Shroff, Ness B.
This paper studies the sample complexity (aka number of comparisons) bounds for the active best-$k$ items selection from pairwise comparisons. From a given set of items, the learner can make pairwise comparisons on every pair of items, and each comparison returns an independent noisy result about the preferred item. At any time, the learner can adaptively choose a pair of items to compare according to past observations (i.e., active learning). The learner's goal is to find the (approximately) best-$k$ items with a given confidence, while trying to use as few comparisons as possible. In this paper, we study two problems: (i) finding the probably approximately correct (PAC) best-$k$ items and (ii) finding the exact best-$k$ items, both under strong stochastic transitivity and stochastic triangle inequality. For PAC best-$k$ items selection, we first show a lower bound and then propose an algorithm whose sample complexity upper bound matches the lower bound up to a constant factor. For the exact best-$k$ items selection, we first prove a worst-instance lower bound. We then propose two algorithms based on our PAC best items selection algorithms: one works for $k=1$ and is sample complexity optimal up to a loglog factor, and the other works for all values of $k$ and is sample complexity optimal up to a log factor.
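To make the noisy-comparison setting concrete, here is a minimal non-adaptive baseline, not the paper's algorithm. It assumes the preferred item wins each independent comparison with probability at least 1/2 + `gap`; by Hoeffding's inequality, a majority vote over ceil(ln(1/delta) / (2 gap^2)) comparisons then picks the preferred item with probability at least 1 - delta. The names `comparisons_needed` and `duel` are hypothetical.

```python
import math
import random


def comparisons_needed(gap, delta):
    """Comparisons sufficient for a majority vote to err with probability <= delta.

    Assumes the better item wins each comparison with probability >= 1/2 + gap.
    """
    return math.ceil(math.log(1.0 / delta) / (2.0 * gap ** 2))


def duel(prob_first_wins, gap, delta, rng):
    """Simulate repeated noisy comparisons and return (first_item_preferred, n)."""
    n = comparisons_needed(gap, delta)
    wins = sum(1 for _ in range(n) if rng.random() < prob_first_wins)
    return 2 * wins > n, n


rng = random.Random(1)
first_is_better, used = duel(prob_first_wins=0.9, gap=0.1, delta=0.05, rng=rng)
```

An active learner, as studied in the abstract, would instead choose the next pair adaptively and stop early on easy pairs, which is where the sample complexity savings come from.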
Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications
Perrin, Sarah, Perolat, Julien, Laurière, Mathieu, Geist, Matthieu, Elie, Romuald, Pietquin, Olivier
In this paper, we extend the analysis of the continuous time Fictitious Play learning algorithm to various finite state Mean Field Game settings (finite horizon, $\gamma$-discounted), allowing in particular for the introduction of an additional common noise. We first present a theoretical convergence analysis of the continuous time Fictitious Play process and prove that the induced exploitability decreases at a rate $O(\frac{1}{t})$. This analysis emphasizes the use of exploitability as a relevant metric for evaluating convergence towards a Nash equilibrium in the context of Mean Field Games. These theoretical contributions are supported by numerical experiments in either model-based or model-free settings. We thereby provide, for the first time, converging learning dynamics for Mean Field Games in the presence of common noise.
Reward Machines for Cooperative Multi-Agent Reinforcement Learning
Neary, Cyrus, Xu, Zhe, Wu, Bo, Topcu, Ufuk
In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal. We propose the use of reward machines (RM) -- Mealy machines used as structured representations of reward functions -- to encode the team's task. The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies and independencies, allowing the team-level task to be decomposed into sub-tasks for individual agents. We define such a notion of RM decomposition and present algorithmically verifiable conditions guaranteeing that distributed completion of the sub-tasks leads to team behavior accomplishing the original task. This framework for task decomposition provides a natural approach to decentralized learning: agents may learn to accomplish their sub-tasks while observing only their local state and abstracted representations of their teammates. We accordingly propose a decentralized q-learning algorithm. Furthermore, in the case of undiscounted rewards, we use local value functions to derive lower and upper bounds for the global value function corresponding to the team task. Experimental results in three discrete settings exemplify the effectiveness of the proposed RM decomposition approach, which converges to a successful team policy two orders of magnitude faster than a centralized learner and significantly outperforms hierarchical and independent q-learning approaches.
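A reward machine in the sense described above (a Mealy machine whose outputs are rewards) can be sketched as a small transition table. This is an illustrative sketch only: the class name `RewardMachine`, the event labels, and the two-step "press the button, then reach the goal" task are assumptions, not the paper's benchmark or code.

```python
class RewardMachine:
    """Mealy machine: (state, event) -> (next_state, reward)."""

    def __init__(self, initial_state, transitions):
        self.initial_state = initial_state
        self.transitions = transitions  # dict keyed by (state, event)
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, event):
        # Events with no listed transition leave the state unchanged, reward 0.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


# Hypothetical task: the goal only pays off after the button has been pressed.
task = RewardMachine(
    initial_state="start",
    transitions={
        ("start", "button"): ("armed", 0.0),
        ("armed", "goal"): ("done", 1.0),
    },
)
rewards = [task.step(event) for event in ["goal", "button", "goal"]]
```

Because the machine state records task progress, an agent that observes it alongside its local state can learn with a reward that would otherwise depend on history.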
Public Willingness to Get Vaccinated Against COVID-19: How AI-Developed Vaccines Can Affect Acceptance
Lima, Gabriel, Hwang, Hyeyoung, Cha, Chiyoung, Cha, Meeyoung
Vaccines for COVID-19 are currently under clinical trials. These vaccines are crucial for eradicating the novel coronavirus. Despite the potential, there exist conspiracies related to vaccines online, which can lead to vaccination hesitancy and, thus, a longer-standing pandemic. We used a between-subjects study design (N=572 adults in the US and UK) to understand the public willingness towards vaccination against the novel coronavirus under various circumstances. Our survey findings suggest that people are more reluctant to vaccinate their children compared to themselves. Explicitly stating the high effectiveness of the vaccine against COVID-19 led to an increase in vaccine acceptance. Interestingly, our results do not indicate any meaningful variance due to the use of artificial intelligence (AI) in developing vaccines, if these systems are described to be in use alongside human researchers. We discuss the public's expectation of local governments in assuring the safety and effectiveness of a future COVID-19 vaccine.
Smarter Chatbots and Virtual Agents for your Contact Center VoiceFoundry
Conversational AI, virtual assistance, and the use of bots are on the rise. The terminology can be confusing, so it is important to understand the differences in order to determine what is best for your customers. Understanding how customers interact with your business and their preferences for engagement is a must. Businesses are looking for ways to deliver a better conversational approach that meets their customers' needs in this day of fast-paced communication and right-now resolution. Many businesses are increasingly looking to incorporate sophisticated bot communications, which is why VoiceFoundry offers a full suite of services that leverage the power of Amazon solutions like Amazon Connect, Lex, Polly and more in order to deliver a complete experience.
Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints
Andriotis, C. P., Papakonstantinou, K. G.
Determination of inspection and maintenance policies for minimizing long-term risks and costs in deteriorating engineering environments constitutes a complex optimization problem. Major computational challenges include the (i) curse of dimensionality, due to exponential scaling of state/action set cardinalities with the number of components; (ii) curse of history, related to exponentially growing decision-trees with the number of decision-steps; (iii) presence of state uncertainties, induced by inherent environment stochasticity and variability of inspection/monitoring measurements; (iv) presence of constraints, pertaining to stochastic long-term limitations, due to resource scarcity and other infeasible/undesirable system responses. In this work, these challenges are addressed within a joint framework of constrained Partially Observable Markov Decision Processes (POMDP) and multi-agent Deep Reinforcement Learning (DRL). POMDPs optimally tackle (ii)-(iii), combining stochastic dynamic programming with Bayesian inference principles. Multi-agent DRL addresses (i), through deep function parametrizations and decentralized control assumptions. Challenge (iv) is herein handled through proper state augmentation and Lagrangian relaxation, with emphasis on life-cycle risk-based constraints and budget limitations. The underlying algorithmic steps are provided, and the proposed framework is found to outperform well-established policy baselines and facilitate adept prescription of inspection and intervention actions, in cases where decisions must be made in the most resource- and risk-aware manner.
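As a generic illustration of the Lagrangian relaxation mentioned for challenge (iv), and not the paper's exact formulation, a constrained policy optimization problem can be turned into an unconstrained saddle-point problem. The symbols are assumptions: $\pi$ is the policy, $C$ the cumulative cost, $G$ the constrained quantity (e.g., life-cycle risk or budget use), $\alpha$ its allowed level, and $\lambda \ge 0$ the multiplier.

$$\min_{\pi} \; \mathbb{E}_{\pi}[C] \quad \text{s.t.} \quad \mathbb{E}_{\pi}[G] \le \alpha \qquad \longrightarrow \qquad \max_{\lambda \ge 0} \, \min_{\pi} \; \mathbb{E}_{\pi}[C] + \lambda \left( \mathbb{E}_{\pi}[G] - \alpha \right)$$

The multiplier $\lambda$ acts as a learned price on constraint violation: it grows while the constraint is violated and shrinks toward zero when there is slack.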
Multi-agent Planning for thermalling gliders using multi level graph-search
Zaman, Muhammad Aneeq uz, Bhatti, Aamer Iqbal
This paper solves a path planning problem for a group of gliders. The gliders are tasked with visiting a set of interest points. The gliders have limited range but are able to increase their range by visiting special points called thermals. The problem addressed in this paper is to plan paths for the gliders such that the total number of interest points visited by the gliders is maximized. This is referred to as the multi-agent problem. The problem is solved by first decomposing it into several single-agent problems. In a single-agent problem, a set of interest points is allocated to a single glider. This problem is solved by planning a path which maximizes the number of visited interest points from the allocated set. This is achieved through a uniform cost graph search, as shown in our earlier work. The multi-agent problem then consists of determining the best allocation (of interest points) for each glider. Two ways of solving this problem are presented: a brute force search approach, as shown in earlier work, and a Branch&Bound type graph search. The Branch&Bound approach is the main contribution of the paper. This approach is proven to be optimal and shown to be faster than the brute force search using simulations.
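For readers unfamiliar with uniform cost graph search, the following is a minimal generic sketch. It finds a cheapest path in a weighted graph and does not model the paper's objective (maximizing visited interest points under range limits); the graph, node names, and the function `uniform_cost_search` are hypothetical.

```python
import heapq


def uniform_cost_search(graph, start, goal):
    """Return (cost, path) of a cheapest path from start to goal, or None.

    graph maps each node to a dict {neighbor: non-negative edge cost}.
    """
    frontier = [(0.0, start, [start])]  # priority queue ordered by path cost
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in explored:
            continue  # a cheaper route to this node was already expanded
        explored.add(node)
        for neighbor, edge_cost in graph.get(node, {}).items():
            if neighbor not in explored:
                heapq.heappush(
                    frontier, (cost + edge_cost, neighbor, path + [neighbor])
                )
    return None


# Hypothetical map: the detour through the thermal is cheaper than the direct leg.
graph = {
    "base": {"point_a": 4.0, "thermal": 1.0},
    "thermal": {"point_a": 2.0},
    "point_a": {},
}
result = uniform_cost_search(graph, "base", "point_a")
```

Expanding nodes in order of accumulated cost guarantees that the first time the goal is popped from the queue, its path is a cheapest one, provided all edge costs are non-negative.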