 Agents


Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

arXiv.org Artificial Intelligence

Measuring and promoting policy diversity is critical for solving games with strong non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). With that in mind, maintaining a pool of diverse policies via open-ended learning is an attractive solution, which can generate auto-curricula to avoid being exploited. However, in conventional open-ended learning algorithms, there are no widely accepted definitions for diversity, making it hard to construct and evaluate the diverse policies. In this work, we summarize previous concepts of diversity and work towards offering a unified measure of diversity in multi-agent open-ended learning to include all elements in Markov games, based on both Behavioral Diversity (BD) and Response Diversity (RD). At the trajectory distribution level, we re-define BD in the state-action space as the discrepancies of occupancy measures. For the reward dynamics, we propose RD to characterize diversity through the responses of policies when encountering different opponents. We also show that many current diversity measures fall in one of the categories of BD or RD but not both. With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning. We validate our methods in both relatively simple games, such as matrix games and the non-transitive mixture model, and the complex Google Research Football environment. The population found by our methods shows the lowest exploitability and highest population effectivity in the matrix game and the non-transitive mixture model, as well as the largest goal difference when interacting with opponents of various levels in Google Research Football.
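
To make the terms "non-transitive" and "exploitability" in the abstract above concrete, here is a minimal sketch for Rock-Paper-Scissors. It uses one common single-strategy notion of exploitability (the best payoff an opponent can obtain against a fixed mixed strategy), which is an assumption for illustration and not necessarily the population-level definition used in the paper. The names `RPS` and `exploitability` are hypothetical.

```python
# Row player's payoff in Rock-Paper-Scissors (rows and columns ordered R, P, S).
# Entry [a][b] is the payoff for playing action a against action b.
RPS = [
    [0, -1, 1],   # Rock: ties Rock, loses to Paper, beats Scissors
    [1, 0, -1],   # Paper: beats Rock, ties Paper, loses to Scissors
    [-1, 1, 0],   # Scissors: loses to Rock, beats Paper, ties Scissors
]


def exploitability(strategy, payoff=RPS):
    """Best expected payoff an opponent can obtain against `strategy`.

    `strategy` is a probability vector over (R, P, S). The game is
    symmetric and zero-sum, so the opponent's payoff for a pure action a
    is the expectation of payoff[a][b] over our mixed action b.
    """
    return max(
        sum(payoff[a][b] * p for b, p in enumerate(strategy))
        for a in range(len(payoff))
    )


if __name__ == "__main__":
    print(exploitability([1.0, 0.0, 0.0]))    # always Rock: fully exploitable
    print(exploitability([0.5, 0.5, 0.0]))    # never Scissors: still exploitable
    print(exploitability([1 / 3, 1 / 3, 1 / 3]))  # uniform mix
```

No pure strategy is a consistent winner here: each one is beaten by exactly one other, which is the strategic cycle the abstract refers to. Only the uniform mixture leaves the opponent with no positive best-response payoff under this definition.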


Learning by Watching

arXiv.org Artificial Intelligence

When in a new situation or geographical location, human drivers have an extraordinary ability to watch others and learn maneuvers that they themselves may have never performed. In contrast, existing techniques for learning to drive preclude such a possibility as they assume direct access to an instrumented ego-vehicle with fully known observations and expert driver actions. However, such measurements cannot be directly accessed for the non-ego vehicles when learning by watching others. Therefore, in an application where data is regarded as a highly valuable asset, current approaches completely discard the vast portion of the training data that can be potentially obtained through indirect observation of surrounding vehicles. Motivated by this key insight, we propose the Learning by Watching (LbW) framework, which enables learning a driving policy without requiring full knowledge of either the state or expert actions. To increase its data, i.e., with new perspectives and maneuvers, LbW makes use of the demonstrations of other vehicles in a given scene by (1) transforming the ego-vehicle's observations to their points of view, and (2) inferring their expert actions. Our LbW agent learns more robust driving policies while enabling data-efficient learning, including quick adaptation of the policy to rare and novel scenarios. In particular, LbW drives robustly even with a fraction of the driving data required by existing methods, achieving an average success rate of 92% on the original CARLA benchmark with only 30 minutes of total driving data and 82% with only 10 minutes.


Informative Policy Representations in Multi-Agent Reinforcement Learning via Joint-Action Distributions

arXiv.org Artificial Intelligence

In multi-agent reinforcement learning, the inherent non-stationarity of the environment caused by other agents' actions poses significant difficulties for an agent learning a good policy independently. One way to deal with non-stationarity is agent modeling, by which the agent takes into consideration the influence of other agents' policies. Most existing work relies on predicting other agents' actions or goals, or discriminating between their policies. However, such modeling fails to capture the similarities and differences between policies simultaneously and thus cannot provide useful information when generalizing to unseen policies. To address this, we propose a general method to learn representations of other agents' policies via the joint-action distributions sampled in interactions. The similarities and differences between policies are naturally captured by the policy distance inferred from the joint-action distributions and deliberately reflected in the learned representations. Agents conditioned on the policy representations can generalize well to unseen agents. We empirically demonstrate that our method outperforms existing work in multi-agent tasks when facing unseen agents.


Fairness for Cooperative Multi-Agent Learning with Equivariant Policies

arXiv.org Artificial Intelligence

We study fairness through the lens of cooperative multi-agent learning. Our work is motivated by empirical evidence that naive maximization of team reward yields unfair outcomes for individual team members. To address fairness in multi-agent contexts, we introduce team fairness, a group-based fairness measure for multi-agent learning. We then incorporate team fairness into policy optimization -- introducing Fairness through Equivariance (Fair-E), a novel learning strategy that achieves provably fair reward distributions. We then introduce Fairness through Equivariance Regularization (Fair-ER) as a soft-constraint version of Fair-E and show that Fair-ER reaches higher levels of utility than Fair-E and fairer outcomes than policies with no equivariance. Finally, we investigate the fairness-utility trade-off in multi-agent settings.


Swarm Intelligence for Self-Organized Clustering

arXiv.org Machine Learning

Algorithms implementing populations of agents which interact with one another and sense their environment may exhibit emergent behavior such as self-organization and swarm intelligence. Here a swarm system, called Databionic swarm (DBS), is introduced which is able to adapt itself to structures of high-dimensional data characterized by distance and/or density-based structures in the data space. By exploiting the interrelations of swarm intelligence, self-organization and emergence, DBS serves as an alternative approach to the optimization of a global objective function in the task of clustering. The swarm omits the usage of a global objective function and is parameter-free because it searches for the Nash equilibrium during its annealing process. To our knowledge, DBS is the first swarm combining these approaches. Its clustering can outperform common clustering methods such as K-means, PAM, single linkage, spectral clustering, model-based clustering, and Ward, if no prior knowledge about the data is available. A central problem in clustering is the correct estimation of the number of clusters. This is addressed by a DBS visualization called the topographic map, which allows assessing the number of clusters. It is known that all clustering algorithms construct clusters, irrespective of whether the data set contains clusters or not. In contrast to most other clustering algorithms, the topographic map identifies that clustering of the data is meaningless if the data contains no (natural) clusters. The performance of DBS is demonstrated on a set of benchmark data, which are constructed to pose difficult clustering problems, and in two real-world applications.


Theoretical Modeling of Communication Dynamics

arXiv.org Machine Learning

Communication is a cornerstone of social interactions, be it with human or artificial intelligence (AI). Yet it can be harmful, depending on the honesty of the exchanged information. To study this, an agent-based sociological simulation framework is presented, the reputation game. It illustrates the impact of different communication strategies on the agents' reputation. The game focuses on the trustworthiness of the participating agents, that is, their honesty as perceived by others. In the game, each agent exchanges statements with the others about their own and each other's honesty, which lets their judgments evolve. Various sender and receiver strategies are studied, like sycophancy, egocentricity, pathological lying, and aggressiveness for senders as well as awareness and lack thereof for receivers. Minimalist malicious strategies are identified, like being manipulative, dominant, or destructive, which significantly increase reputation at others' costs. Phenomena such as echo chambers, self-deception, deception symbiosis, clique formation, and freezing of group opinions emerge from the dynamics. This indicates that the reputation game can be used to study complex group phenomena, to test behavioral hypotheses, and to analyze AI-influenced social media. With refined rules it may help to understand social interactions, and to safeguard the design of non-abusive AI systems.


Cooperative Online Learning

arXiv.org Machine Learning

In this preliminary (and unpolished) version of the paper, we study an asynchronous online learning setting with a network of agents. At each time step, some of the agents are activated, requested to make a prediction, and pay the corresponding loss. Some feedback is then revealed to these agents and is later propagated through the network. We consider the cases of full, bandit, and semi-bandit feedback. In particular, we construct a reduction to delayed single-agent learning that applies to both the full and the bandit feedback case and allows us to obtain regret guarantees for both settings. We complement these results with a near-matching lower bound.


A 2020 taxonomy of algorithms inspired on living beings behavior

arXiv.org Artificial Intelligence

Since the emergence of ideas about simulating life in recent decades, several algorithms inspired by natural phenomena have been proposed to solve complex problems, e.g., evolutionary computation and artificial life. A naturalist or biologist studies all living forms in a new ecosystem and tries to classify the discoveries into a taxonomy of living beings. Here that role is taken by a computer naturalist who compiles algorithms inspired by the behavior of living beings. There are many bio-inspired algorithms; this work focuses on the actions of living beings, such as the growth of plants, the reproduction of mushrooms, the life of bacteria, and the individual behavior of animals, and it highlights the interactions between individuals in groups of animals, such as schools of fish, flocks of birds, herds of mammals, and swarms of insects. The focus is on algorithms inspired by the actions of living beings belonging to any kingdom of nature, and the aim is to locate as many such algorithms as possible. Only basic algorithms that take inspiration from a living being are considered; derivations, variants, and hybrids are omitted. Bio-inspired algorithms related to a specific species are located by reviewing several survey papers on nature-inspired, swarm intelligence, and metaheuristic algorithms, although these surveys take different points of view. Only survey papers from the last ten years were considered, because more complete reviews are expected since then. The surveys in many cases span all kinds of algorithms; however, many of the algorithms have been proposed recently, perhaps because the year 2020 is iconic.


Defining definition: a Text mining Approach to Define Innovative Technological Fields

arXiv.org Machine Learning

One of the first tasks of an innovative project is delineating the scope of the project itself or of the product/service to be developed. A wrong scope definition can cause (in the worst case) project failure. A good scope definition becomes even more relevant in technology-intensive innovation projects, which are nowadays characterized by a highly dynamic, multidisciplinary, turbulent and uncertain environment. In these cases, the boundaries of the project are not easily detectable and it is difficult to decide what is in-scope and out-of-scope. The present work proposes a tool for the scope delineation process that automatically defines an innovative technological field or a new technology. The tool is based on a Text Mining algorithm that exploits Elsevier's Scopus abstracts in order to extract the relevant data to define a technological scope. The automatic definition tool is then applied on four case studies: Artificial Intelligence and Data Science. The results show how the tool can provide much crucial information in the definition process of a technological field. In particular, for the target technological field (or technology), it provides the definition and other elements related to the target.


North Carolina COVID-19 Agent-Based Model Framework for Hospitalization Forecasting Overview, Design Concepts, and Details Protocol

arXiv.org Artificial Intelligence

This Overview, Design Concepts, and Details Protocol (ODD) provides a detailed description of an agent-based model (ABM) that was developed to simulate hospitalizations during the COVID-19 pandemic. Using the descriptions of submodels, provided parameters, and the links to data sources, modelers will be able to replicate the creation and results of this model.