Agents
Multi-Agent Informational Learning Processes
Terry, Justin K, Grammel, Nathaniel
We introduce a new mathematical model of multi-agent reinforcement learning, the Multi-Agent Informational Learning Processor ("MAILP") model. The model is based on the notion that agents have policies for a certain amount of information, and it models how this information iteratively evolves and propagates through many agents. The model is very general; the only meaningful assumption it makes is that learning for individual agents progressively slows over time.
Simplifying Reinforced Feature Selection via Restructured Choice Strategy of Single Agent
Zhao, Xiaosa, Liu, Kunpeng, Fan, Wei, Jiang, Lu, Zhao, Xiaowei, Yin, Minghao, Fu, Yanjie
Feature selection aims to select a subset of features that optimizes the performance of downstream predictive tasks. Recently, multi-agent reinforced feature selection (MARFS) has been introduced to automate feature selection by creating one agent per feature to select or deselect that feature. Although MARFS automates the selection process, it suffers not only from data complexity in terms of content and dimensionality, but also from computational costs that increase exponentially with the number of agents. This concern raises a new research question: can we simplify the selection process of agents in a reinforcement learning context so as to improve the efficiency and reduce the costs of feature selection? To address this question, we develop a single-agent reinforced feature selection approach integrated with a restructured choice strategy. Specifically, the restructured choice strategy includes the following: 1) we use a single agent to handle the selection task for multiple features, instead of using multiple agents; 2) we develop a scanning method that empowers the single agent to make multiple selection/deselection decisions in each round of scanning; 3) we use the relevance of features to the predictive labels to prioritize the order in which the agent scans features; 4) we propose a convolutional auto-encoder algorithm, integrated with the encoded index information of features, to improve state representation; 5) we design a reward scheme that takes into account both prediction accuracy and feature redundancy to facilitate the exploration process. Finally, we present extensive experimental results to demonstrate the efficiency and effectiveness of the proposed method.
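The scanning idea and the accuracy-plus-redundancy reward can be illustrated with a small sketch. It is not the authors' implementation: relevance is assumed to be the absolute Pearson correlation with the label, redundancy is assumed to be the mean absolute pairwise correlation among selected features, the weight `lam` is hypothetical, and the learned RL policy is replaced by a hypothetical hand-written rule so that one scanning round can run end to end.

```python
import numpy as np

def relevance_order(X, y):
    """Feature indices sorted by |Pearson correlation with the label|, most relevant first."""
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return [int(j) for j in np.argsort(scores)[::-1]]

def redundancy(X, selected):
    """Mean absolute pairwise correlation among the selected features (0.0 if fewer than two)."""
    k = len(selected)
    if k < 2:
        return 0.0
    c = np.abs(np.corrcoef(X[:, selected], rowvar=False))
    return float(c[~np.eye(k, dtype=bool)].mean())

def reward(accuracy, X, selected, lam=0.5):
    """Hypothetical reward: downstream accuracy minus a weighted redundancy penalty."""
    return accuracy - lam * redundancy(X, selected)

def scan(X, y, policy):
    """One scanning round: a single agent visits features in relevance order
    and makes a select/deselect decision for each of them."""
    selected = []
    for j in relevance_order(X, y):
        if policy(j, selected):
            selected.append(j)
    return selected

# Toy data: feature 1 equals the label, feature 2 is nearly a copy of it.
y = np.array([1, 2, 3, 4, 5], dtype=float)
X = np.column_stack([[5, 3, 1, 4, 2], [1, 2, 3, 4, 5], [2, 4, 6, 8, 10.5]]).astype(float)

# Hypothetical stand-in for a learned policy: skip features that would add redundancy.
keep_if_not_redundant = lambda j, sel: redundancy(X, sel + [j]) < 0.9
print(scan(X, y, keep_if_not_redundant))  # [1, 0]
```

Because the scan visits the most label-relevant feature first, the near-duplicate feature 2 is rejected once feature 1 is in the subset, which is the kind of trade-off the redundancy term in the reward is meant to encourage.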
TotalBotWar: A New Pseudo Real-time Multi-action Game Challenge and Competition for AI
Estaben, Alejandro, Díaz, César, Montoliu, Raul, Pérez-Liebana, Diego
This paper presents TotalBotWar, a new pseudo real-time multi-action challenge for game AI, as well as some initial experiments that benchmark the framework with different agents. The game is based on the real-time battles of the popular TotalWar game series, where players manage an army to defeat the opponent's. In the proposed game, a turn consists of a set of orders to control the units. The number and type of orders that can be performed in a turn vary as the game progresses. One interesting feature of the game is that if a particular unit does not receive an order in a turn, it continues performing the action specified in a previous turn. The turn-wise branching factor becomes overwhelming for traditional algorithms, and this, together with the partial observability of the game state, makes the proposed game an interesting platform to test modern AI algorithms.
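Two mechanics in the description above, order persistence and the combinatorial size of a turn, can be made concrete with a short sketch. It is not the TotalBotWar API: the unit names, the per-unit order counts and the `max_orders` cap are hypothetical, and it assumes a turn is any choice of which units receive an order plus one order for each chosen unit.

```python
from itertools import combinations
from math import prod

def apply_turn(current_actions, orders):
    """Units that receive no order this turn keep their previous action."""
    updated = dict(current_actions)
    updated.update(orders)
    return updated

def turn_branching_factor(orders_per_unit, max_orders=None):
    """Count distinct joint turns: choose which units get an order
    (at most max_orders of them) and one order for each chosen unit."""
    n = len(orders_per_unit)
    k = n if max_orders is None else min(max_orders, n)
    total = 0
    for size in range(k + 1):
        for units in combinations(range(n), size):
            total += prod(orders_per_unit[u] for u in units)  # empty product is 1
    return total

state = {"infantry": "hold", "cavalry": "charge"}
print(apply_turn(state, {"infantry": "advance"}))
# {'infantry': 'advance', 'cavalry': 'charge'}
print(turn_branching_factor([4] * 10))                # 9765625, i.e. 5**10
print(turn_branching_factor([4] * 10, max_orders=1))  # 41
```

Even with only ten units and four orders each, the uncapped count is already 5^10 joint turns, which illustrates why the turn-wise branching factor overwhelms traditional search.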
AI and Wargaming
Goodman, James, Risi, Sebastian, Lucas, Simon
Recent progress in Game AI has demonstrated that given enough data from human gameplay, or experience gained via simulations, machines can rival or surpass the most skilled human players in classic games such as Go, or commercial computer games such as Starcraft. We review the current state-of-the-art through the lens of wargaming, and ask firstly what features of wargames distinguish them from the usual AI testbeds, and secondly which recent AI advances are best suited to address these wargame-specific features.
Learnable Strategies for Bilateral Agent Negotiation over Multiple Issues
Bagga, Pallavi, Paoletti, Nicola, Stathis, Kostas
We present a novel bilateral negotiation model that allows a self-interested agent to learn how to negotiate over multiple issues in the presence of user preference uncertainty. The model relies upon interpretable strategy templates representing the tactics the agent should employ during the negotiation, and it learns template parameters to maximize the average utility received over multiple negotiations, thus resulting in optimal bid acceptance and generation. Our model also uses deep reinforcement learning to evaluate threshold utility values for those tactics that require them, thereby deriving optimal utilities for every environment state. To handle user preference uncertainty, the model relies on a stochastic search to find the user model that best agrees with a given partial preference profile. Multi-objective optimization and multi-criteria decision-making methods are applied at negotiation time to generate Pareto-optimal outcomes, thereby increasing the number of successful (win-win) negotiations. Rigorous experimental evaluations show that the agent employing our model outperforms the winning agents of the 10th Automated Negotiating Agents Competition (ANAC'19) in terms of both individual and social-welfare utilities.
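The Pareto-optimality step can be illustrated with a minimal sketch, assuming each candidate bid is scored by a vector of utilities (here own utility and an estimated opponent utility). The bid values are hypothetical, and maximizing the product of utilities (the Nash product) is only one possible multi-criteria rule, not necessarily the one used in the paper.

```python
from math import prod

def dominates(a, b):
    """a Pareto-dominates b: no worse on every utility, strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(bids):
    """Keep the bids whose utility vectors are not dominated by any other bid."""
    return [b for b in bids if not any(dominates(o, b) for o in bids)]

def nash_choice(bids):
    """One possible multi-criteria rule: the Pareto bid maximizing the product of utilities."""
    return max(pareto_front(bids), key=prod)

# (own utility, estimated opponent utility) for some hypothetical candidate bids
bids = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9), (0.6, 0.4)]
print(pareto_front(bids))  # [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9)]
print(nash_choice(bids))   # (0.6, 0.6)
```

Filtering to the Pareto front first guarantees that no offered bid could be improved for one side without hurting the other, which is the property behind the "win-win" outcomes mentioned above.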
Multiagent trajectory models via game theory and implicit layer-based learning
Geiger, Philipp, Straehle, Christoph-Nikolas
For prediction of interacting agents' trajectories, we propose an end-to-end trainable architecture that hybridizes neural nets with game-theoretic reasoning, has interpretable intermediate representations, and transfers to robust downstream decision making. It combines (1) a differentiable implicit layer that maps preferences to local Nash equilibria with (2) a learned equilibrium refinement concept and (3) a learned preference revelation net, given initial trajectories as input. This is accompanied by a new class of continuous potential games. We provide theoretical results for explicit gradients and soundness, and several measures to ensure tractability. In experiments, we evaluate our approach on two real-world data sets, where we predict highway driver merging trajectories, and on a simple decision-making transfer task.
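The forward direction of the equilibrium layer, from preferences to a local Nash equilibrium, can be sketched for a toy continuous potential game. This is an assumed example, not the paper's game class: two agents choose scalar actions, agent i has cost J_i(a) = (a_i - p_i)^2 + k(a_0 - a_1 - gap)^2 with preference p_i, and the game is an exact potential game with potential phi(a) = sum_i (a_i - p_i)^2 + k(a_0 - a_1 - gap)^2, because J_i and phi differ only by a term that does not depend on a_i. A local minimizer of phi is therefore a local Nash equilibrium. The paper's implicit layer additionally differentiates through this equilibrium, which the sketch omits.

```python
import numpy as np

def grad_potential(a, prefs, gap, k):
    """Gradient of phi(a) = sum_i (a_i - p_i)^2 + k * (a_0 - a_1 - gap)^2."""
    g = 2.0 * (a - prefs)
    coupling = 2.0 * k * (a[0] - a[1] - gap)
    g[0] += coupling
    g[1] -= coupling
    return g

def local_nash(prefs, gap=1.0, k=1.0, lr=0.1, steps=500):
    """Gradient descent on the potential; its local minimizer is a local Nash equilibrium."""
    prefs = np.asarray(prefs, dtype=float)
    a = prefs.copy()  # start from each agent's preferred action
    for _ in range(steps):
        a -= lr * grad_potential(a, prefs, gap, k)
    return a

print(local_nash([0.0, 0.0]))  # approximately [1/3, -1/3]
```

With both preferences at 0 and a desired gap of 1, neither agent can reduce its own cost by unilaterally moving away from (1/3, -1/3), which can be checked by setting the derivative of J_i with respect to a_i to zero.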
Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward
We considered a novel, practical problem of online learning with episodically revealed rewards, motivated by several real-world applications in which contexts are nonstationary across episodes and reward feedback is not always available to the decision-making agent. For this online semi-supervised learning setting, we introduced Background Episodic Reward LinUCB (BerlinUCB), a solution that easily incorporates clustering as a self-supervision module to provide useful side information when rewards are not observed. Our experiments on a variety of datasets, in both stationary and nonstationary environments across six different scenarios, demonstrated clear advantages of the proposed approach over the standard contextual bandit. Lastly, we introduced a relevant real-life example where this problem setting is especially useful.
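The general pattern of a contextual bandit that falls back on clustering when the reward is hidden can be sketched on top of standard disjoint LinUCB. This is not the BerlinUCB algorithm as published: the nearest-centroid pseudo-reward, the `table` of per-cluster reward estimates and the helper names are hypothetical and only show where a self-supervision module could plug in.

```python
import numpy as np

class LinUCB:
    """Standard disjoint LinUCB: one ridge-regression model per arm."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            theta = np.linalg.solve(A, b)                              # ridge estimate
            bonus = self.alpha * np.sqrt(x @ np.linalg.solve(A, x))    # confidence width
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, r):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += r * x

def cluster_pseudo_reward(centroids, table):
    """Hypothetical self-supervision: table[c, arm] is a reward estimate for cluster c;
    an unlabeled context borrows the entry of its nearest centroid."""
    def pseudo(x, arm):
        c = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
        return float(table[c, arm])
    return pseudo

def step(bandit, x, observe, pseudo):
    """Pull an arm; use the pseudo-reward when the episode does not reveal the reward."""
    arm = bandit.select(x)
    r = observe(arm)
    if r is None:
        r = pseudo(x, arm)
    bandit.update(arm, x, r)
    return arm, r

centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
table = np.array([[0.0, 1.0], [1.0, 0.0]])  # cluster 0 favors arm 1
pseudo = cluster_pseudo_reward(centroids, table)
bandit = LinUCB(n_arms=2, dim=2)
x = np.array([0.9, 0.1])
for _ in range(20):
    step(bandit, x, lambda arm: None, pseudo)  # rewards are never revealed here
print(bandit.select(x))  # 1
```

Even though no true reward is observed, the cluster-derived side information steers the bandit toward arm 1 for contexts near the first centroid.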
Global collaboration for a better future and a cleaner planet
We live in a challenging world, particularly since the start of the Covid-19 pandemic. Our ways of living, communicating, interacting, purchasing, and working have changed. With every challenge comes great opportunity, so I remain extremely optimistic about the outcomes of this crisis. During confinement, we got to know our neighbors better and offered assistance. We saw some great collaboration amongst colleagues.
Persistent And Scalable JADE: A Cloud based InMemory Multi-agent Framework
Khalid, Nauman, Tahir, Ghalib Ahmed, Bloodsworth, Peter
Multi-agent systems are often limited in terms of persistence. This issue is more prevalent for applications in which agent states change frequently, which makes existing methods less usable because they increase agents' complexity and are less scalable. This research study presents a novel in-memory agent persistence framework. Two prototypes have been implemented, one using the proposed solution and the other using an established agent persistency environment … similar level of persistency. These findings will help future real-time multi-agent systems become scalable and persistent in a dynamic cloud environment.
Several approaches are used for agent persistence, including the Java Persistence API [12], [13], [14], serialization mechanisms [13], DBMS [14] and JADE Persistence Services [15]. These persistency frameworks did improve the flexibility and stability of agent-based systems; however, they increase the complexity of agents whose state changes in real time. When an agent object is persisted in the database using composite keys, persisting and finding the object becomes more complex. Increase in artifact size, framework complexity, … There is also a chance of virtual machine instance failure, which further increases the recovery time of an agent placed on a single instance due to VM churn time. However, when the data persistence is for a longer period or the state of the object frequently updates, the … Moreover, when any data structure of the agent changes, the serialized object cannot be deserialized even when the session id is known. … many areas including aircraft maintenance, electronic book buying, network security, military logistic planning and maintaining …
Strategy Proof Mechanisms for Facility Location with Capacity Limits
An important feature of many real-world facility location problems is capacity limits on the facilities. We show here how capacity constraints make it harder to design strategy-proof mechanisms for facility location but, counter-intuitively, can improve the guarantees on how well we can approximate the optimal solution.
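As an uncapacitated baseline for the strategy-proofness notion above, the classic median mechanism for a single facility on a line is known to be strategy-proof, while placing the facility at the mean is not. The sketch below checks this by brute force on a small integer grid: an agent's cost is assumed to be its distance to the facility, and `is_manipulable` is a hypothetical helper. It does not model capacity limits; a capacitated mechanism would have to be supplied separately to the same checker.

```python
from itertools import product

def median_mechanism(reports):
    """Place one facility at the (left) median of the reported locations."""
    s = sorted(reports)
    return s[(len(s) - 1) // 2]

def mean_mechanism(reports):
    """Place the facility at the average report (for contrast; not strategy-proof)."""
    return sum(reports) / len(reports)

def is_manipulable(mechanism, true_locs, grid):
    """True if some agent gets strictly closer to the facility by misreporting."""
    honest = mechanism(true_locs)
    for i, loc in enumerate(true_locs):
        for lie in grid:
            reports = list(true_locs)
            reports[i] = lie
            if abs(mechanism(reports) - loc) < abs(honest - loc):
                return True
    return False

grid = range(5)
print(any(is_manipulable(median_mechanism, list(p), grid) for p in product(grid, repeat=3)))  # False
print(is_manipulable(mean_mechanism, [0, 3, 7], range(11)))  # True
```

Under the mean mechanism, the agent at 7 can report 10 and pull the facility from about 3.33 to about 4.33, closer to its true location; no such profitable lie exists for the median on this grid.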