AITopics

2006.04471

Country:

Europe > United Kingdom > England > North Yorkshire > York (0.04)
Europe > United Kingdom > England > Essex (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hernandez, Daniel, Gbadomosi, Charles Takashi Toyin, Goodman, James, Walker, James Alfred

Metagame Autobalancing for Competitive Multiplayer Games

arXiv.org Artificial IntelligenceJun-8-2020

Automated game balancing has often focused on single-agent scenarios. In this paper we present a tool for balancing multi-player games during game design. Our approach requires a designer to construct an intuitive graphical representation of their meta-game target, representing the relative scores that high-level strategies (or decks, or character types) should experience. This permits more sophisticated balance targets to be defined beyond a simple requirement of equal win chances. We then find a parameterization of the game that meets this target using simulation-based optimization to minimize the distance to the target graph. We show the capabilities of this tool on examples inheriting from Rock-Paper-Scissors, and on a more complex asymmetric fighting game.

artificial intelligence, machine learning, optimization problem, (19 more...)

2006.04419

Country:

Europe > United Kingdom > England > North Yorkshire > York (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

arXiv.org Artificial IntelligenceJun-8-2020

Integer Programming for Multi-Robot Planning: A Column Generation Approach

Haghani, Naveed, Li, Jiaoyang, Koenig, Sven, Kunapuli, Gautam, Contardo, Claudio, Yarkony, Julian

In this paper, we tackle multi-robot planning (MRP), which aims to route a fleet of robots in a warehouse so as to achieve the maximum reward in a limited amount of time, while not having the robots collide and obeying the constraints of individual robots. In MRP, individual robots may make multiple trips over a given time window and may carry multiple items on each trip. We optimize the efficiency of the warehouse, not the makespan, since we expect new orders to be continuously added. Our contributions are that (1) we adapt the integer linear programming (ILP) formulation and column generation (CG) approach for (prize collecting) vehicle routing (Desrochers et al. 1992, Stenger et al. 2013) to MRP and (2) adapt the seminal work of (Boland et al. 2017) to permit efficient optimization by avoiding consideration of every time increment. Routing problems for a fleet of robots in a warehouse are often treated as Multi-Agent Pathfinding problems (MAPF) (Stern et al. 2019).

artificial intelligence, optimization problem, robot, (17 more...)

2006.04856

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.64)

Industry: Transportation > Freight & Logistics Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Sultana, Nazneen N, Meisheri, Hardik, Baniwal, Vinita, Nath, Somjit, Ravindran, Balaraman, Khadilkar, Harshad

Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

This paper describes the application of reinforcement learning (RL) to multi-product inventory management in supply chains. The problem description and solution are both adapted from a real-world business solution. The novelty of this problem with respect to supply chain literature is (i) we consider concurrent inventory management of a large number (50 to 1000) of products with shared capacity, (ii) we consider a multi-node supply chain consisting of a warehouse which supplies three stores, (iii) the warehouse, stores, and transportation from warehouse to stores have finite capacities, (iv) warehouse and store replenishment happen at different time scales and with realistic time lags, and (v) demand for products at the stores is stochastic. We describe a novel formulation in a multi-agent (hierarchical) reinforcement learning framework that can be used for parallelised decision-making, and use the advantage actor critic (A2C) algorithm with quantised action spaces to solve the problem. Experiments show that the proposed approach is able to handle a multi-objective reward comprised of maximising product sales and minimising wastage of perishable products.

ground transportation, survey article, warehouse, (21 more...)

2006.04037

Country:

Asia > India (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Retail (1.00)
Energy > Oil & Gas (0.46)
Transportation > Infrastructure & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning Behaviors with Uncertain Human Feedback

He, Xu, Chen, Haipeng, An, Bo

Human feedback is widely used to train agents in many domains. However, previous works rarely consider the uncertainty when humans provide feedback, especially in cases that the optimal actions are not obvious to the trainers. For example, the reward of a sub-optimal action can be stochastic and sometimes exceeds that of the optimal action, which is common in games or real-world. Trainers are likely to provide positive feedback to sub-optimal actions, negative feedback to the optimal actions and even do not provide feedback in some confusing situations. Existing works, which utilize the Expectation Maximization (EM) algorithm and treat the feedback model as hidden parameters, do not consider uncertainties in the learning environment and human feedback. To address this challenge, we introduce a novel feedback model that considers the uncertainty of human feedback. However, this incurs intractable calculus in the EM algorithm. To this end, we propose a novel approximate EM algorithm, in which we approximate the expectation step with the Gradient Descent method. Experimental results in both synthetic scenarios and two real-world scenarios with human participants demonstrate the superior performance of our proposed approach.

artificial intelligence, bayesian inference, machine learning, (20 more...)

2006.04201

Country:

Asia > Singapore (0.04)
North America > United States > Texas (0.04)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Kang, Yipeng, Wang, Tonghan, de Melo, Gerard

Incorporating Pragmatic Reasoning Communication into Emergent Language

Emergentism and pragmatics are two research fields that study the dynamics of linguistic communication along substantially different timescales and intelligence levels. From the perspective of multi-agent reinforcement learning, they correspond to stochastic games with reinforcement training and stage games with opponent awareness. Given that their combination has been explored in linguistics, we propose computational models that combine short-term mutual reasoning-based pragmatics with long-term language emergentism. We explore this for agent communication referential games as well as in Starcraft II, assessing the relative merits of different kinds of mutual reasoning pragmatics models both empirically and theoretically. Our results shed light on their importance for making inroads towards getting more natural, accurate, robust, fine-grained, and succinct utterances.

artificial intelligence, machine learning, natural language, (15 more...)

2006.04109

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Angelov, Daniel, Hristov, Yordan, Ramamoorthy, Subramanian

From Demonstrations to Task-Space Specifications: Using Causal Analysis to Extract Rule Parameterization from Demonstrations

Learning models of user behaviour is an important problem that is broadly applicable across many application domains requiring human-robot interaction. In this work, we show that it is possible to learn generative models for distinct user behavioural types, extracted from human demonstrations, by enforcing clustering of preferred task solutions within the latent space. We use these models to differentiate between user types and to find cases with overlapping solutions. Moreover, we can alter an initially guessed solution to satisfy the preferences that constitute a particular user type by backpropagating through the learned differentiable models. An advantage of structuring generative models in this way is that we can extract causal relationships between symbols that might form part of the user's specification of the task, as manifested in the demonstrations. We further parameterize these specifications through constraint optimization in order to find a safety envelope under which motion planning can be performed. We show that the proposed method is capable of correctly distinguishing between three user types, who differ in degrees of cautiousness in their motion, while performing the task of moving objects with a kinesthetically driven robot in a tabletop environment. Our method successfully identifies the correct type, within the specified time, in 99% [97.8 - 99.8] of the cases, which outperforms an IRL baseline. We also show that our proposed method correctly changes a default trajectory to one satisfying a particular user specification even with unseen objects. The resulting trajectory is shown to be directly implementable on a PR2 humanoid robot completing the same task.

machine learning, reinforcement learning, trajectory, (18 more...)

2006.113

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

Iqbal, Shariq, de Witt, Christian A. Schroeder, Peng, Bei, Böhmer, Wendelin, Whiteson, Shimon, Sha, Fei

AI-QMIX: Attention and Imagination for Dynamic Multi-Agent Reinforcement Learning

Real world multi-agent tasks often involve varying types and quantities of agents and non-agent entities. Agents frequently do not know a priori how many other agents and non-agent entities they will need to interact with in order to complete a given task, requiring agents to generalize across a combinatorial number of task configurations with each potentially requiring different strategies. In this work, we tackle the problem of multi-agent reinforcement learning (MARL) in such dynamic scenarios. We hypothesize that, while the optimal behaviors in these scenarios with varying quantities and types of agents/entities are diverse, they may share common patterns within sub-teams of agents that are combined to form team behavior. As such, we propose a method that can learn these subgroup relationships and how they can be combined, ultimately improving knowledge sharing and generalization across scenarios. This method, Attentive-Imaginative QMIX, extends QMIX for dynamic MARL in two ways: 1) an attention mechanism that enables model sharing across variable sized scenarios and 2) a training objective that improves learning across scenarios with varying combinations of agent/entity types by factoring the value function into imagined sub-scenarios. We validate our approach on both a novel grid-world task as well as a version of the StarCraft Multi-Agent Challenge [28] minimally modified for the dynamic scenario setting.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2006.04222

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Soccer (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningJun-7-2020

A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning

Zeng, Sihan, Anwar, Aqeel, Doan, Thinh, Romberg, Justin, Raychowdhury, Arijit

We develop a mathematical framework for solving multi-task reinforcement learning problems based on a type of decentralized policy gradient method. The goal in multi-task reinforcement learning is to learn a common policy that operates effectively in different environments; these environments have similar (or overlapping) state and action spaces, but have different rewards and dynamics. Agents immersed in each of these environments communicate with other agents by sharing their models (i.e. their policy parameterizations) but not their state/reward paths. Our analysis provides a convergence rate for a consensus-based distributed, entropy-regularized policy gradient method for finding such a policy. We demonstrate the effectiveness of the proposed method using a series of numerical experiments. These experiments range from small-scale "Grid World" problems that readily demonstrate the trade-offs involved in multi-task learning to large-scale problems, where common policies are learned to play multiple Atari games or to navigate an airborne drone in multiple (simulated) environments.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

2006.04338

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningJun-6-2020

Learning to Model Opponent Learning

Davies, Ian, Tian, Zheng, Wang, Jun

Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment. The adaptation and learning of other agents induces non-stationarity in the environment dynamics. This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment. Policy search algorithms also struggle in multi-agent settings as the partial observability resulting from an opponent's actions not being known introduces high variance to policy training. Modelling an agent's opponent(s) is often pursued as a means of resolving the issues arising from the coexistence of learning opponents. An opponent model provides an agent with some ability to reason about other agents to aid its own decision making. Most prior works learn an opponent model by assuming the opponent is employing a stationary policy or switching between a set of stationary policies. Such an approach can reduce the variance of training signals for policy search algorithms. However, in the multi-agent setting, agents have an incentive to continually adapt and learn. This means that the assumptions concerning opponent stationarity are unrealistic. In this work, we develop a novel approach to modelling an opponent's learning dynamics which we term Learning to Model Opponent Learning (LeMOL). We show our structured opponent model is more accurate and stable than naive behaviour cloning baselines. We further show that opponent modelling can improve the performance of algorithmic agents in multi-agent settings.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2006.03923

Country: Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.80)