AITopics

Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We propose to learn an action distribution for each objective, and we use supervised learning to fit a parametric policy to a combination of these distributions. We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
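To make the notion of "nondominated solutions" concrete, here is a minimal sketch. It is not the paper's algorithm: it only filters a set of candidate policies, each scored on several objectives (higher is better), down to those that no other candidate dominates. The function names and the sample scores are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]


# Hypothetical (task reward, negative energy cost) scores for four policies.
scores = [(1.0, -5.0), (0.8, -2.0), (0.7, -3.0), (0.2, -1.0)]
front = nondominated(scores)
```

In this sample, the third policy is dropped because the second is better on both objectives; the remaining three each represent a different trade-off.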

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2005.07513

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Hertweck, Tim, Riedmiller, Martin, Bloesch, Michael, Springenberg, Jost Tobias, Siegel, Noah, Wulfmeier, Markus, Hafner, Roland, Heess, Nicolas

Simple Sensor Intentions for Exploration

Modern reinforcement learning algorithms can learn solutions to increasingly difficult control problems while at the same time reducing the amount of prior knowledge needed for their application. One of the remaining challenges is the definition of reward schemes that appropriately facilitate exploration without biasing the solution in undesirable ways, and that can be implemented on real robotic systems without expensive instrumentation. In this paper we focus on a setting in which goal tasks are defined via simple sparse rewards, and exploration is facilitated via agent-internal auxiliary tasks. We introduce the idea of simple sensor intentions (SSIs) as a generic way to define auxiliary tasks. SSIs reduce the amount of prior knowledge that is required to define suitable rewards. They can further be computed directly from raw sensor streams and thus do not require expensive and possibly brittle state estimation on real systems. We demonstrate that a learning system based on these rewards can solve complex robotic tasks in simulation and in real-world settings. In particular, we show that a real robotic arm can learn to grasp and lift and solve a Ball-in-a-Cup task from scratch, when only raw sensor streams are used for both controller input and in the auxiliary reward definition.

experiment, machine learning, reinforcement learning, (18 more...)

2005.07541

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Lifelong Multi-Agent Path Finding in Large-Scale Warehouses

Li, Jiaoyang, Tinka, Andrew, Kiesel, Scott, Durham, Joseph W., Kumar, T. K. Satish, Koenig, Sven

Multi-Agent Path Finding (MAPF) is the problem of moving a team of agents to their goal locations without collisions. In this paper, we study the lifelong variant of MAPF where agents are constantly engaged with new goal locations, such as in large-scale warehouses. We propose a new framework for solving lifelong MAPF by decomposing the problem into a sequence of Windowed MAPF instances, where a Windowed MAPF solver resolves collisions among the paths of the agents only within a finite time horizon and ignores collisions beyond it. Our framework is particularly well suited to generating pliable plans that adapt to continually arriving new goal locations. Theoretically, we analyze the advantages and disadvantages of our framework. Empirically, we evaluate our framework with a variety of MAPF solvers and show that it can produce high-quality solutions for up to 1,000 agents, significantly outperforming existing methods.
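The windowing idea can be illustrated with a small sketch. This is not the paper's solver: it only shows how a conflict check can be restricted to a finite time horizon. It assumes grid cells as locations, non-empty paths, vertex conflicts only, and agents that wait at their last cell; all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def location_at(path: List[Cell], t: int) -> Cell:
    """Agents are assumed to wait at their last cell once the path ends."""
    return path[min(t, len(path) - 1)]


def first_conflict_in_window(
    paths: Dict[str, List[Cell]], window: int
) -> Optional[Tuple[int, str, str, Cell]]:
    """Return (timestep, agent_a, agent_b, cell) for the earliest vertex conflict
    at a timestep < window, or None. Conflicts at or beyond the window are ignored."""
    agents = sorted(paths)
    for t in range(window):
        seen: Dict[Cell, str] = {}
        for agent in agents:
            cell = location_at(paths[agent], t)
            if cell in seen:
                return (t, seen[cell], agent, cell)
            seen[cell] = agent
    return None


# Two hypothetical agents whose paths meet in cell (0, 3) at timestep 3.
paths = {
    "a1": [(0, 0), (0, 1), (0, 2), (0, 3)],
    "a2": [(1, 3), (1, 2), (1, 1), (0, 3)],
}
short_horizon = first_conflict_in_window(paths, window=3)  # conflict lies outside the window
long_horizon = first_conflict_in_window(paths, window=4)   # conflict lies inside the window
```

With a window of 3 the collision at timestep 3 is ignored and left for a later replanning round; with a window of 4 it is reported and would have to be resolved now.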

agent, artificial intelligence, goal location, (15 more...)

2005.07371

Country: North America > United States > California (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning

Cha, Han, Park, Jihong, Kim, Hyesung, Bennis, Mehdi, Kim, Seong-Lyun

Traditional distributed deep reinforcement learning (RL) commonly relies on exchanging the experience replay memory (RM) of each agent. Since the RM contains all state observations and action policy history, it may incur huge communication overhead while violating the privacy of each agent. Alternatively, this article presents a communication-efficient and privacy-preserving distributed RL framework, coined federated reinforcement distillation (FRD). In FRD, each agent exchanges its proxy experience replay memory (ProxRM), in which policies are locally averaged with respect to proxy states clustering actual states. To provide FRD design insights, we present ablation studies on the impact of ProxRM structures, neural network architectures, and communication intervals. Furthermore, we propose an improved version of FRD, coined mixup augmented FRD (MixFRD), in which ProxRM is interpolated using the mixup data augmentation algorithm. Simulations in a Cartpole environment validate the effectiveness of MixFRD in reducing the variance of mission completion time and communication cost, compared to the benchmark schemes, vanilla FRD, federated reinforcement learning (FRL), and policy distillation (PD).
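As a rough illustration of a proxy experience replay memory, the sketch below maps each observed state to its nearest proxy state and averages the action distributions recorded for the states in each cluster. It is an assumption-laden reading of the abstract, not the authors' implementation: the nearest-neighbour rule, the data layout, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def nearest_proxy(state: Sequence[float], proxies: List[Sequence[float]]) -> int:
    """Index of the proxy state closest to `state` in squared Euclidean distance."""
    dists = [sum((s - p) ** 2 for s, p in zip(state, proxy)) for proxy in proxies]
    return dists.index(min(dists))


def build_proxy_memory(
    experiences: List[Tuple[Sequence[float], List[float]]],
    proxies: List[Sequence[float]],
) -> Dict[int, List[float]]:
    """Average the action distributions of all states mapped to each proxy state."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for state, policy in experiences:
        k = nearest_proxy(state, proxies)
        acc = sums.setdefault(k, [0.0] * len(policy))
        for i, prob in enumerate(policy):
            acc[i] += prob
        counts[k] = counts.get(k, 0) + 1
    return {k: [v / counts[k] for v in acc] for k, acc in sums.items()}


# Hypothetical 1-D states with two-action policies, and two proxy states.
proxies = [(-1.0,), (1.0,)]
experiences = [
    ((-0.9,), [0.8, 0.2]),
    ((-1.2,), [0.6, 0.4]),
    ((0.7,), [0.1, 0.9]),
]
memory = build_proxy_memory(experiences, proxies)
```

Only `memory`, which holds one averaged policy per proxy state, would be exchanged, rather than the raw states and action history.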

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2005.06105

Country:

Europe > Finland > Northern Ostrobothnia > Oulu (0.06)
Asia > South Korea > Seoul > Seoul (0.06)
Europe > Sweden > Stockholm > Stockholm (0.05)
(9 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.46)

de Witt, Christian Schroeder, Gram-Hansen, Bradley, Nardelli, Nantas, Gambardella, Andrew, Zinkov, Rob, Dokania, Puneet, Siddharth, N., Espinosa-Gonzalez, Ana Belen, Darzi, Ara, Torr, Philip, Baydin, Atılım Güneş

Simulation-Based Inference for Global Health Decisions

arXiv.org Machine Learning, May-14-2020

The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel venue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that ...

This is fomenting the development of comprehensive modelling and simulation to support the design of health interventions and policies, and to guide decision-making in a variety of health system domains [22, 49]. For example, simulations have provided valuable insight to deal with public health problems such as tobacco consumption in New Zealand [50], and diabetes and obesity in the US [58]. They have been used to explore policy options such as those in maternal and antenatal care in Uganda [44], and applied to evaluate health reform scenarios such as predicting changes in access to primary care services in Portugal [21]. Their applicability in informing the design of cancer screening programmes has been also discussed [42, 23]. Recently, simulations have informed the response to the COVID-19 outbreak [19].

artificial intelligence, arxiv, machine learning, (12 more...)

2005.07062

Country:

Africa > Uganda (0.25)
Oceania > New Zealand (0.25)
Europe > Portugal (0.25)
(6 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)

Lazaridou, Angeliki, Potapenko, Anna, Tieleman, Olivier

Multi-agent Communication meets Natural Language: Synergies between Functional and Structural Language Learning

arXiv.org Artificial Intelligence, May-14-2020

In this work, we aim at making agents communicate with humans in natural language. Our starting point is a language model that has been trained on generic, not task-specific language data. We then place this model in a multi-agent communication environment that generates task-specific rewards, which are used to adapt or modulate the model, making it task-conditional. We thus propose to decompose the problem of learning language use into two components: learning "what" to say based on a given situation, and learning "how" to say it. The "what" is the essence of communication that underlies our intentions and is chosen by maximizing any given utility, making it a functional, utility-driven process. On the other hand, the "how" is a surface realization of our intentions, i.e., the words we use ...

On the other hand, multi-agent communication research (Foerster et al., 2016; Lazaridou et al., 2017; Havrylov and Titov, 2017; Evtimova et al., 2017; Lee et al., 2019) puts communication at the heart of agents' (language) learning. Implemented within a multi-agent reinforcement learning setup, agents start tabula rasa and form communication protocols that maximize task rewards. While this purely utilitarian framework results in agents that successfully learn to solve the task by creating a communication protocol, these emergent communication protocols do not bear core properties of natural language. Chaabouni et al. (2019) show that protocols found through emergent communication, unlike natural language, do not conform to Zipf's Law of Abbreviation; Kottur et al. (2017) find that ...

artificial intelligence, machine learning, natural language, (19 more...)

2005.07064

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Poland (0.04)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial Intelligence, May-14-2020

Continuous Multiagent Control using Collective Behavior Entropy for Large-Scale Home Energy Management

Sun, Jianwen, Zheng, Yan, Hao, Jianye, Meng, Zhaopeng, Liu, Yang

With the increasing popularity of electric vehicles and of distributed energy generation and storage facilities in smart grid systems, efficient Demand-Side Management (DSM) is urgently needed for energy savings and peak load reduction. Traditional DSM work, which focuses on optimizing the energy activities of a single household, cannot scale up to large-scale home energy management problems. Multi-agent Deep Reinforcement Learning (MA-DRL) offers a potential solution to the scalability problem: modern homes interact to reduce consumers' energy consumption while striking a balance between energy cost and peak load reduction. However, the non-stationarity of such an environment makes it difficult to solve, and existing MA-DRL approaches cannot effectively incentivize the expected group behavior. In this paper, we propose a collective MA-DRL algorithm with a continuous action space to provide fine-grained control over a large-scale microgrid. To mitigate the non-stationarity of the microgrid environment, a novel predictive model is proposed to measure the collective market behavior. In addition, a collective behavior entropy is introduced to reduce the high peak loads incurred by the collective behaviors of all householders in the smart grid. Empirical results show that our approach significantly outperforms state-of-the-art methods in power cost reduction and daily peak load optimization.

household, machine learning, reinforcement learning, (16 more...)

2005.1

Country:

Oceania > Australia > Queensland (0.04)
Asia > Singapore (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, May-14-2020

Competing in a Complex Hidden Role Game with Information Set Monte Carlo Tree Search

Reinhardt, Jack

Advances in intelligent game-playing agents have led to successes in perfect information games like Go and imperfect information games like Poker. The Information Set Monte Carlo Tree Search (ISMCTS) family of algorithms outperforms previous algorithms using Monte Carlo methods in imperfect information games. In this paper, Single Observer Information Set Monte Carlo Tree Search (SO-ISMCTS) is applied to Secret Hitler, a popular social deduction board game that combines traditional hidden role mechanics with the randomness of a card deck. This combination leads to a more complex information model than the hidden role and card deck mechanics alone. It is shown in 10108 simulated games that SO-ISMCTS plays as well as simpler rule-based agents, demonstrating the potential of ISMCTS algorithms in complicated information set domains.

agent, artificial intelligence, planning & scheduling, (16 more...)

2005.07156

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > New York > New York County > New York City (0.04)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
(4 more...)

Genre: Research Report (0.83)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)

McGlaughlin, Peter (University of Illinois Urbana-Champaign) | Garg, Jugal (University of Illinois Urbana-Champaign)

Improving Nash Social Welfare Approximations

Journal of Artificial Intelligence Research, May-14-2020

We consider the problem of fairly allocating a set of indivisible goods among n agents. Various fairness notions have been proposed within the rapidly growing field of fair division, but the Nash social welfare (NSW) serves as a focal point. In part, this follows from the ‘unreasonable’ fairness guarantees provided, in the sense that a max NSW allocation meets multiple other fairness metrics simultaneously, all while satisfying a standard economic concept of efficiency, Pareto optimality. However, existing approximation algorithms fail to satisfy all of the remarkable fairness guarantees offered by a max NSW allocation, instead targeting only the specific NSW objective. We address this issue by presenting an algorithm that computes a 2-approximate max NSW, Prop-1, 1/(2n)-MMS, and Pareto optimal allocation in strongly polynomial time. Our techniques are based on a market interpretation of a fractional max NSW allocation. We present novel definitions of fairness concepts in terms of market prices, and design a new scheme to round a market equilibrium into an integral allocation in a way that provides most of the fairness properties of an integral max NSW allocation.
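For readers unfamiliar with the objective, the Nash social welfare of an allocation is the geometric mean of the agents' utilities. The sketch below computes it for additive valuations and finds a max NSW allocation by exhaustive search on a tiny instance. It is only an illustration of the objective, not the paper's polynomial-time market-based algorithm; the function names and the valuation matrix are hypothetical.

```python
from itertools import product
from math import prod
from typing import List, Tuple


def nash_social_welfare(valuations: List[List[float]], owner: Tuple[int, ...]) -> float:
    """Geometric mean of additive utilities; owner[g] is the agent who receives good g."""
    n = len(valuations)
    utilities = [0.0] * n
    for good, agent in enumerate(owner):
        utilities[agent] += valuations[agent][good]
    return prod(utilities) ** (1.0 / n)


def max_nsw_brute_force(valuations: List[List[float]]) -> Tuple[int, ...]:
    """Try every assignment of goods to agents (exponential; tiny instances only)."""
    n, m = len(valuations), len(valuations[0])
    return max(
        product(range(n), repeat=m),
        key=lambda owner: nash_social_welfare(valuations, owner),
    )


# valuations[i][g] is agent i's value for good g (hypothetical numbers).
valuations = [[4.0, 1.0, 2.0], [1.0, 3.0, 2.0]]
best = max_nsw_brute_force(valuations)
best_nsw = nash_social_welfare(valuations, best)
```

Here the best allocation gives the first good to agent 0 and the other two to agent 1, for utilities 4 and 5. Giving everything to one agent yields an NSW of zero, which is why the objective discourages leaving any agent empty-handed.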

allocation, artificial intelligence, mathematics of computing, (16 more...)

doi: 10.1613/jair.1.11618

AI Access Foundation

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Mathematics of Computing (0.87)

Iovino, Matteo, Scukins, Edvards, Styrud, Jonathan, Ögren, Petter, Smith, Christian

A Survey of Behavior Trees in Robotics and AI

arXiv.org Artificial Intelligence, May-13-2020

Behavior Trees (BTs) were invented as a tool to enable modular AI in computer games, but have received an increasing amount of attention in the robotics community in the last decade. With rising demands on agent AI complexity, game programmers found that the Finite State Machines (FSM) that they used scaled poorly and were difficult to extend, adapt and reuse. In BTs, the state transition logic is not dispersed across the individual states, but organized in a hierarchical tree structure, with the states as leaves. This has a significant effect on modularity, which in turn simplifies both synthesis and analysis by humans and algorithms alike. These advantages are needed not only in game AI design, but also in robotics, as is evident from the research being done. In this paper we present a comprehensive survey of the topic of BTs in Artificial Intelligence and Robotic applications. The existing literature is described and categorized based on methods, application areas and contributions, and the paper is concluded with a list of open research challenges.
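A minimal sketch of the hierarchical structure described above, assuming the two most common composite node types (Sequence and Fallback) and memoryless ticking. The class names and the grasping example are hypothetical and are not taken from the survey.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3


class Leaf:
    """Wraps a condition or action; all control flow lives in the composites."""

    def __init__(self, fn: Callable[[], Status]):
        self.fn = fn

    def tick(self) -> Status:
        return self.fn()


class Sequence:
    """Ticks children in order; stops at the first child that does not succeed."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS


class Fallback:
    """Ticks children in order; stops at the first child that does not fail."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status
        return Status.FAILURE


world = {"holding": False}


def is_holding() -> Status:
    return Status.SUCCESS if world["holding"] else Status.FAILURE


def reach() -> Status:
    return Status.SUCCESS


def grasp() -> Status:
    world["holding"] = True
    return Status.SUCCESS


# "Already holding the object? Otherwise reach, then grasp."
tree = Fallback([Leaf(is_holding), Sequence([Leaf(reach), Leaf(grasp)])])
result = tree.tick()
```

Because the leaves know nothing about each other, the reach-then-grasp subtree can be reused or replaced without touching the rest of the tree, which is the modularity advantage over FSMs that the abstract describes.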

behavior tree, evolutionary algorithm, machine learning, (16 more...)

2005.05842

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
Europe > Sweden > Östergötland County > Linköping (0.04)
(15 more...)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.68)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology > Software (1.00)
Government > Military (1.00)
Transportation (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(3 more...)