AITopics

Recent question generation (QG) approaches often utilize the sequence-to-sequence framework (Seq2Seq) to optimize the log likelihood of ground-truth questions using teacher forcing. However, this training objective is inconsistent with actual question quality, which is often reflected by certain global properties such as whether the question can be answered by the document. As such, we directly optimize for QG-specific objectives via reinforcement learning to improve question quality. We design three different rewards that aim to improve the fluency, relevance, and answerability of generated questions. We conduct both automatic and human evaluations, in addition to thorough analysis, to explore the effect of each QG-specific reward. We find that optimizing on question-specific rewards generally leads to better performance in automatic evaluation metrics. However, only the rewards that correlate well with human judgement (e.g., relevance) lead to real improvement in question quality. Optimizing for the others, especially answerability, introduces incorrect bias to the model, resulting in poor question quality.

machine learning, question answering, reinforcement learning

2011.01102

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Montana > Sanders County (0.04)

Genre: Research Report (1.00)

Industry:

Media > Music (0.46)
Leisure & Entertainment (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Causal Campbell-Goodhart's law and Reinforcement Learning

Ashton, Hal

Campbell-Goodhart's law relates to the causal inference error whereby decision-making agents aim to influence variables which are correlated to their goal objective but do not reliably cause it. This is a well-known error in Economics and Political Science but not widely labelled in Artificial Intelligence research. Through a simple example, we show how off-the-shelf deep Reinforcement Learning (RL) algorithms are not necessarily immune to this cognitive error. The off-policy learning method is tricked, whilst the on-policy method is not. The practical implication is that naive application of RL to complex real-life problems can result in the same types of policy errors that humans make. Great care should be taken around understanding the causal model that underpins a solution derived from Reinforcement Learning.

artificial intelligence, machine learning, reinforcement learning

2011.0101

Country:

Europe > United Kingdom > Scotland (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Yin, Haiyan, Li, Yingzhen, Pan, Sinno Jialin, Zhang, Cheng, Tschiatschek, Sebastian

Reinforcement Learning with Efficient Active Feature Acquisition

Solving real-life sequential decision making problems under partial observability involves an exploration-exploitation problem. To be successful, an agent needs to efficiently gather valuable information about the state of the world for making rewarding decisions. However, in real life, acquiring valuable information is often highly costly, e.g., in the medical domain, information acquisition might correspond to performing a medical test on a patient. This poses a significant challenge for the agent to perform optimally for the task while reducing the cost of information acquisition. In this paper, we propose a model-based reinforcement learning framework that learns an active feature acquisition policy to solve the exploration-exploitation problem during its execution. Key to the success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states, which are then used by the policy to maximize the task reward in a cost-efficient manner. We demonstrate the efficacy of our proposed framework in a control domain as well as using a medical simulator. In both tasks, our proposed method outperforms conventional baselines and results in policies with greater cost efficiency.

artificial intelligence, machine learning, reinforcement learning

2011.00825

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Cooperative Heterogeneous Deep Reinforcement Learning

Zheng, Han, Wei, Pengfei, Jiang, Jing, Long, Guodong, Lu, Qinghua, Zhang, Chengqi

Numerous deep reinforcement learning agents have been proposed, and each of them has its strengths and flaws. In this work, we present a Cooperative Heterogeneous Deep Reinforcement Learning (CHDRL) framework that can learn a policy by integrating the advantages of heterogeneous agents. Specifically, we propose a cooperative learning framework that classifies heterogeneous agents into two classes: global agents and local agents. Global agents are off-policy agents that can utilize experiences from the other agents. Local agents are either on-policy agents or population-based evolutionary algorithm (EA) agents that can explore the local area effectively. We employ global agents, which are sample-efficient, to guide the learning of local agents so that local agents can benefit from sample-efficient agents and simultaneously maintain their advantages, e.g., stability. Global agents also benefit from effective local searches. Experimental studies on a range of continuous control tasks from the MuJoCo benchmark show that CHDRL achieves better performance compared with state-of-the-art baselines.

artificial intelligence, machine learning, reinforcement learning

2011.00791

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kim, Joanne Taery, Ha, Sehoon

Observation Space Matters: Benchmark and Optimization Algorithm

Recent advances in deep reinforcement learning (deep RL) enable researchers to solve challenging control problems, from simulated environments to real-world robotic tasks. However, deep RL algorithms are known to be sensitive to the problem formulation, including observation spaces, action spaces, and reward functions. There exist numerous choices for observation spaces but they are often designed solely based on prior knowledge due to the lack of established principles. In this work, we conduct benchmark experiments to verify common design choices for observation spaces, such as Cartesian transformation, binary contact flags, a short history, or global positions. Then we propose a search algorithm to find the optimal observation spaces, which examines various candidate observation spaces and removes unnecessary observation channels with a Dropout-Permutation test. We demonstrate that our algorithm significantly improves learning speed compared to manually designed observation spaces. We also analyze the proposed algorithm by evaluating different hyperparameters.

machine learning, observation space, reinforcement learning

2011.00756

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)

Interpreting Graph Drawing with Multi-Agent Reinforcement Learning

Safarli, Ilkin, Zhou, Youjia, Wang, Bei

Applying machine learning techniques to graph drawing has become an emergent area of research in visualization. In this paper, we interpret graph drawing as a multi-agent reinforcement learning (MARL) problem. We first demonstrate that a large number of classic graph drawing algorithms, including force-directed layouts and stress majorization, can be interpreted within the framework of MARL. Using this interpretation, a node in the graph is assigned to an agent with a reward function. Via multi-agent reward maximization, we obtain an aesthetically pleasing graph layout that is comparable to the outputs of classic algorithms. The main strength of a MARL framework for graph drawing is that it not only unifies a number of classic drawing algorithms in a general formulation but also supports the creation of novel graph drawing algorithms by introducing a diverse set of reward functions.
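The node-as-agent reading in the abstract above can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than the paper's method: the reward is a negative stress term per node, the function names (`node_reward`, `greedy_round`, `total_reward`) are hypothetical, and a greedy one-step search stands in for a learned policy.

```python
import math

def node_reward(i, pos, dist):
    """Negative stress seen by node i: penalizes mismatch between
    layout distances and target graph distances (assumed reward)."""
    return -sum((math.dist(pos[i], pos[j]) - dist[i][j]) ** 2
                for j in range(len(pos)) if j != i)

def greedy_round(pos, dist, step=0.1):
    """Each node-agent in turn keeps its position or takes the single
    axis-aligned move that maximizes its own reward."""
    moves = [(0.0, 0.0), (step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)]
    for i in range(len(pos)):
        x, y = pos[i]

        def reward_after(move):
            trial = list(pos)
            trial[i] = (x + move[0], y + move[1])
            return node_reward(i, trial, dist)

        dx, dy = max(moves, key=reward_after)  # ties resolve to staying put
        pos[i] = (x + dx, y + dy)
    return pos

def total_reward(pos, dist):
    """Sum of all agents' rewards (each pair is counted twice)."""
    return sum(node_reward(i, pos, dist) for i in range(len(pos)))

# Path graph 0 - 1 - 2 with shortest-path distances as layout targets.
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
layout = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)]  # deliberately cramped start
before = total_reward(layout, dist)
for _ in range(50):
    greedy_round(layout, dist)
after = total_reward(layout, dist)
```

Because the pairwise term is symmetric, a move that raises one agent's reward also raises the summed reward, so sequential greedy updates never make the layout worse under this particular reward.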

artificial intelligence, machine learning, reinforcement learning

2011.00748

Country:

North America > United States > Utah (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Europe > Germany (0.04)

Genre:

Overview (0.93)
Research Report (0.64)

Industry:

Information Technology (0.46)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning Nov-1-2020

Learning Deep Features in Instrumental Variable Regression

Xu, Liyuan, Chen, Yutian, Srinivasan, Siddarth, de Freitas, Nando, Doucet, Arnaud, Gretton, Arthur

Instrumental variable (IV) regression is a standard strategy for learning causal relationships between confounded treatment and outcome variables from observational data by utilizing an instrumental variable, which affects the outcome only through the treatment. In classical IV regression, learning proceeds in two stages: stage 1 performs linear regression from the instrument to the treatment; and stage 2 performs linear regression from the treatment to the outcome, conditioned on the instrument. We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear. In this case, deep neural nets are trained to define informative nonlinear features on the instruments and treatments. We propose an alternating training regime for these features to ensure good end-to-end performance when composing stages 1 and 2, thus obtaining highly flexible feature maps in a computationally efficient manner. DFIV outperforms recent state-of-the-art methods on challenging IV benchmarks, including settings involving high dimensional image data. DFIV also exhibits competitive performance in off-policy policy evaluation for reinforcement learning, which can be understood as an IV regression task.
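The classical two-stage procedure described in the abstract above can be sketched for the simplest case of one instrument and one treatment. This is a minimal illustration of ordinary two-stage least squares, not DFIV itself; the helper names (`ols`, `two_stage_least_squares`) and the toy data are assumptions chosen so the arithmetic is exact.

```python
def ols(xs, ys):
    """Return (intercept, slope) of the least-squares fit ys ~ a + b * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def two_stage_least_squares(z, x, y):
    """Stage 1: regress treatment x on instrument z.
    Stage 2: regress outcome y on the stage-1 prediction of x."""
    a1, b1 = ols(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    _, effect = ols(x_hat, y)
    return effect

# Toy data (assumed): hidden confounder u affects both x and y,
# the instrument z affects y only through x, and the true effect is 2.
z = [-1.0, 1.0, -1.0, 1.0]
u = [-1.0, -1.0, 1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]

naive_slope = ols(x, y)[1]                    # biased by the confounder
iv_slope = two_stage_least_squares(z, x, y)   # recovers the true effect
```

On this toy data the naive regression of `y` on `x` gives a slope of 3.5, while the two-stage estimate is 2.0, matching the coefficient used to generate `y`.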

machine learning, regression, reinforcement learning

2010.07154

Country: Asia > Vietnam (0.04)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

arXiv.org Artificial Intelligence Nov-1-2020

Fast Reinforcement Learning with Incremental Gaussian Mixture Models

Pinto, Rafael

This work presents a novel algorithm that integrates a data-efficient function approximator with reinforcement learning in continuous state spaces. An online and incremental algorithm capable of learning from a single pass through data, called Incremental Gaussian Mixture Network (IGMN), was employed as a sample-efficient function approximator for the joint state and Q-values space, all in a single model, resulting in a concise and data-efficient algorithm, i.e., a reinforcement learning algorithm that learns from very few interactions with the environment. Results are analyzed to explain the properties of the obtained algorithm, and it is observed that the use of the IGMN function approximator brings some important advantages to reinforcement learning in relation to conventional neural networks trained by gradient descent methods.

artificial intelligence, machine learning, reinforcement learning

2011.00702

Country:

South America > Brazil > Rio Grande do Sul (0.04)
South America > Brazil > Rio Grande do Norte > Natal (0.04)
South America > Brazil > Ceará > Fortaleza (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence Nov-1-2020

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective

Yang, Yaodong, Wang, Jun

Following the remarkable success of the AlphaGo series, 2019 was a booming year that witnessed significant advances in multi-agent reinforcement learning (MARL) techniques. MARL corresponds to the learning problem in a multi-agent system in which multiple agents learn simultaneously. MARL is an interdisciplinary domain with a long history that includes game theory, machine learning, stochastic control, psychology, and optimisation. Although MARL has achieved considerable empirical success in solving real-world games, there is a lack of a self-contained overview in the literature that elaborates the game theoretical foundations of modern MARL methods and summarises the recent advances. In fact, the majority of existing surveys are outdated and do not fully cover the recent developments since 2010. In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier. The goal of our monograph is to provide a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective. We expect this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.

deep learning, machine learning, reinforcement learning

2011.00583

Country:

North America > United States > Texas (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.92)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Health & Medicine (0.92)
Energy (0.92)
Education (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Dorbala, Vishnu Sashank, Srinivasan, Arjun, Bera, Aniket

Can a Robot Trust You? A DRL-Based Approach to Trust-Driven Human-Guided Navigation

arXiv.org Artificial Intelligence Nov-1-2020

Humans are known to construct cognitive maps of their everyday surroundings using a variety of perceptual inputs. As such, when a human is asked for directions to a particular location, their wayfinding capability in converting this cognitive map into directional instructions is challenged. Owing to spatial anxiety, the language used in the spoken instructions can be vague and often unclear. To account for this unreliability in navigational guidance, we propose a novel Deep Reinforcement Learning (DRL) based trust-driven robot navigation algorithm that learns humans' trustworthiness to perform a language-guided navigation task. Our approach seeks to answer the question as to whether a robot can trust a human's navigational guidance or not. To this end, we look at training a policy that learns to navigate towards a goal location using only trustworthy human guidance, driven by its own robot trust metric. We look at quantifying various affective features from language-based instructions and incorporate them into our policy's observation space in the form of a human trust metric. We incorporate both of these trust metrics into an optimal cognitive reasoning scheme that decides when and when not to trust the given guidance. Our results show that the learned policy can navigate the environment in an optimal, time-efficient manner as opposed to an explorative approach that performs the same task. We showcase the efficacy of our results both in simulation and a real-world environment.

artificial intelligence, machine learning, reinforcement learning

2011.00554

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)