AITopics

2009.14471

Country:

North America > United States > Maryland > Prince George's County > College Park (0.05)
North America > United States > Texas (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.51)

Industry: Leisure & Entertainment > Games > Go (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.66)

Vieillard, Nino, Pietquin, Olivier, Geist, Matthieu

Munchausen Reinforcement Learning

arXiv.org Machine LearningNov-4-2020

Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most algorithms, based on temporal differences, replace the true value of a transiting state by their current estimate of this value. Yet, another estimate could be leveraged to bootstrap RL: the current policy. Our core contribution stands in a very simple idea: adding the scaled log-policy to the immediate reward. We show that slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with distributional methods on Atari games, without making use of distributional RL, n-step returns or prioritized replay. To demonstrate the versatility of this idea, we also use it together with an Implicit Quantile Network (IQN). The resulting agent outperforms Rainbow on Atari, installing a new State of the Art with very little modifications to the original algorithm. To add to this empirical study, we provide strong theoretical insights on what happens under the hood -- implicit Kullback-Leibler regularization and increase of the action-gap.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2007.1443

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
(2 more...)

Genre: Research Report > New Finding (0.45)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kamat, Anand, Precup, Doina

Diversity-Enriched Option-Critic

arXiv.org Artificial IntelligenceNov-4-2020

Temporal abstraction allows reinforcement learning agents to represent knowledge and develop strategies over different temporal scales. The option-critic framework has been demonstrated to learn temporally extended actions, represented as options, end-to-end in a model-free setting. However, feasibility of option-critic remains limited due to two major challenges, multiple options adopting very similar behavior, or a shrinking set of task relevant options. These occurrences not only void the need for temporal abstraction, they also affect performance. In this paper, we tackle these problems by learning a diverse set of options. We introduce an information-theoretic intrinsic reward, which augments the task reward, as well as a novel termination objective, in order to encourage behavioral diversity in the option set. We show empirically that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks, outperforms option-critic by a wide margin. Furthermore, we show that our approach sustainably generates robust, reusable, reliable and interpretable options, in contrast to option-critic.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2011.02565

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Quebec > Montreal (0.04)
Asia > China (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

arXiv.org Artificial IntelligenceNov-4-2020

Generative Inverse Deep Reinforcement Learning for Online Recommendation

Chen, Xiaocong, Yao, Lina, Sun, Aixin, Wang, Xianzhi, Xu, Xiwei, Zhu, Liming

Deep reinforcement learning enables an agent to capture user's interest through interactions with the environment dynamically. It has attracted great interest in the recommendation research. Deep reinforcement learning uses a reward function to learn user's interest and to control the learning process. However, most reward functions are manually designed; they are either unrealistic or imprecise to reflect the high variety, dimensionality, and non-linearity properties of the recommendation problem. That makes it difficult for the agent to learn an optimal policy to generate the most satisfactory recommendations. To address the above issue, we propose a novel generative inverse reinforcement learning approach, namely InvRec, which extracts the reward function from user's behaviors automatically, for online recommendation. We conduct experiments on an online platform, VirtualTB, and compare with several state-of-the-art methods to demonstrate the feasibility and effectiveness of our proposed approach.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2011.02248

Country:

Asia > Middle East > Jordan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

#artificialintelligenceNov-3-2020, 14:10:27 GMT

Animal Cognition Induces Common Sense in Artificial Intelligence Agents

Reinforcement learning models are trained, using a similar concept by animal researchers to train animals. For a very long period, artificial intelligence agents were trained on machine learning models to perform tasks that are usually done by humans. The neural networks of machine learning models are designed and trained in such a format that they perform the tasks without any human intervention or supervision. However, ever since its inception, the researchers and scientists are curious to induce cognitive abilities into artificial intelligence agents. For a decade, despite the experiments designed to train the artificial neural network by utilizing the human cognitive ability for adopting common sense, the researchers were unable to reach into a reasonable conclusion. The researchers were resorting to behavioral science and neuroscience earlier to induce common sense into the artificial intelligence agents.

deep learning, machine learning, reinforcement learning, (8 more...)

#artificialintelligence

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Military (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.78)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

arXiv.org Artificial IntelligenceNov-3-2020

Amortized Variational Deep Q Network

Zhang, Haotian, Wang, Yuhao, Sun, Jianyong, Xu, Zongben

Efficient exploration is one of the most important issues in deep reinforcement learning. To address this issue, recent methods consider the value function parameters as random variables, and resort variational inference to approximate the posterior of the parameters. In this paper, we propose an amortized variational inference framework to approximate the posterior distribution of the action value function in Deep Q Network. We establish the equivalence between the loss of the new model and the amortized variational inference loss. We realize the balance of exploration and exploitation by assuming the posterior as Cauchy and Gaussian, respectively in a two-stage training process. We show that the amortized framework can results in significant less learning parameters than existing state-of-the-art method. Experimental results on classical control tasks in OpenAI Gym and chain Markov Decision Process tasks show that the proposed method performs significantly better than state-of-art methods and requires much less training time.

avdqn, neural network, upstream oil & gas, (17 more...)

2011.01706

Country:

Asia > China (0.14)
North America > Canada (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Hihn, Heinke, Braun, Daniel A.

Specialization in Hierarchical Learning Systems

arXiv.org Machine LearningNov-3-2020

Joining multiple decision-makers together is a powerful way to obtain more sophisticated decision-making systems, but requires to address the questions of division of labor and specialization. We investigate in how far information constraints in hierarchies of experts not only provide a principled method for regularization but also to enforce specialization. In particular, we devise an information-theoretically motivated on-line learning rule that allows partitioning of the problem space into multiple sub-problems that can be solved by the individual experts. We demonstrate two different ways to apply our method: (i) partitioning problems based on individual data samples and (ii) based on sets of data samples representing tasks. Approach (i) equips the system with the ability to solve complex decision-making problems by finding an optimal combination of local expert decision-makers. Approach (ii) leads to decision-makers specialized in solving families of tasks, which equips the system with the ability to solve meta-learning problems. We show the broad applicability of our approach on a range of problems including classification, regression, density estimation, and reinforcement learning problems, both in the standard machine learning setup and in a meta-learning setting.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

doi: 10.1007/s11063-020-10351-3

2011.01845

Country:

Europe > Germany (0.04)
North America > United States > Missouri > St. Louis County > St. Louis (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre:

Overview (0.93)
Research Report > New Finding (0.67)

Industry:

Education > Educational Setting > Online (0.54)
Education > Focused Education > Special Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Kallus, Nathan, Uehara, Masatoshi

Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning

arXiv.org Machine LearningNov-3-2020

We study the efficient off-policy evaluation of natural stochastic policies, which are defined in terms of deviations from the behavior policy. This is a departure from the literature on off-policy evaluation where most work consider the evaluation of explicitly specified policies. Crucially, offline reinforcement learning with natural stochastic policies can help alleviate issues of weak overlap, lead to policies that build upon current practice, and improve policies' implementability in practice. Compared with the classic case of a pre-specified evaluation policy, when evaluating natural stochastic policies, the efficiency bound, which measures the best-achievable estimation error, is inflated since the evaluation policy itself is unknown. In this paper, we derive the efficiency bounds of two major types of natural stochastic policies: tilting policies and modified treatment policies. We then propose efficient nonparametric estimators that attain the efficiency bounds under very lax conditions. These also enjoy a (partial) double robustness property.

efficiency, machine learning, reinforcement learning, (17 more...)

2006.03886

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.45)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)

arXiv.org Artificial IntelligenceNov-3-2020

MBVI: Model-Based Value Initialization for Reinforcement Learning

Lyu, Xubo, Li, Site, Siriya, Seth, Pu, Ye, Chen, Mo

Model-free reinforcement learning (RL) is capable of learning control policies for high-dimensional, complex robotic tasks, but tends to be data inefficient. Model-based RL and optimal control have been proven to be much more data-efficient if an accurate model of the system and environment is known, but can be difficult to scale to expressive models for high-dimensional problems. In this paper, we propose a novel approach to alleviate data inefficiency of model-free RL by warm-starting the learning process using model-based solutions. We do so by initializing a high-dimensional value function via supervision from a low-dimensional value function obtained by applying model-based techniques on a low-dimensional problem featuring an approximate system model. Therefore, our approach exploits the model priors from a simplified problem space implicitly and avoids the direct use of high-dimensional, expressive models. We demonstrate our approach on two representative robotic learning tasks and observe significant improvements in performance and efficiency, and analyze our method empirically with a third task.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2011.02073

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceNov-3-2020

Deep Reinforcement Learning Based Dynamic Route Planning for Minimizing Travel Time

Geng, Yuanzhe, Liu, Erwu, Wang, Rui, Liu, Yiming

Route planning is important in transportation. Existing works focus on finding the shortest path solution or using metrics such as safety and energy consumption to determine the planning. It is noted that most of these studies rely on prior knowledge of road network, which may be not available in certain situations. In this paper, we design a route planning algorithm based on deep reinforcement learning (DRL) for pedestrians. We use travel time consumption as the metric, and plan the route by predicting pedestrian flow in the road network. We put an agent, which is an intelligent robot, on a virtual map. Different from previous studies, our approach assumes that the agent does not need any prior information about road network, but simply relies on the interaction with the environment. We propose a dynamically adjustable route planning (DARP) algorithm, where the agent learns strategies through a dueling deep Q network to avoid congested roads. Simulation results show that the DARP algorithm saves 52% of the time under congestion condition when compared with traditional shortest path planning algorithms.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2011.01771

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
Asia > China > Shanghai > Shanghai (0.04)
South America > Brazil > Federal District > Brasília (0.04)
(13 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)