AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
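
To make the trial-and-error idea in the quotation concrete, here is a minimal sketch of an epsilon-greedy learner on a multi-armed bandit. It is an illustration only, not an example taken from the cited book passage: the function name `run_epsilon_greedy`, the Gaussian reward model, and the default values of `steps` and `epsilon` are all assumptions chosen for the demonstration.

```python
import random


def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Learn action values by trying actions on a Gaussian bandit.

    true_means is hidden from the learner; it only sees sampled rewards.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    counts = [0] * n_actions        # how often each action was tried
    estimates = [0.0] * n_actions   # running mean reward per action
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the best current estimate.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment returns a noisy numerical reward.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental update of the sample mean for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward
```

Calling `run_epsilon_greedy([0.0, 1.0, 5.0])` returns the learned estimates, the per-action counts, and the total reward. The learner is never told which action is best; it can only find out from the rewards it samples.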


Machine Learning Practical: 6 Real-World Applications

#artificialintelligence | Oct-3-2020, 13:30:15 GMT

Machine Learning Practical: 6 Real-World Applications. Get your hands dirty by solving real industry challenges with Python. Created by Kirill Eremenko, Hadelin de Ponteves, Dr. Ryan Ahmed, Ph.D., MBA, SuperDataScience Team, and Rony Sulca. Description: So you know the theory of machine learning and know how to create your first algorithms. There are tons of courses out there about the underlying theory of machine learning that don't go any deeper into the applications. This course is not one of them. Are you ready to apply all of that theory and knowledge to real-life machine learning challenges? We gathered the best industry professionals, with tons of completed projects behind them.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)


Reinforcement learning and reasoning

#artificialintelligence | Oct-3-2020, 10:10:15 GMT

Reinforcement learning has seen a lot of progress in recent years: from DeepMind's success in teaching machines to play Atari games, through AlphaGo beating world champions at Go, to OpenAI's recent progress on Dota 2, a multiplayer game in which two teams of players compete against each other. The common thread is an artificial agent operating in a virtual world, where the prize is clear (e.g. ...). On the other hand, people are experimenting with AI agents operating in the real world. Each Boston Dynamics clip gets a lot of press, showing robots performing amazing stunts.

large language model, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Industry:

Leisure & Entertainment > Games > Computer Games (0.55)
Leisure & Entertainment > Games > Go (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.56)


Disentangling causal effects for hierarchical reinforcement learning

Corcoll, Oriol, Vicente, Raul

arXiv.org Artificial Intelligence | Oct-3-2020

Exploration and credit assignment under sparse rewards are still challenging problems. We argue that these challenges arise in part due to the intrinsic rigidity of operating at the level of actions. Actions can precisely define how to perform an activity but are ill-suited to describe what activity to perform. Instead, causal effects are inherently composable and temporally abstract, making them ideal for descriptive tasks. By leveraging a hierarchy of causal effects, this study aims to expedite the learning of task-specific behavior and aid exploration. Borrowing counterfactual and normality measures from causal literature, we disentangle controllable effects from effects caused by other dynamics of the environment. We propose CEHRL, a hierarchical method that models the distribution of controllable effects using a Variational Autoencoder. This distribution is used by a high-level policy to 1) explore the environment via random effect exploration so that novel effects are continuously discovered and learned, and to 2) learn task-specific behavior by prioritizing the effects that maximize a given reward function. In comparison to exploring with random actions, experimental results show that random effect exploration is a more efficient mechanism and that by assigning credit to few effects rather than many actions, CEHRL learns tasks more rapidly.

artificial intelligence, controllable effect, neural network, (16 more...)

arXiv.org Artificial Intelligence

2010.01351

Country:

Europe (0.14)
North America > United States (0.14)
North America > Puerto Rico (0.14)

Genre: Research Report > New Finding (0.88)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Policy Gradient with Expected Quadratic Utility Maximization: A New Mean-Variance Approach in Reinforcement Learning

Kato, Masahiro, Nakagawa, Kei

arXiv.org Machine Learning | Oct-3-2020

Reinforcement learning (RL) and planning in Markov decision processes (MDPs) are one type of dynamic decision-making problem (Puterman, 1994; Bertsekas & Tsitsiklis, 1996; Sutton & Barto, 1998). While the typical objective is to maximize the expected cumulative reward, risk-aware decision-making has attracted attention in real-world applications, such as finance, robotics, and playing games (Geibel & Wysotzki, 2005; García & Fernández, 2015). The notion of risk in RL is related to the fact that even an optimal policy may perform poorly in some cases owing to the stochastic nature of the problem. To capture the risk, various criteria have been proposed, such as Value at Risk (Luenberger, 1998; Chow & Ghavamzadeh, 2014; Chow et al., 2017) and variance (Markowitz, 1952; Markowitz et al., 2000; Tamar et al., 2012; L.A. & Ghavamzadeh, 2013). Among them, we focus on the mean-variance tradeoff in RL problems. Typical mean-variance RL (MVRL) methods attempt to maximize the expected cumulative reward while keeping the variance below a threshold (Tamar et al., 2012; L.A. & Ghavamzadeh, 2013; Prashanth & Ghavamzadeh, 2016; Xie et al., 2018; Bisi et al., 2020; Zhang et al., 2020). However, most existing MVRL methods suffer from high computational costs owing to the double sampling issue when approximating the gradient of the variance term (Tamar et al., 2012; L.A. & Ghavamzadeh, 2013; Prashanth & Ghavamzadeh, 2016). To avoid the double sampling issue, Xie et al. (2018) proposed a method based on the Legendre-Fenchel duality (Boyd & Vandenberghe, 2004). Although the method does not suffer from the double sampling issue, we cannot apply a standard policy gradient method and must use a coordinate descent algorithm.
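
The constrained objective and the double sampling issue described above can be written out as a short worked derivation. The notation is assumed here for illustration and is not taken from the paper: $R$ is the cumulative reward under policy $\pi_\theta$, $\eta$ is a variance threshold, and $\alpha, \beta > 0$ are utility coefficients.

```latex
% Typical constrained mean-variance formulation
\max_{\theta}\ \mathbb{E}_{\pi_\theta}[R]
\quad \text{subject to} \quad \mathrm{Var}_{\pi_\theta}(R) \le \eta

% Variance in terms of moments, and its gradient
\mathrm{Var}(R) = \mathbb{E}[R^2] - \left(\mathbb{E}[R]\right)^2,
\qquad
\nabla_\theta \mathrm{Var}(R)
  = \nabla_\theta \mathbb{E}[R^2] - 2\,\mathbb{E}[R]\,\nabla_\theta \mathbb{E}[R]

% Expected quadratic utility expressed through mean and variance
\mathbb{E}\!\left[\alpha R - \tfrac{1}{2}\beta R^2\right]
  = \alpha\,\mathbb{E}[R] - \tfrac{1}{2}\beta\left(\mathrm{Var}(R) + \left(\mathbb{E}[R]\right)^2\right)
```

The term $\mathbb{E}[R]\,\nabla_\theta \mathbb{E}[R]$ is a product of two expectations, so an unbiased sample estimate needs two independent sets of trajectories; this is the usual reading of "double sampling". The last identity follows from $\mathbb{E}[R^2] = \mathrm{Var}(R) + (\mathbb{E}[R])^2$. Its left-hand side is a single expectation, which suggests why a quadratic utility objective, as named in the paper's title, could be handled with ordinary policy gradient estimates.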

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Machine Learning

2010.01404

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)


Data-Efficient Reinforcement Learning with Self-Predictive Representations

Schwarzer, Max, Anand, Ankesh, Goel, Rishab, Hjelm, R Devon, Courville, Aaron, Bachman, Philip

arXiv.org Machine Learning | Oct-3-2020

While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interaction with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future. We compute target representations for future states using an encoder which is an exponential moving average of the agent's parameters and we make predictions using a learned transition model. On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels. We further improve performance by adding data augmentation to the future prediction loss, which forces the agent's representations to be consistent across multiple views of an observation. Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.415 on Atari in a setting limited to 100k steps of environment interaction, which represents a 55% relative improvement over the previous state-of-the-art. Notably, even in this limited data regime, SPR exceeds expert human scores on 7 out of 26 games. The code associated with this work is available at https://github.com/mila-iqia/spr.
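
The abstract describes a target encoder whose weights are an exponential moving average (EMA) of the agent's parameters. The snippet below is a minimal, framework-free sketch of such an update on a flat list of floats. It is not the code from the SPR repository: the function name `ema_update`, the coefficient name `tau`, and its default value are assumptions for illustration.

```python
def ema_update(target_params, online_params, tau=0.99):
    """Return new target parameters as tau * target + (1 - tau) * online.

    target_params and online_params are equal-length lists of floats.
    tau close to 1 makes the target change slowly (an assumed default).
    """
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [
        tau * t + (1.0 - tau) * o
        for t, o in zip(target_params, online_params)
    ]


# Usage: after each (hypothetical) training step, refresh the target copy.
online = [0.5, -1.0, 2.0]
target = list(online)            # target starts as a copy of the online weights
online = [0.6, -0.9, 2.1]        # pretend a gradient step changed the online weights
target = ema_update(target, online, tau=0.99)
```

The target is never trained by gradient descent; it only tracks the online weights with a lag, which keeps the prediction targets from changing abruptly between steps.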

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2007.05929

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Episodic Memory for Learning Subjective-Timescale Models

Zakharov, Alexey, Crosby, Matthew, Fountas, Zafeirios

arXiv.org Artificial Intelligence | Oct-3-2020

In model-based learning, an agent's model is commonly defined over transitions between consecutive states of an environment even though planning often requires reasoning over multi-step timescales, with intermediate states either unnecessary, or worse, accumulating prediction error. In contrast, intelligent behaviour in biological organisms is characterised by the ability to plan over varying temporal scales depending on the context. Inspired by the recent works on human time perception, we devise a novel approach to learning a transition dynamics model, based on the sequences of episodic memories that define the agent's subjective timescale - over which it learns world dynamics and over which future planning is performed. We implement this in the framework of active inference and demonstrate that the resulting subjective-timescale model (STM) can systematically vary the temporal extent of its predictions while preserving the same computational efficiency. Additionally, we show that STM predictions are more likely to introduce future salient events (for example new objects coming into view), incentivising exploration of new areas of the environment. As a result, STM produces more informative action-conditioned roll-outs that assist the agent in making better decisions. We validate significant improvement in our STM agent's performance in the Animal-AI environment against a baseline system, trained using the environment's objective-timescale dynamics. An agent endowed with a model of its environment has the ability to predict the consequences of its actions and perform planning into the future before deciding on its next move. Models can allow agents to simulate the possible action-conditioned futures from their current state, even if the state was never visited during learning. 
As a result, model-based approaches can provide agents with better generalization abilities across both states and tasks in an environment, compared to their model-free counterparts (Racanière et al., 2017; Mishra et al., 2017).

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2010.0143

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Industry:

Health & Medicine > Consumer Health (0.75)
Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban

Karkus, Peter, Mirza, Mehdi, Guez, Arthur, Jaegle, Andrew, Lillicrap, Timothy, Buesing, Lars, Heess, Nicolas, Weber, Theophane

arXiv.org Artificial Intelligence | Oct-3-2020

Intelligent robots need to achieve abstract objectives using concrete, spatiotemporally complex sensory information and motor control. Tabula rasa deep reinforcement learning (RL) has tackled demanding tasks in terms of either visual, abstract, or physical reasoning, but solving these jointly remains a formidable challenge. One recent, unsolved benchmark task that integrates these challenges is Mujoban, where a robot needs to arrange 3D warehouses generated from 2D Sokoban puzzles. We explore whether integrated tasks like Mujoban can be solved by composing RL modules together in a sense-plan-act hierarchy, where modules have well-defined roles similarly to classic robot architectures. Unlike classic architectures that are typically model-based, we use only model-free modules trained with RL or supervised learning. We find that our modular RL approach dramatically outperforms the state-of-the-art monolithic RL agent on Mujoban. Further, learned modules can be reused when, e.g., using a different robot platform to solve the same task. Together our results give strong evidence for the importance of research into modular RL designs. Project website: https://sites.google.com/view/modular-rl/

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2010.01298

Country:

North America > United States > California > San Mateo County > Menlo Park (0.04)
Asia > Singapore (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


A VR film/game with AI characters can be different every time you watch/play

#artificialintelligence | Oct-2-2020, 15:11:34 GMT

Gagliano previously won the first-ever Emmy for a VR experience in 2015. Now he and producer David Oppenheim, who works at the National Film Board of Canada, are experimenting with a kind of storytelling they call dynamic film. "We see Agence as a sort of silent-era dynamic film," says Oppenheim. Agence debuted at the Venice International Film Festival last month and was released this week to watch/play via Steam, an online video game platform. The basic plot revolves around a group of creatures and their appetite for a mysterious plant that appears on their planet: can they control their desire, or will they destabilize the planet and get tipped to their doom?

machine learning, oppenheim, reinforcement learning, (8 more...)

#artificialintelligence

Country: North America > Canada (0.26)

Industry:

Media > Film (0.75)
Leisure & Entertainment > Games > Computer Games (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)


Meta-Model-Based Meta-Policy Optimization

Hiraoka, Takuya, Imagawa, Takahisa, Tangkaratt, Voot, Osa, Takayuki, Onishi, Takashi, Tsuruoka, Yoshimasa

arXiv.org Machine Learning | Oct-2-2020

Model-based reinforcement learning (MBRL) has been applied to meta-learning settings and has demonstrated its high sample efficiency. However, in previous MBRL for meta-learning settings, policies are optimized via rollouts that fully rely on a predictive model of an environment. Thus, its performance in a real environment tends to degrade when the predictive model is inaccurate. In this paper, we prove that performance degradation can be suppressed by using branched meta-rollouts. On the basis of this theoretical analysis, we propose Meta-Model-based Meta-Policy Optimization (M3PO), in which the branched meta-rollouts are used for policy optimization. We demonstrate that M3PO outperforms existing meta reinforcement learning methods in continuous-control benchmarks.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2006.02608

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.04)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)


Interactive Reinforcement Learning for Feature Selection with Decision Tree in the Loop

Fan, Wei, Liu, Kunpeng, Liu, Hao, Ge, Yong, Xiong, Hui, Fu, Yanjie

arXiv.org Machine Learning | Oct-2-2020

We study the problem of balancing effectiveness and efficiency in automated feature selection. After exploring many feature selection methods, we observe a computational dilemma: 1) traditional feature selection is mostly efficient, but struggles to identify the best subset; 2) the emerging reinforced feature selection automatically navigates to the best subset, but is usually inefficient. Can we bridge the gap between effectiveness and efficiency under automation? Motivated by this dilemma, we aim to develop a novel feature space navigation method. In our preliminary work, we leveraged interactive reinforcement learning to accelerate feature selection through external trainer-agent interaction. In this journal version, we propose a novel interactive and closed-loop architecture to simultaneously model interactive reinforcement learning (IRL) and decision tree feedback (DTF). Specifically, IRL creates an interactive feature selection loop and DTF feeds structured feature knowledge back into the loop. First, the tree-structured feature hierarchy from the decision tree is leveraged to improve state representation. In particular, we represent the selected feature subset as an undirected graph of feature-feature correlations and a directed tree of decision features. We propose a new embedding method that enables a graph convolutional network to jointly learn the state representation from both the graph and the tree. Second, the tree-structured feature hierarchy is exploited to develop a new reward scheme. In particular, we personalize the reward assignment of agents based on decision tree feature importance. In addition, observing that agents' actions can serve as feedback, we devise another reward scheme that weighs and assigns reward based on how frequently each feature was selected in historical action records. Finally, we present extensive experiments on real-world datasets to show the improved performance.
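
As a rough illustration of the second reward scheme mentioned above, the sketch below shares a scalar reward among features in proportion to how often each was selected in past action records. It is an assumption-laden toy, not the authors' formulation: the helper names `frequency_ratios` and `split_reward`, the representation of the history as a list of selected-feature lists, and the proportional weighting rule are all hypothetical.

```python
from collections import Counter


def frequency_ratios(history):
    """Map each feature to the fraction of recorded steps in which it was selected.

    history is a list of steps; each step is a list of selected feature names.
    """
    if not history:
        return {}
    # set() guards against a feature being listed twice within one step.
    counts = Counter(f for selected in history for f in set(selected))
    return {f: c / len(history) for f, c in counts.items()}


def split_reward(total_reward, history):
    """Share total_reward among features in proportion to selection frequency."""
    ratios = frequency_ratios(history)
    norm = sum(ratios.values())
    if norm == 0:
        return {}
    return {f: total_reward * r / norm for f, r in ratios.items()}


# Usage with a made-up action history over three hypothetical features.
history = [["age", "income"], ["age"], ["age", "zip"], ["income"]]
shares = split_reward(1.2, history)
```

Here `age` was selected in three of the four steps, `income` in two, and `zip` in one, so the shares come out in a 3:2:1 ratio. Whether frequently selected features should receive more or less of the reward is a design choice that the abstract does not specify.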

machine learning, reinforcement learning, selection, (15 more...)

arXiv.org Machine Learning

2010.02506

Country:

North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
Asia (0.04)

Genre: Research Report (0.64)

Industry:

Education (0.68)
Energy > Renewable > Geothermal (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
