AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Artificial Intelligence for Business

#artificialintelligenceAug-30-2020, 13:06:56 GMT

Udemy Coupon - Solve Real World Business Problems with AI Solutions Created by Hadelin de Ponteves, Kirill Eremenko, SuperDataScience Team English [Auto-generated], French [Auto-generated], 5 more Students also bought Artificial Intelligence: Reinforcement Learning in Python Data Science: Natural Language Processing (NLP) in Python Recommender Systems and Deep Learning in Python Cluster Analysis and Unsupervised Machine Learning in Python Natural Language Processing with Deep Learning in Python Preview this Course GET COUPON CODE Description Structure of the course: Part 1 - Optimizing Business Processes Case Study: Optimizing the Flows in an E-Commerce Warehouse AI Solution: Q-Learning Part 2 - Minimizing Costs Case Study: Minimizing the Costs in Energy Consumption of a Data Center AI Solution: Deep Q-Learning Part 3 - Maximizing Revenues Case Study: Maximizing Revenue of an Online Retail Business AI Solution: Thompson Sampling Real World Business Applications: With Artificial Intelligence, you can do three main things for any business: Optimize Business Processes Minimize Costs Maximize Revenues We will show you exactly how to succeed these applications, through Real World Business case studies. And for each of these applications we will build a separate AI to solve the challenge. In Part 1 - Optimizing Processes, we will build an AI that will optimize the flows in an E-Commerce warehouse. In Part 2 - Minimizing Costs, we will build a more advanced AI that will minimize the costs in energy consumption of a data center by more than 50%! Just as Google did last year thanks to DeepMind.

machine learning, natural language, reinforcement learning, (13 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.75)

Industry:

Education > Educational Setting > Online (1.00)
Information Technology > Services (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)

Add feedback

Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems

Goecks, Vinicius G.

arXiv.org Artificial IntelligenceAug-30-2020

Recent successes combine reinforcement learning algorithms and deep neural networks, despite reinforcement learning not being widely applied to robotics and real world scenarios. This can be attributed to the fact that current state-of-the-art, end-to-end reinforcement learning approaches still require thousands or millions of data samples to converge to a satisfactory policy and are subject to catastrophic failures during training. Conversely, in real world scenarios and after just a few data samples, humans are able to either provide demonstrations of the task, intervene to prevent catastrophic actions, or simply evaluate if the policy is performing correctly. This research investigates how to integrate these human interaction modalities to the reinforcement learning loop, increasing sample efficiency and enabling real-time reinforcement learning in robotics and real world scenarios. This novel theoretical foundation is called Cycle-of-Learning, a reference to how different human interaction modalities, namely, task demonstration, intervention, and evaluation, are cycled and combined to reinforcement learning algorithms. Results presented in this work show that the reward signal that is learned based upon human interaction accelerates the rate of learning of reinforcement learning algorithms and that learning from a combination of human demonstrations and interventions is faster and more sample efficient when compared to traditional supervised learning algorithms. Finally, Cycle-of-Learning develops an effective transition between policies learned using human demonstrations and interventions to reinforcement learning. The theoretical foundation developed by this research opens new research paths to human-agent teaming scenarios where autonomous agents are able to learn from human teammates and adapt to mission performance metrics in real-time and in real world scenarios.

air transportation, reinforcement learning controller performance, upstream oil & gas, (29 more...)

arXiv.org Artificial Intelligence

2008.13221

Country:

North America > United States > California (0.45)
North America > United States > New York (0.27)
North America > United States > Pennsylvania (0.27)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Transportation > Air (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Learning and Reasoning for Robot Dialog and Navigation Tasks

Lu, Keting, Zhang, Shiqi, Stone, Peter, Chen, Xiaoping

arXiv.org Artificial IntelligenceAug-30-2020

Reinforcement learning and probabilistic reasoning algorithms aim at learning from interaction experiences and reasoning with probabilistic contextual knowledge respectively. In this research, we develop algorithms for robot task completions, while looking into the complementary strengths of reinforcement learning and probabilistic reasoning techniques. The robots learn from trial-and-error experiences to augment their declarative knowledge base, and the augmented knowledge can be used for speeding up the learning process in potentially different tasks. We have implemented and evaluated the developed algorithms using mobile robots conducting dialog and navigation tasks. From the results, we see that our robot's performance can be improved by both reasoning with human knowledge and learning from task-completion experience. More interestingly, the robot was able to learn from navigation tasks to improve its dialog strategies.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2005.09833

Country:

North America > United States > New York > Broome County > Binghamton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Automobiles & Trucks > Manufacturer (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(3 more...)

Add feedback

A Minimal Working Example for Continuous Policy Gradients

#artificialintelligenceAug-29-2020, 09:15:43 GMT

Defining a custom loss function and applying the GradientTape functionality, the actor network can be trained using only a few lines. At the root of all the sophisticated actor-critic algorithms that are designed and applied these days is the vanilla policy gradient algorithm, which essentially is an actor-only algorithm. Nowadays, the actor that learns the decision-making policy is often represented by a neural network. With so many deep reinforcement learning algorithms in circulation, you'd expect it to be easy to find abundant plug-and-play TensorFlow implementations for a basic actor network in continuous control, but this is hardly the case. Various reasons may exist for this.

artificial intelligence, machine learning, reinforcement learning, (8 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)

Add feedback

Predicting Game Difficulty and Churn Without Players

Roohi, Shaghayegh, Relas, Asko, Takatalo, Jari, Heiskanen, Henri, Hämäläinen, Perttu

arXiv.org Artificial IntelligenceAug-29-2020

We propose a novel simulation model that is able to predict the per-level churn and pass rates of Angry Birds Dream Blast, a popular mobile free-to-play game. Our primary contribution is to combine AI gameplay using Deep Reinforcement Learning (DRL) with a simulation of how the player population evolves over the levels. The AI players predict level difficulty, which is used to drive a player population model with simulated skill, persistence, and boredom. This allows us to model, e.g., how less persistent and skilled players are more sensitive to high difficulty, and how such players churn early, which makes the player population and the relation between difficulty and churn evolve level by level. Our work demonstrates that player behavior predictions produced by DRL gameplay can be significantly improved by even a very simple population-level simulation of individual player differences, without requiring costly retraining of agents or collecting new DRL gameplay data for each simulated player.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3410404.3414235

2008.12937

Country:

North America > Canada (0.05)
Europe > Finland (0.05)
Africa (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Add feedback

Learning to Collaborate in Multi-Module Recommendation via Multi-Agent Reinforcement Learning without Communication

He, Xu, An, Bo, Li, Yanghua, Chen, Haikai, Wang, Rundong, Wang, Xinrun, Yu, Runsheng, Li, Xin, Wang, Zhirong

arXiv.org Artificial IntelligenceAug-29-2020

With the rise of online e-commerce platforms, more and more customers prefer to shop online. To sell more products, online platforms introduce various modules to recommend items with different properties such as huge discounts. A web page often consists of different independent modules. The ranking policies of these modules are decided by different teams and optimized individually without cooperation, which might result in competition between modules. Thus, the global policy of the whole page could be sub-optimal. In this paper, we propose a novel multi-agent cooperative reinforcement learning approach with the restriction that different modules cannot communicate. Our contributions are three-fold. Firstly, inspired by a solution concept in game theory named correlated equilibrium, we design a signal network to promote cooperation of all modules by generating signals (vectors) for different modules. Secondly, an entropy-regularized version of the signal network is proposed to coordinate agents' exploration of the optimal global policy. Furthermore, experiments based on real-world e-commerce data demonstrate that our algorithm obtains superior performance over baselines.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2008.09369

Country:

South America > Brazil (0.05)
Asia > Myanmar > Tanintharyi Region > Dawei (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Services (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reinforcement Learning with Feedback-modulated TD-STDP

Chung, Stephen, Kozma, Robert

arXiv.org Artificial IntelligenceAug-29-2020

Spiking neuron networks have been used successfully to solve simple reinforcement learning tasks with continuous action set applying learning rules based on spike-timing-dependent plasticity (STDP). However, most of these models cannot be applied to reinforcement learning tasks with discrete action set since they assume that the selected action is a deterministic function of firing rate of neurons, which is continuous. In this paper, we propose a new STDP-based learning rule for spiking neuron networks which contains feedback modulation. We show that the STDP-based learning rule can be used to solve reinforcement learning tasks with discrete action set at a speed similar to standard reinforcement learning algorithms when applied to the CartPole and LunarLander tasks. Moreover, we demonstrate that the agent is unable to solve these tasks if feedback modulation is omitted from the learning rule. We conclude that feedback modulation allows better credit assignment when only the units contributing to the executed action and TD error participate in learning.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2008.13044

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

On the model-based stochastic value gradient for continuous reinforcement learning

Amos, Brandon, Stanton, Samuel, Yarats, Denis, Wilson, Andrew Gordon

arXiv.org Artificial IntelligenceAug-28-2020

Model-based reinforcement learning approaches add explicit domain knowledge to agents in hopes of improving the sample-efficiency in comparison to model-free agents. However, in practice model-based methods are unable to achieve the same asymptotic performance on challenging continuous control tasks due to the complexity of learning and controlling an explicit world model. In this paper we investigate the stochastic value gradient (SVG), which is a well-known family of methods for controlling continuous systems which includes model-based approaches that distill a model-based value expansion into a model-free policy. We consider a variant of the model-based SVG that scales to larger systems and uses 1) an entropy regularization to help with exploration, 2) a learned deterministic world model to improve the short-horizon value estimate, and 3) a learned model-free value estimate after the model's rollout. This SVG variation captures the model-free soft actor-critic method as an instance when the model rollout horizon is zero, and otherwise uses short-horizon model rollouts to improve the value estimate for the policy update. We surpass the asymptotic performance of other model-based methods on the proprioceptive MuJoCo locomotion tasks from the OpenAI gym, including a humanoid. We notably achieve these results with a simple deterministic world model without requiring an ensemble.

arxiv preprint arxiv, survey article, upstream oil & gas, (14 more...)

arXiv.org Artificial Intelligence

2008.12775

Country: Europe (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Real-world Video Adaptation with Reinforcement Learning

Mao, Hongzi, Chen, Shannon, Dimmery, Drew, Singh, Shaun, Blaisdell, Drew, Tian, Yuandong, Alizadeh, Mohammad, Bakshy, Eytan

arXiv.org Artificial IntelligenceAug-28-2020

Client-side video players employ adaptive bitrate (ABR) algorithms to optimize user quality of experience (QoE). We evaluate recently proposed RL-based ABR methods in Facebook's web-based video streaming platform. Real-world ABR contains several challenges that requires customized designs beyond off-the-shelf RL algorithms -- we implement a scalable neural network architecture that supports videos with arbitrary bitrate encodings; we design a training method to cope with the variance resulting from the stochasticity in network conditions; and we leverage constrained Bayesian optimization for reward shaping in order to optimize the conflicting QoE objectives. In a week-long worldwide deployment with more than 30 million video streaming sessions, our RL approach outperforms the existing human-engineered ABR algorithms.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2008.12858

Country:

South America (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Central America (0.04)

Genre:

Research Report (1.00)
Instructional Material (0.93)

Industry: Information Technology > Services (0.49)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

AllenAct: A Framework for Embodied AI Research

Weihs, Luca, Salvador, Jordi, Kotar, Klemen, Jain, Unnat, Zeng, Kuo-Hao, Mottaghi, Roozbeh, Kembhavi, Aniruddha

arXiv.org Artificial IntelligenceAug-28-2020

The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities. This growth has been facilitated by the creation of a large number of simulated environments (such as AI2-THOR, Habitat and CARLA), tasks (like point navigation, instruction following, and embodied question answering), and associated leaderboards. While this diversity has been beneficial and organic, it has also fragmented the community: a huge amount of effort is required to do something as simple as taking a model trained in one environment and testing it in another. This discourages good science. We introduce AllenAct, a modular and flexible learning framework designed with a focus on the unique requirements of Embodied AI research. AllenAct provides first-class support for a growing collection of embodied environments, tasks and algorithms, provides reproductions of state-of-the-art models and includes extensive documentation, tutorials, start-up code, and pre-trained models. We hope that our framework makes Embodied AI more accessible and encourages new researchers to join this exciting area. The framework can be accessed at: https://allenact.org/

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2008.1276

Country: Europe > Sweden > Skåne County > Malmö (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment > Games > Computer Games (0.93)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback