AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Inverse Policy Evaluation for Value-based Sequential Decision-making

Chan, Alan, de Asis, Kris, Sutton, Richard S.

arXiv.org Artificial IntelligenceAug-25-2020

Value-based methods for reinforcement learning lack generally applicable ways to derive behavior from a value function. Many approaches involve approximate value iteration (e.g., $Q$-learning), and acting greedily with respect to the estimates with an arbitrary degree of entropy to ensure that the state-space is sufficiently explored. Behavior based on explicit greedification assumes that the values reflect those of \textit{some} policy, over which the greedy policy will be an improvement. However, value-iteration can produce value functions that do not correspond to \textit{any} policy. This is especially relevant in the function-approximation regime, when the true value function can't be perfectly represented. In this work, we explore the use of \textit{inverse policy evaluation}, the process of solving for a likely policy given a value function, for deriving behavior from a value function. We provide theoretical and empirical results to show that inverse policy evaluation, combined with an approximate value iteration algorithm, is a feasible method for value-based control.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2008.11329

Country:

North America > Canada > Alberta (0.14)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Auxiliary-task Based Deep Reinforcement Learning for Participant Selection Problem in Mobile Crowdsourcing

Shen, Wei, He, Xiaonan, Zhang, Chuheng, Ni, Qiang, Dou, Wanchun, Wang, Yan

arXiv.org Machine LearningAug-25-2020

In mobile crowdsourcing (MCS), the platform selects participants to complete location-aware tasks from the recruiters aiming to achieve multiple goals (e.g., profit maximization, energy efficiency, and fairness). However, different MCS systems have different goals and there are possibly conflicting goals even in one MCS system. Therefore, it is crucial to design a participant selection algorithm that applies to different MCS systems to achieve multiple goals. To deal with this issue, we formulate the participant selection problem as a reinforcement learning problem and propose to solve it with a novel method, which we call auxiliary-task based deep reinforcement learning (ADRL). We use transformers to extract representations from the context of the MCS system and a pointer network to deal with the combinatorial optimization problem. To improve the sample efficiency, we adopt an auxiliary-task training process that trains the network to predict the imminent tasks from the recruiters, which facilitates the embedding learning of the deep learning model. Additionally, we release a simulated environment on a specific MCS task, the ride-sharing task, and conduct extensive performance evaluations in this environment. The experimental results demonstrate that ADRL outperforms and improves sample efficiency over other well-recognized baselines in various settings.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1145/3340531.3411913

2008.11087

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Heilongjiang Province > Daqing (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

t-Soft Update of Target Network for Deep Reinforcement Learning

Kobayashi, Taisuke, Ilboudo, Wendyam Eric Lionel

arXiv.org Machine LearningAug-25-2020

This paper proposes a new robust update rule of the target network for deep reinforcement learning, to replace the conventional update rule, given as an exponential moving average. The problem with the conventional rule is the fact that all the parameters are smoothly updated with the same speed, even when some of them are trying to update toward the wrong directions. To robustly update the parameters, the t-soft update, which is inspired by the student-t distribution, is derived with reference to the analogy between the exponential moving average and the normal distribution. In most of PyBullet robotics simulations, an online actor-critic algorithm with the t-soft update outperformed the conventional methods in terms of the obtained return.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2008.10861

Country: Asia > Japan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning

Watanabe, Hirohisa, Tsukada, Mineto, Matsutani, Hiroki

arXiv.org Machine LearningAug-25-2020

DQN (Deep Q-Network) is a method to perform Q-learning for reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for an experience replay and rely on a backpropagation based iterative optimization, making them difficult to be implemented on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network based on-device learning approach that does not rely on the backpropagation method but uses OS-ELM (Online Sequential Extreme Learning Machine) based training algorithm. In addition, we propose a combination of L2 regularization and spectral normalization for the on-device reinforcement learning so that output values of the neural network can be fit into a certain range and the reinforcement learning becomes stable. The proposed reinforcement learning approach is designed for Xilinx PYNQ-Z1 board as a low-cost FPGA platform. The evaluation results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation without data transfer overhead complete a CartPole-v0 task 29.76x and 125.88x faster than a conventional DQN-based approach when the number of hidden-layer nodes is 64.

machine learning, reinforcement, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2005.04646

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)
North America > United States (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DeepMind's Three Pillars for Building Robust Machine Learning Systems - KDnuggets

#artificialintelligenceAug-24-2020, 16:02:48 GMT

I recently started a new newsletter focus on AI education. TheSequence is a no-BS( meaning no hype, no news etc) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. Building machine learning systems differs from traditional software development in many aspects of its lifecycle. Established software methodologies for testing, debugging and troubleshooting result simply impractical when applied to machine learning models.

large language model, machine learning, specification, (17 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.33)

Add feedback

Dynamic Dispatching for Large-Scale Heterogeneous Fleet via Multi-agent Deep Reinforcement Learning

Zhang, Chi, Odonkor, Philip, Zheng, Shuai, Khorasgani, Hamed, Serita, Susumu, Gupta, Chetan

arXiv.org Artificial IntelligenceAug-24-2020

Dynamic dispatching is one of the core problems for operation optimization in traditional industries such as mining, as it is about how to smartly allocate the right resources to the right place at the right time. Conventionally, the industry relies on heuristics or even human intuitions which are often short-sighted and sub-optimal solutions. Leveraging the power of AI and Internet of Things (IoT), data-driven automation is reshaping this area. However, facing its own challenges such as large-scale and heterogenous trucks running in a highly dynamic environment, it can barely adopt methods developed in other domains (e.g., ride-sharing). In this paper, we propose a novel Deep Reinforcement Learning approach to solve the dynamic dispatching problem in mining. We first develop an event-based mining simulator with parameters calibrated in real mines. Then we propose an experience-sharing Deep Q Network with a novel abstract state/action representation to learn memories from heterogeneous agents altogether and realizes learning in a centralized way. We demonstrate that the proposed methods significantly outperform the most widely adopted approaches in the industry by $5.56\%$ in terms of productivity. The proposed approach has great potential in a broader range of industries (e.g., manufacturing, logistics) which have a large-scale of heterogenous equipment working in a highly dynamic environment, as a general framework for dynamic resource allocation.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2008.10713

Country:

North America > United States > California > Santa Clara County > Santa Clara (0.05)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre: Research Report (0.64)

Industry:

Materials > Metals & Mining (1.00)
Leisure & Entertainment > Games (0.68)
Transportation > Ground > Road (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Improved Memories Learning

Varoli, Francesco, Novati, Guido, Vlachas, Pantelis R., Koumoutsakos, Petros

arXiv.org Machine LearningAug-24-2020

We propose Improved Memories Learning (IMeL), a novel algorithm that turns reinforcement learning (RL) into a supervised learning (SL) problem and delimits the role of neural networks (NN) to interpolation. IMeL consists of two components. The first is a reservoir of experiences. Each experience is updated based on a non-parametric procedural improvement of the policy, computed as a bounded one-sample Monte Carlo estimate. The second is a NN regressor, which receives as input improved experiences from the reservoir (context points) and computes the policy by interpolation. The NN learns to measure the similarity between states in order to compute long-term forecasts by averaging experiences, rather than by encoding the problem structure in the NN parameters. We present preliminary results and propose IMeL as a baseline method for assessing the merits of more complex models and inductive biases.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2008.10433

Country:

Europe > Switzerland > Zürich > Zürich (0.15)
North America > United States (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Deep Reinforcement Learning 2.0

#artificialintelligenceAug-23-2020, 15:20:19 GMT

Free Coupon Discount - Deep Reinforcement Learning 2.0, The smartest combination of Deep Q-Learning, Policy Gradient, Actor Critic, and DDPG Created by Hadelin de Ponteves, Kirill Eremenko, SuperDataScience Team Students also bought Natural Language Processing with Deep Learning in Python Recommender Systems and Deep Learning in Python Data Science: Natural Language Processing (NLP) in Python Deep Learning and Computer Vision A-Z: OpenCV, SSD & GANs The Complete Neural Networks Bootcamp: Theory, Applications Cutting-Edge AI: Deep Reinforcement Learning in Python Preview this Udemy Course GET COUPON CODE Description Welcome to Deep Reinforcement Learning 2.0! In this course, we will learn and implement a new incredibly smart AI model, called the Twin-Delayed DDPG, which combines state of the art techniques in Artificial Intelligence including continuous Double Deep Q-Learning, Policy Gradient, and Actor Critic. The model is so strong that for the first time in our courses, we are able to solve the most challenging virtual AI applications (training an ant/spider and a half humanoid to walk and run across a field). To approach this model the right way, we structured the course in three parts: Part 1: Fundamentals In this part we will study all the fundamentals of Artificial Intelligence which will allow you to understand and master the AI of this course. These include Q-Learning, Deep Q-Learning, Policy Gradient, Actor-Critic and more.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

#artificialintelligence

Genre:

Instructional Material > Course Syllabus & Notes (0.54)
Instructional Material > Online (0.39)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.80)

Add feedback

Learning Off-Policy with Online Planning

Sikchi, Harshit, Zhou, Wenxuan, Held, David

arXiv.org Artificial IntelligenceAug-23-2020

We propose Learning Off-Policy with Online Planning (LOOP), combining the techniques from model-based and model-free reinforcement learning algorithms. The agent learns a model of the environment, and then uses trajectory optimization with the learned model to select actions. To sidestep the myopic effect of fixed horizon trajectory optimization, a value function is attached to the end of the planning horizon. This value function is learned through off-policy reinforcement learning, using trajectory optimization as its behavior policy. Furthermore, we introduce "actor-guided" trajectory optimization to mitigate the actor-divergence issue in the proposed method. We benchmark our methods on continuous control tasks and demonstrate that it offers a significant improvement over the underlying model-based and model-free algorithms.

artificial intelligence, reinforcement learning, trajectory optimization, (13 more...)

arXiv.org Artificial Intelligence

2008.10066

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning

Qiu, Shuang, Yang, Zhuoran, Wei, Xiaohan, Ye, Jieping, Wang, Zhaoran

arXiv.org Machine LearningAug-23-2020

Temporal-Difference (TD) learning with nonlinear smooth function approximation for policy evaluation has achieved great success in modern reinforcement learning. It is shown that such a problem can be reformulated as a stochastic nonconvex-strongly-concave optimization problem, which is challenging as naive stochastic gradient descent-ascent algorithm suffers from slow convergence. Existing approaches for this problem are based on two-timescale or double-loop stochastic gradient algorithms, which may also require sampling large-batch data. However, in practice, a single-timescale single-loop stochastic algorithm is preferred due to its simplicity and also because its step-size is easier to tune. In this paper, we propose two single-timescale single-loop algorithms which require only one data point each step. Our first algorithm implements momentum updates on both primal and dual variables achieving an $O(\varepsilon^{-4})$ sample complexity, which shows the important role of momentum in obtaining a single-timescale algorithm. Our second algorithm improves upon the first one by applying variance reduction on top of momentum, which matches the best known $O(\varepsilon^{-3})$ sample complexity in existing works. Furthermore, our variance-reduction algorithm does not require a large-batch checkpoint. Moreover, our theoretical results for both algorithms are expressed in a tighter form of simultaneous primal and dual side convergence.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2008.10103

Country:

North America > United States > Michigan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

Add feedback