AITopics

2009.02476

Country: North America > United States > Wisconsin (0.29)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.95)
Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Sep-5-2020

Critic Regularized Regression

Wang, Ziyu, Novikov, Alexander, Zolna, Konrad, Springenberg, Jost Tobias, Reed, Scott, Shahriari, Bobak, Siegel, Noah, Merel, Josh, Gulcehre, Caglar, Heess, Nicolas, de Freitas, Nando

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges with regard to the cost of data collection and safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learning from a fixed dataset. In this paper, we propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR). We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces -- outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks.
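As a rough illustration of what a critic-regularized regression objective can look like, the sketch below computes a behavior-cloning loss in which the log-likelihood of each dataset action is weighted by a critic-derived advantage. The abstract does not spell out the weighting, so the binary and exponential weights, the temperature `beta`, the clipping value `max_weight`, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math


def crr_weight(advantage, mode="binary", beta=1.0, max_weight=20.0):
    """Hypothetical weight for a dataset action given its estimated advantage."""
    if mode == "binary":
        # Keep only actions the critic judges better than the baseline.
        return 1.0 if advantage > 0.0 else 0.0
    if mode == "exp":
        # Exponential weight, clipped in the exponent to avoid overflow.
        return math.exp(min(advantage / beta, math.log(max_weight)))
    raise ValueError(f"unknown mode: {mode}")


def crr_policy_loss(log_probs, q_values, baselines, mode="binary", beta=1.0):
    """Mean of -w(A) * log pi(a|s), with A = Q(s, a) - baseline(s).

    log_probs: log pi(a|s) of the dataset actions under the current policy.
    q_values:  critic estimates Q(s, a) for those same actions.
    baselines: state-value estimates used to form the advantage.
    """
    total = 0.0
    for logp, q, v in zip(log_probs, q_values, baselines):
        total += -crr_weight(q - v, mode, beta) * logp
    return total / len(log_probs)
```

With the binary weight, actions whose estimated advantage is not positive drop out of the loss entirely, so the policy only imitates the transitions the critic prefers.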

machine learning, rcrr binary max rcrr, reinforcement learning, (13 more...)

2006.15134

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ferreira, Janderson, Júnior, Agostinho A. F., Castro, Letícia, Galvão, Yves M., Barros, Pablo, Fernandes, Bruno J. T.

Analysis of Social Robotic Navigation approaches: CNN Encoder and Incremental Learning as an alternative to Deep Reinforcement Learning

arXiv.org Artificial Intelligence, Sep-5-2020

Dealing with social tasks in robotic scenarios is difficult, as having humans in the learning loop is incompatible with most state-of-the-art machine learning algorithms. This is the case when exploring incremental learning models, in particular those involving reinforcement learning. In this work, we discuss this problem and possible solutions by analysing a previous study on adaptive convolutional encoders for a social navigation task.

artificial intelligence, machine learning, reinforcement learning, (10 more...)

2008.07965

Country:

South America > Brazil > Pernambuco > Recife (0.05)
Europe > Italy (0.05)

Genre: Research Report (0.40)

Industry: Transportation (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zehfroosh, Ashkan, Tanner, Herbert G.

A Hybrid PAC Reinforcement Learning Algorithm

arXiv.org Machine Learning, Sep-5-2020

This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs) that intelligently maintains favorable features of its parents. The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free and model-based learning approaches while outperforming both in most cases. The paper includes a PAC analysis of the DDQ algorithm and a derivation of its sample complexity. Numerical results in a small grid-world example support the claim that the new algorithm is more sample efficient than its parents.
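To make the idea of combining model-free and model-based learning concrete, here is a minimal sketch of the classic tabular Dyna-style pattern: each real transition drives a Q-learning update, is stored in a learned model, and is then replayed in extra planning updates. This is not the DDQ algorithm from the paper; the chain environment, the hyperparameters, and the function name `dyna_q` are all assumptions chosen for illustration.

```python
import random


def dyna_q(n_states=5, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Dyna-style Q-learning on a hypothetical deterministic chain.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    actions = (0, 1)
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    model = {}  # learned model: (state, action) -> (reward, next_state, done)

    def step(s, a):
        s2 = max(0, s - 1) if a == 0 else s + 1
        done = s2 == n_states - 1
        return (1.0 if done else 0.0), s2, done

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if q[(s, b)] == best])
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)       # model-free update from real data
            model[(s, a)] = (r, s2, done)   # record the transition in the model
            for _ in range(planning_steps):  # model-based updates from replay
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q
```

Calling `dyna_q()` returns the learned action-value table; setting `planning_steps=0` reduces the loop to plain Q-learning, which makes the contribution of the model-based part easy to compare.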

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2009.02602

Country:

North America > United States > Delaware > New Castle County > Newark (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre:

Workflow (0.46)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

#artificialintelligence, Sep-4-2020, 21:06:14 GMT

Using Reinforcement Learning to Design Missed Thrust Resilient Trajectories - ASC- 2020 - Gereshes

Note: This post is adapted from my conference paper, which I presented at the Astrodynamics Specialists Conference in Summer 2020. You can read the full paper here. From ion thrusters to solar sails, spacecraft continue to adopt new and more efficient forms of propulsion. As these low-thrust propulsion methods have become more prevalent, new challenges have arisen. Depending on the mission, low-thrust propulsion elements may need to thrust continuously for days or months.

design missed thrust resilient trajectory, machine learning, reinforcement learning, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.44)
Information Technology > Communications > Social Media (0.40)

#artificialintelligence, Sep-4-2020, 14:10:20 GMT

Machine Learning Strategy and Intro to Reinforcement Learning

NOTE: This course is a continuation of XCS229i: Machine Learning. Though not strictly required, it is highly recommended to take XCS229i before enrolling in XCS229ii, as assignments assume knowledge of topics in the first course. As machine learning models grow in sophistication, it is increasingly important for its practitioners to be comfortable navigating their many tuning parameters. Through video lectures and hands-on exercises, this course will equip you with the knowledge to get the most out of your data. You will learn the concepts and techniques you need to guide teams of ML practitioners.

artificial intelligence, machine learning strategy and intro, reinforcement learning, (6 more...)

Country: North America > United States > California > Santa Clara County > Palo Alto (0.42)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Setting > Online (0.57)
Education > Educational Technology > Educational Software > Computer Based Training (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.35)

Ishihara, Seiji, Igarashi, Harukazu

Policy Gradient Reinforcement Learning for Policy Represented by Fuzzy Rules: Application to Simulations of Speed Control of an Automobile

arXiv.org Artificial Intelligence, Sep-4-2020

A method fusing fuzzy inference with policy gradient reinforcement learning has been proposed that directly learns the parameters of a policy function represented by weighted fuzzy rules so as to maximize the expected reward per episode. A previous study applied this method to the speed control of an automobile and obtained correct policies; some of them control the speed appropriately, but many others produce undesirable oscillation in speed. In general, a policy that causes sudden changes or oscillation in the output value over time is undesirable, and in many cases a policy that yields a smooth output is preferable. In this paper, we propose a fusion method whose objective function introduces defuzzification with a stochastically weighted center-of-gravity model and a constraint term for the smoothness of change over time, in order to suppress sudden changes in the output value of the fuzzy controller. We then present the learning rule for this fusion and consider how the reward function affects fluctuation of the output value. Experiments applying our method to the speed control of an automobile confirmed that it suppresses undesirable fluctuation in the time series of the output value. They also showed that differences between reward functions may adversely affect the learning results.
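The two ingredients named in the abstract, center-of-gravity defuzzification over weighted rules and a smoothness term on the controller output, can be sketched as follows. This is a simplified, deterministic illustration under assumed definitions: the paper's weighting is stochastic, and the squared-difference penalty, the `coefficient` parameter, and every function name here are hypothetical rather than taken from the paper.

```python
def weighted_center_of_gravity(memberships, weights, centers):
    """Defuzzified output: sum(w_i * mu_i * c_i) / sum(w_i * mu_i).

    memberships: firing strength mu_i of each fuzzy rule for the current input.
    weights:     learnable weight w_i of each rule.
    centers:     representative output value c_i of each rule's consequent.
    """
    numerator = sum(w * m * c for w, m, c in zip(weights, memberships, centers))
    denominator = sum(w * m for w, m in zip(weights, memberships))
    if denominator == 0.0:
        raise ValueError("no rule fires for this input")
    return numerator / denominator


def smoothness_penalty(outputs, coefficient=1.0):
    """Penalize sudden changes: sum of squared steps between consecutive outputs."""
    return coefficient * sum((b - a) ** 2 for a, b in zip(outputs, outputs[1:]))


def regularized_objective(episode_reward, outputs, coefficient=1.0):
    """Episode reward minus the smoothness penalty on the output sequence."""
    return episode_reward - smoothness_penalty(outputs, coefficient)
```

For example, `regularized_objective(10.0, [0.0, 1.0, 3.0], coefficient=0.5)` subtracts half of the squared steps 1 and 4 from the reward, so a policy with the same reward but a flatter output sequence scores higher.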

artificial intelligence, machine learning, reinforcement learning, (15 more...)

doi: 10.3156/jsoft.32.4_801

2009.02083

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bekci, Recep Yusuf, Gümüş, Mehmet

Visualizing the Loss Landscape of Actor Critic Methods with Applications in Inventory Optimization

arXiv.org Machine Learning, Sep-4-2020

Continuous control is a widely applicable area of reinforcement learning. The main players of this area are actor-critic methods that utilize policy gradients of neural approximators as a common practice. The focus of our study is to show the characteristics of the actor loss function which is the essential part of the optimization. We exploit low dimensional visualizations of the loss function and provide comparisons for loss landscapes of various algorithms. Furthermore, we apply our approach to multi-store dynamic inventory control, a notoriously difficult problem in supply chain operations, and explore the shape of the loss function associated with the optimal policy. We modelled and solved the problem using reinforcement learning while having a loss landscape in favor of optimality.
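One common way to obtain a low-dimensional view of a loss function is to evaluate it on a 2D grid spanned by two random directions around a parameter vector. The sketch below shows that generic technique; whether the paper uses exactly this construction is an assumption, and `loss_fn`, `radius`, `steps`, and the function names are hypothetical.

```python
import random


def random_direction(dim, rng):
    """Draw a random unit vector with `dim` components."""
    d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(x * x for x in d) ** 0.5
    return [x / norm for x in d]


def loss_surface(loss_fn, theta, steps=5, radius=1.0, seed=0):
    """Evaluate loss_fn on a 2D slice through parameter space.

    Returns (offsets, grid) where
    grid[i][j] = loss_fn(theta + offsets[i] * d1 + offsets[j] * d2)
    and d1, d2 are random unit directions. Requires steps >= 2.
    """
    rng = random.Random(seed)
    d1 = random_direction(len(theta), rng)
    d2 = random_direction(len(theta), rng)
    offsets = [radius * (2 * i / (steps - 1) - 1) for i in range(steps)]
    grid = [
        [loss_fn([t + a * u + b * v for t, u, v in zip(theta, d1, d2)])
         for b in offsets]
        for a in offsets
    ]
    return offsets, grid
```

The returned grid can be passed to any contour or surface plotting routine; with an odd `steps`, the middle cell corresponds to the unperturbed parameters `theta`.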

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2009.02391

Country: North America > Canada > Quebec > Montreal (0.29)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

#artificialintelligence, Sep-3-2020, 20:47:01 GMT

Scientists are developing an autonomous artificial intelligence system that can selectively grip and …

The key to this development lies in so-called reinforcement learning, a special variant of machine learning. "We do not prescribe a solution pathway for …

autonomous artificial intelligence system, machine learning, reinforcement learning, (2 more...)

Industry: Media > News (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

#artificialintelligence, Sep-3-2020, 10:13:13 GMT

Modern Reinforcement Learning: Actor-Critic Methods

How to Implement Cutting Edge Artificial Intelligence Research Papers in the OpenAI Gym Using the PyTorch Framework

What you'll learn:

How to code policy gradient methods in PyTorch
How to code Deep Deterministic Policy Gradients (DDPG) in PyTorch
How to code Twin Delayed Deep Deterministic Policy Gradients (TD3) in PyTorch
How to code actor critic algorithms in PyTorch
How to implement cutting edge artificial intelligence research papers in Python

Description:

In this advanced course on deep reinforcement learning, you will learn how to implement policy gradient, actor critic, deep deterministic policy gradient (DDPG), and twin delayed deep deterministic policy gradient (TD3) algorithms in a variety of challenging environments from the OpenAI Gym. The course begins with a practical review of the fundamentals of reinforcement learning, including topics such as:

The Bellman Equation
Markov Decision Processes
Monte Carlo Prediction
Temporal Difference Prediction TD(0)
Temporal Difference Control with Q Learning

It then moves straight into coding up our first agent: a blackjack playing artificial intelligence. From there we will progress to teaching an agent to balance the cart pole using Q learning. After mastering the fundamentals, the pace quickens, and we move straight into an introduction to policy gradient methods. We cover the REINFORCE algorithm, and use it to teach an artificial intelligence to land on the moon in the lunar lander environment from the OpenAI Gym.
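For readers unfamiliar with the temporal difference prediction TD(0) listed among the fundamentals, the update fits in a few lines. This is a generic textbook-style sketch, not material from the course; the dictionary-based value table and the function name `td0_update` are assumptions.

```python
def td0_update(values, state, reward, next_state, done, alpha=0.1, gamma=0.99):
    """Apply one TD(0) update to a tabular state-value estimate, in place.

    The target is the reward plus the discounted value of the next state,
    or just the reward when the episode has ended.
    """
    target = reward if done else reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])
    return values[state]


# Usage: nudge V("a") toward the bootstrapped target 0 + 0.5 * V("b").
values = {"a": 0.0, "b": 1.0}
new_value = td0_update(values, "a", 0.0, "b", done=False, alpha=0.5, gamma=0.5)
```

Q-learning follows the same pattern, except that the table is indexed by state-action pairs and the target uses the maximum value over next actions.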

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)