AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
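
The trial-and-error idea in the quotation can be illustrated with a multi-armed bandit. The following is a minimal sketch, not taken from the book: the function name `run_bandit`, the Bernoulli reward model, and the epsilon-greedy exploration rule are all assumptions chosen for illustration. The learner is never told which arm is best; it only observes rewards for the arms it tries.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a Bernoulli bandit (illustrative assumption).

    true_means: hidden success probability of each arm.
    Returns (estimated value per arm, pull count per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running average reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: values[a])

        # The environment returns a numerical reward signal.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return values, counts


if __name__ == "__main__":
    values, counts = run_bandit([0.2, 0.5, 0.8])
    print("estimates:", [round(v, 2) for v in values])
    print("pulls:", counts)
```

With enough steps, the arm with the highest hidden mean should end up with the highest estimate and the most pulls, purely from the rewards observed by trying actions.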

Ensemble Machine Learning in Python: Random Forest, AdaBoost (Udemy)

#artificialintelligence, Jul-9-2020, 12:07:23 GMT

Ensemble Machine Learning in Python: Random Forest, AdaBoost. Rated 4.6 (1,193 ratings). In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game Go using deep reinforcement learning. Machine learning is even being used to program self-driving cars, which is going to change the automotive industry forever.

artificial intelligence, machine learning, reinforcement learning, (5 more...)

Industry:

Information Technology (1.00)
Leisure & Entertainment > Games (0.61)
Education > Educational Technology > Educational Software > Computer Based Training (0.40)
Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.64)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.61)
(2 more...)

Aicavity Global

#artificialintelligence, Jul-9-2020, 00:36:10 GMT

Hi, I am José Luis. I have a B.S., an M.S., and a Lic. in Physics, and I am currently a Ph.D. candidate in Physics at Uppsala University, Sweden. I have worked as a Research Engineer at Veoneer, using Deep Reinforcement Learning to track multiple targets for autonomous vehicles. Additionally, I have taught thousands of students at universities in Brazil and abroad. I work with computer simulations, and I will share my programming experience across different fields.

machine learning, reinforcement learning, university, (2 more...)

Country:

South America > Brazil (0.37)
Europe > Sweden > Uppsala County > Uppsala (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.79)

Explainability of Intelligent Transportation Systems using Knowledge Compilation: a Traffic Light Controller Case

Wollenstein-Betech, Salomón, Muise, Christian, Cassandras, Christos G., Paschalidis, Ioannis Ch., Khazaeni, Yasaman

arXiv.org Artificial Intelligence, Jul-9-2020

Automated controllers that make decisions about an environment are widespread and are often based on black-box models. We use Knowledge Compilation theory to bring explainability to the controller's decision given the state of the system. For this, we use simulated historical state-action data as input and build a compact and structured representation that relates states to actions. We implement this method in a Traffic Light Control scenario where the controller selects the light cycle by observing the presence (or absence) of vehicles in different regions of the incoming roads.

controller, machine learning, reinforcement learning, (16 more...)

2007.04916

Country:

Europe (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

On the Reliability and Generalizability of Brain-inspired Reinforcement Learning Algorithms

Kim, Dongjae, Lee, Jee Hang, Shin, Jae Hoon, Yang, Minsu Abel, Lee, Sang Wan

arXiv.org Artificial Intelligence, Jul-9-2020

Although deep RL models have shown great potential for solving various types of tasks with minimal supervision, several key challenges remain in terms of learning rapidly from limited experience, adapting to environmental changes, and generalizing learning from a single task. Recent evidence in decision neuroscience has shown that the human brain has an innate capacity to resolve these issues, leading to optimism regarding the development of neuroscience-inspired solutions toward sample-efficient, adaptive, and generalizable RL algorithms. We show that the computational model, adaptively combining model-based and model-free control, which we term the prefrontal RL, reliably encodes the information of high-level policy that humans learned, and this model can generalize the learned policy to a wide range of tasks. First, we trained the prefrontal RL, deep RL, and meta RL algorithms on 82 human subjects' data, collected while human participants were performing two-stage Markov decision tasks, in which we experimentally manipulated the goal, state-transition uncertainty, and state-space complexity. In the reliability test, which is based on a combination of the latent behavior profile and the parameter recoverability test, we showed that the prefrontal RL reliably learned the latent policies of the human subjects, while all the other models failed to pass this test. Second, to empirically test the ability to generalize what these models learned from the original task, we situated them in the context of environmental volatility. Specifically, we ran large-scale simulations with 10 different Markov decision tasks, in which latent context variables change over time. Our information-theoretic analysis showed that the prefrontal RL showed the highest level of adaptability and episodic encoding efficacy. To the best of our knowledge, this is the first attempt to formally test the possibility that computational models mimicking the way the brain solves general problems can lead to practical solutions to key challenges in machine learning.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2007.04578

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.69)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Learning Retrospective Knowledge with Reverse Reinforcement Learning

Zhang, Shangtong, Veeriah, Vivek, Whiteson, Shimon

arXiv.org Artificial Intelligence, Jul-9-2020

We present a Reverse Reinforcement Learning (Reverse RL) approach for representing retrospective knowledge. General Value Functions (GVFs) have enjoyed great success in representing predictive knowledge, i.e., answering questions about possible future outcomes such as "how much fuel will be consumed in expectation if we drive from A to B?". GVFs, however, cannot answer questions like "how much fuel do we expect a car to have given it is at B at time $t$?". To answer this question, we need to know when that car had a full tank and how that car came to B. Since such questions emphasize the influence of possible past events on the present, we refer to their answers as retrospective knowledge. In this paper, we show how to represent retrospective knowledge with Reverse GVFs, which are trained via Reverse RL. We demonstrate empirically the utility of Reverse GVFs in both representation learning and anomaly detection.
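
To make the predictive versus retrospective distinction concrete, here is a small sketch of a temporal-difference update that bootstraps from the predecessor state instead of the successor. The abstract does not spell out the update rule, so the rule below, the function name `reverse_td`, and the toy route A to B to C with fixed fuel costs are all assumptions made for illustration. The learned value of a state is the fuel expected to have been consumed by the time the car arrives there.

```python
def reverse_td(trajectories, alpha=0.5, gamma=1.0, sweeps=50):
    """Learn a retrospective value: cumulative cost incurred before reaching a state.

    trajectories: list of [(state, cost_of_leaving_state), ..., (last_state, None)].
    The update target uses the value of the *previous* state (an assumed
    reading of "reverse" TD, not a rule confirmed by the abstract).
    """
    v = {}
    for _ in range(sweeps):
        for traj in trajectories:
            # Walk over consecutive (previous, current) pairs.
            for (s_prev, cost), (s_cur, _) in zip(traj, traj[1:]):
                v.setdefault(s_prev, 0.0)
                v.setdefault(s_cur, 0.0)
                # Target looks backward: cost just paid plus value at predecessor.
                target = cost + gamma * v[s_prev]
                v[s_cur] += alpha * (target - v[s_cur])
    return v


if __name__ == "__main__":
    # Hypothetical route: A -> B burns 2 units of fuel, B -> C burns 3.
    route = [("A", 2.0), ("B", 3.0), ("C", None)]
    print(reverse_td([route]))
```

A forward value function on the same route would instead answer how much fuel will still be consumed from each state onward, which is the predictive question the abstract attributes to GVFs.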

artificial intelligence, machine learning, reinforcement learning, (12 more...)

2007.06703

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > Canada > Alberta (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

AI in FinTech: A Research Agenda

Cao, Longbing

arXiv.org Artificial Intelligence, Jul-9-2020

Smart FinTech has emerged as a new area that synthesizes and transforms AI and finance, and broadly data science, machine learning, economics, etc. Smart FinTech also transforms and drives new economic and financial businesses, services and systems, and plays an increasingly important role in economy, technology and society transformation. This article presents a highly summarized research overview of smart FinTech, including FinTech businesses and challenges, various FinTech-associated data and repositories, FinTech-driven business decision and optimization, areas in smart FinTech, and research methods and techniques for smart FinTech.

aids, evolutionary algorithm, machine learning, (19 more...)

2007.12681

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Overview (0.46)
Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance > Trading (1.00)
Banking & Finance > Economy (0.93)
(3 more...)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(6 more...)

Weakness Analysis of Cyberspace Configuration Based on Reinforcement Learning

Zhang, Lei, Bai, Wei, Guo, Shize, Xia, Shiming, Li, Hongmei, Pan, Zhisong

arXiv.org Artificial Intelligence, Jul-9-2020

In this work, we present a learning-based approach to analyzing cyberspace configuration. Unlike prior methods, our approach can learn from past experience and improve over time. In particular, as we train over a greater number of agents as attackers, our method becomes better at rapidly finding previously hidden attack paths, especially in multiple-domain cyberspace. To achieve these results, we pose finding attack paths as a Reinforcement Learning (RL) problem and train an agent to find multiple-domain attack paths. To enable our RL policy to find more hidden attack paths, we ground the representation by introducing a multiple-domain action selection module in RL. We design a simulated cyberspace experimental environment to verify our method. Our objective is to find more hidden attack paths in order to analyze the weaknesses of the cyberspace configuration. The experimental results show that our method can find more hidden multiple-domain attack paths than existing baseline methods.

attack path, machine learning, reinforcement learning, (15 more...)

2007.04614

Country:

North America > United States > California > Orange County > Anaheim (0.04)
Europe > Spain (0.04)
Asia > South Korea > Busan > Busan (0.04)
(2 more...)

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning to Prune Deep Neural Networks via Reinforcement Learning

Gupta, Manas, Aravindan, Siddharth, Kalisz, Aleksandra, Chandrasekhar, Vijay, Jie, Lin

arXiv.org Artificial Intelligence, Jul-9-2020

This paper proposes PuRL, a deep reinforcement learning (RL) based algorithm for pruning neural networks. Unlike current RL-based model compression approaches, where feedback is given to the agent only at the end of each episode, PuRL provides rewards at every pruning step. This enables PuRL to achieve sparsity and accuracy comparable to current state-of-the-art methods, while having a much shorter training cycle. PuRL achieves more than 80% sparsity on the ResNet-50 model while retaining a Top-1 accuracy of 75.37% on the ImageNet dataset. Through our experiments we show that PuRL is also able to sparsify already efficient architectures like MobileNet-V2. In addition to performance characterisation experiments, we also provide a discussion and analysis of the various RL design choices that went into the tuning of the Markov Decision Process underlying PuRL. Lastly, we point out that PuRL is simple to use and can be easily adapted for various architectures.
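
The contrast between per-step and end-of-episode feedback can be sketched on a toy layer-by-layer pruning episode. This is not PuRL's implementation: the magnitude-pruning action, the proxy reward in `step_reward` (sparsity gained minus the share of weight magnitude removed), and all function names are hypothetical stand-ins, since the abstract does not define the actual reward.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of one layer's weights."""
    k = int(sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def step_reward(original, pruned):
    """Hypothetical proxy reward: sparsity gained minus share of magnitude lost."""
    sparsity = sum(1 for w in pruned if w == 0.0) / len(pruned)
    total = sum(abs(w) for w in original)
    lost = sum(abs(o) for o, p in zip(original, pruned) if p == 0.0)
    return sparsity - (lost / total if total else 0.0)


def episode_rewards(layers, sparsities, dense=True):
    """One pruning decision per layer; return the reward sequence the agent sees.

    dense=True  -> a reward after every pruning step.
    dense=False -> zeros until the final step, which carries the whole return.
    """
    per_step = [
        step_reward(w, magnitude_prune(w, s)) for w, s in zip(layers, sparsities)
    ]
    if dense:
        return per_step
    return [0.0] * (len(per_step) - 1) + [sum(per_step)]


if __name__ == "__main__":
    layers = [[0.1, -0.5, 0.05, 2.0], [1.0, -1.0, 0.2, 0.3]]
    print(episode_rewards(layers, [0.5, 0.25], dense=True))
    print(episode_rewards(layers, [0.5, 0.25], dense=False))
```

Both reward sequences sum to the same episode return; the dense variant simply tells the agent which individual pruning decision earned what, which is one plausible reason a per-step signal could shorten training.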

experiment, machine learning, reinforcement learning, (15 more...)

2007.04756

Country:

Asia > Singapore (0.05)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > Promising Solution (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces

Domingues, Omar Darwiche, Ménard, Pierre, Pirotta, Matteo, Kaufmann, Emilie, Valko, Michal

arXiv.org Machine Learning, Jul-9-2020

In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with time-dependent kernels, we prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time, which quantifies its level of non-stationarity. Our method generalizes previous approaches based on sliding windows and exponential discounting used to handle changing environments. We further propose a practical implementation of KeRNS, analyze its regret, and validate it experimentally.
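
One way to picture a time-dependent kernel is as the product of a similarity weight in state-action space and a weight on the age of each sample. The sketch below is an assumption-laden illustration, not the KeRNS algorithm: the Gaussian spatial kernel, the product form, the scalar state-action encoding, and every function name are hypothetical. It only shows how a sliding window and exponential discounting both arise as choices of the temporal weight.

```python
import math


def spatial_kernel(dist, bandwidth):
    """Gaussian similarity between two state-action points (assumed form)."""
    return math.exp(-((dist / bandwidth) ** 2) / 2.0)


def sliding_window(age, width):
    """Temporal weight that keeps only the most recent `width` time steps."""
    return 1.0 if age < width else 0.0


def exponential_discount(age, rate):
    """Temporal weight that decays geometrically with sample age."""
    return rate ** age


def kernel_reward_estimate(query, samples, now, bandwidth, temporal):
    """Kernel-weighted average of past rewards near `query`.

    samples: list of (x, t, reward) with scalar state-action encoding x
             and integer time stamp t.
    temporal: function mapping sample age (now - t) to a weight.
    """
    num = 0.0
    den = 0.0
    for x, t, r in samples:
        w = spatial_kernel(abs(query - x), bandwidth) * temporal(now - t)
        num += w * r
        den += w
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # The reward at x = 0 drifted from 10 (long ago) to about 2 (recently).
    data = [(0.0, 0, 10.0), (0.0, 8, 1.0), (0.0, 9, 3.0)]
    print(kernel_reward_estimate(0.0, data, 10, 1.0, lambda a: sliding_window(a, 3)))
    print(kernel_reward_estimate(0.0, data, 10, 1.0, lambda a: exponential_discount(a, 0.5)))
```

In both cases the stale observation contributes little or nothing, so the estimate tracks the recent reward level rather than the long-run average.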

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2007.05078

Country:

Europe > France > Hauts-de-France > Pas-de-Calais (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.40)

Provably-Efficient Double Q-Learning

Weng, Wentao, Gupta, Harsh, He, Niao, Ying, Lei, Srikant, R.

arXiv.org Machine Learning, Jul-9-2020

In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning. Our result builds upon an analysis for linear stochastic approximation based on Lyapunov equations and applies to both the tabular setting and the setting with linear function approximation, provided that the optimal policy is unique and the algorithms converge. We show that the asymptotic mean-squared error of Double Q-learning is exactly equal to that of Q-learning if Double Q-learning uses twice the learning rate of Q-learning and outputs the average of its two estimators. We also present some practical implications of this theoretical observation using simulations.
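
For readers unfamiliar with the two algorithms being compared, the tabular update rules can be sketched as follows. This is the textbook form of Q-learning and Double Q-learning, not code from the paper; the function names, the list-of-lists table layout, and the explicit `update_a` flag (used here in place of a random coin flip) are illustrative assumptions. The final helper computes the averaged estimator that the abstract refers to.

```python
def q_learning_step(Q, s, a, r, s_next, alpha, gamma):
    """Standard tabular Q-learning update, applied in place."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def double_q_step(QA, QB, s, a, r, s_next, alpha, gamma, update_a):
    """Double Q-learning update, applied in place to one of the two tables.

    The table being updated selects the greedy next action; the other
    table evaluates it. `update_a` stands in for the usual coin flip.
    """
    learner, evaluator = (QA, QB) if update_a else (QB, QA)
    n_actions = len(learner[s_next])
    best = max(range(n_actions), key=lambda b: learner[s_next][b])
    target = r + gamma * evaluator[s_next][best]
    learner[s][a] += alpha * (target - learner[s][a])


def averaged_estimate(QA, QB):
    """Element-wise average of the two Double Q-learning tables."""
    return [[(x + y) / 2.0 for x, y in zip(ra, rb)] for ra, rb in zip(QA, QB)]


if __name__ == "__main__":
    alpha, gamma = 0.5, 0.9
    Q = [[0.0, 0.0], [1.0, 2.0]]
    q_learning_step(Q, 0, 0, 1.0, 1, alpha, gamma)

    QA = [[0.0, 0.0], [1.0, 2.0]]
    QB = [[0.0, 0.0], [3.0, 1.0]]
    # Twice the Q-learning step size, as in the setting the abstract describes.
    double_q_step(QA, QB, 0, 0, 1.0, 1, 2 * alpha, gamma, update_a=True)
    print(Q[0][0], averaged_estimate(QA, QB)[0][0])
```

The usage at the bottom runs Double Q-learning with step size `2 * alpha` and reads out the averaged table, mirroring the configuration under which the abstract states the two asymptotic mean-squared errors coincide; a single update like this does not by itself demonstrate that result.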

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2007.05034

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
