Reinforcement Learning
Explainability of Intelligent Transportation Systems using Knowledge Compilation: a Traffic Light Controller Case
Wollenstein-Betech, Salomón, Muise, Christian, Cassandras, Christos G., Paschalidis, Ioannis Ch., Khazaeni, Yasaman
Automated controllers that make decisions in an environment are widespread and are often based on black-box models. We use Knowledge Compilation theory to bring explainability to the controller's decisions given the state of the system. To this end, we use simulated historical state-action data as input and build a compact, structured representation that relates states to actions. We implement this method in a Traffic Light Control scenario, where the controller selects the light cycle by observing the presence (or absence) of vehicles in different regions of the incoming roads.
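To make the idea of a compact state-to-action representation concrete, here is a minimal Python sketch that compiles a logged binary policy into a reduced ordered binary decision diagram (ROBDD), one classic knowledge-compilation target. The abstract does not say which target language or compiler the authors use, so the choice of ROBDD is an assumption, as are the four hypothetical road regions (`n`, `s`, `e`, `w`), the action encoding (`True` = north-south green), the toy log, and the rule that unseen states default to `False`.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Node = Tuple[int, int, int]  # (variable index, low child id, high child id)


class ROBDD:
    """Reduced ordered binary decision diagram over n boolean variables."""

    def __init__(self, n_vars: int) -> None:
        self.n_vars = n_vars
        # Ids 0 and 1 are the FALSE and TRUE terminals; decision nodes start at 2.
        self.nodes: Dict[int, Node] = {}
        self.unique: Dict[Node, int] = {}

    def mk(self, var: int, low: int, high: int) -> int:
        # Reduction rule 1: a test whose two branches agree is redundant.
        if low == high:
            return low
        # Reduction rule 2: share structurally identical nodes.
        key = (var, low, high)
        if key not in self.unique:
            node_id = len(self.nodes) + 2
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def build(self, f: Callable[[Tuple[bool, ...]], bool],
              var: int = 0, prefix: Tuple[bool, ...] = ()) -> int:
        # Shannon expansion on variables in the fixed order 0, 1, ..., n-1.
        if var == self.n_vars:
            return 1 if f(prefix) else 0
        low = self.build(f, var + 1, prefix + (False,))
        high = self.build(f, var + 1, prefix + (True,))
        return self.mk(var, low, high)

    def evaluate(self, root: int, state: Sequence[bool]) -> bool:
        # Walk from the root, following the branch chosen by each tested variable.
        node = root
        while node not in (0, 1):
            var, low, high = self.nodes[node]
            node = high if state[var] else low
        return node == 1


# Hypothetical log of (state, action); state = vehicle presence in (n, s, e, w).
log = [(state, state[0] or state[1]) for state in product([False, True], repeat=4)]
policy = dict(log)

bdd = ROBDD(n_vars=4)
# Hypothetical default: states never seen in the log map to action False.
root = bdd.build(lambda state: policy.get(state, False))
print(len(bdd.nodes), "decision nodes instead of", len(policy), "table rows")
```

For this toy rule the 16-row table collapses to 2 decision nodes (tests on `n` and `s`), and the diagram itself reads as an explanation: the east and west regions never influence the chosen cycle.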
On the Reliability and Generalizability of Brain-inspired Reinforcement Learning Algorithms
Kim, Dongjae, Lee, Jee Hang, Shin, Jae Hoon, Yang, Minsu Abel, Lee, Sang Wan
Although deep RL models have shown great potential for solving various types of tasks with minimal supervision, several key challenges remain: learning rapidly from limited experience, adapting to environmental changes, and generalizing what is learned from a single task. Recent evidence in decision neuroscience has shown that the human brain has an innate capacity to resolve these issues, leading to optimism about developing neuroscience-inspired solutions toward sample-efficient, adaptive, and generalizable RL algorithms. We show that the computational model that adaptively combines model-based and model-free control, which we term the prefrontal RL, reliably encodes the high-level policy that humans learned, and that this model can generalize the learned policy to a wide range of tasks. First, we trained the prefrontal RL, deep RL, and meta RL algorithms on data from 82 human subjects, collected while participants performed two-stage Markov decision tasks in which we experimentally manipulated the goal, state-transition uncertainty, and state-space complexity. In the reliability test, which is based on a combination of the latent behavior profile and the parameter recoverability test, we showed that the prefrontal RL reliably learned the latent policies of the human subjects, while all the other models failed to pass this test. Second, to empirically test the ability of these models to generalize what they learned from the original task, we situated them in the context of environmental volatility. Specifically, we ran large-scale simulations with 10 different Markov decision tasks in which latent context variables change over time. Our information-theoretic analysis showed that the prefrontal RL achieved the highest level of adaptability and episodic encoding efficacy.
To the best of our knowledge, this is the first attempt to formally test the possibility that computational models mimicking the way the brain solves general problems can lead to practical solutions to key challenges in machine learning.
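The phrase "adaptively combining model-based and model-free control" can be illustrated with a reliability-weighted mixture of the two controllers' action values. This is a minimal Python sketch under stated assumptions: the inverse-prediction-error reliability, the linear mixing rule, and the softmax choice rule are common simplifications chosen here for illustration, not the arbitration mechanism of the prefrontal RL itself.

```python
import math
from typing import List


def arbitration_weight(mb_error: float, mf_error: float, eps: float = 1e-8) -> float:
    """Weight on model-based control. Assumption: the controller with the
    smaller recent prediction error is treated as the more reliable one."""
    mb_reliability = 1.0 / (mb_error + eps)
    mf_reliability = 1.0 / (mf_error + eps)
    return mb_reliability / (mb_reliability + mf_reliability)


def hybrid_q(q_mb: List[float], q_mf: List[float], w: float) -> List[float]:
    """Mix action values: Q = w * Q_MB + (1 - w) * Q_MF."""
    return [w * mb + (1.0 - w) * mf for mb, mf in zip(q_mb, q_mf)]


def softmax(values: List[float], beta: float = 1.0) -> List[float]:
    """Choice probabilities with inverse temperature beta (numerically stable)."""
    peak = max(values)
    exps = [math.exp(beta * (v - peak)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical numbers: model-based control has been predicting better
# recently, so it dominates the mixture.
w = arbitration_weight(mb_error=0.1, mf_error=0.3)
q = hybrid_q(q_mb=[1.0, 0.0], q_mf=[0.0, 1.0], w=w)
probs = softmax(q, beta=3.0)
print(round(w, 3), [round(x, 3) for x in q])
```

Here the weight comes out at about 0.75, so the mixed values are about `[0.75, 0.25]` and the first action is favored. If the model-free controller's error dropped below the model-based one, the same code would shift control toward it without any other change.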
Learning Retrospective Knowledge with Reverse Reinforcement Learning
Zhang, Shangtong, Veeriah, Vivek, Whiteson, Shimon
We present a Reverse Reinforcement Learning (Reverse RL) approach for representing retrospective knowledge. General Value Functions (GVFs) have enjoyed great success in representing predictive knowledge, i.e., answering questions about possible future outcomes such as "how much fuel will be consumed in expectation if we drive from A to B?". GVFs, however, cannot answer questions like "how much fuel do we expect a car to have given it is at B at time $t$?". To answer this question, we need to know when that car had a full tank and how that car came to B. Since such questions emphasize the influence of possible past events on the present, we refer to their answers as retrospective knowledge. In this paper, we show how to represent retrospective knowledge with Reverse GVFs, which are trained via Reverse RL. We demonstrate empirically the utility of Reverse GVFs in both representation learning and anomaly detection.
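The key difference between a GVF and a Reverse GVF is the direction of bootstrapping: instead of looking ahead to the next state, the retrospective estimate at the current state bootstraps from the state the agent just came from, roughly $\bar v(s_t) \leftarrow \bar v(s_t) + \alpha\,[c_t + \gamma \bar v(s_{t-1}) - \bar v(s_t)]$. Below is a minimal tabular Python sketch in that spirit, assuming on-policy data, a constant step size, and this particular indexing of the cumulant $c_t$; the paper's exact update, including any off-policy corrections, may differ. The three-state loop and unit cumulants are hypothetical.

```python
from typing import List


def reverse_td(trajectory: List[int], cumulants: List[float], n_states: int,
               gamma: float, alpha: float, sweeps: int) -> List[float]:
    """Tabular reverse TD: v(s_t) moves toward c_t + gamma * v(s_{t-1}).

    v(s) estimates the discounted sum of cumulants accumulated on the way
    *into* s, i.e. retrospective rather than predictive knowledge.
    """
    v = [0.0] * n_states
    for _ in range(sweeps):
        for t in range(1, len(trajectory)):
            prev_state, state = trajectory[t - 1], trajectory[t]
            # Bootstrap from the predecessor state, not the successor.
            target = cumulants[t] + gamma * v[prev_state]
            v[state] += alpha * (target - v[state])
    return v


# Hypothetical deterministic loop 0 -> 1 -> 2 -> 0 -> ... with cumulant 1 per step.
trajectory = [t % 3 for t in range(30)]
cumulants = [1.0] * len(trajectory)
values = reverse_td(trajectory, cumulants, n_states=3, gamma=0.5, alpha=0.1, sweeps=200)
print([round(x, 3) for x in values])
```

On this loop every state has the same history, so each estimate approaches the fixed point $v = 1 + 0.5\,v$, that is, 2.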
AI in FinTech: A Research Agenda
Smart FinTech has emerged as a new area that synthesizes and transforms AI and finance and, more broadly, data science, machine learning, economics, and related fields. Smart FinTech also transforms and drives new economic and financial businesses, services, and systems, and plays an increasingly important role in transforming the economy, technology, and society. This article presents a highly summarized research overview of smart FinTech, covering FinTech businesses and challenges, various FinTech-associated data and repositories, FinTech-driven business decision-making and optimization, areas within smart FinTech, and research methods and techniques for smart FinTech.
Weakness Analysis of Cyberspace Configuration Based on Reinforcement Learning
Zhang, Lei, Bai, Wei, Guo, Shize, Xia, Shiming, Li, Hongmei, Pan, Zhisong
In this work, we present a learning-based approach to analyzing cyberspace configurations. Unlike prior methods, our approach can learn from past experience and improve over time. In particular, as we train a greater number of agents as attackers, our method becomes better at rapidly finding previously hidden attack paths, especially in multiple-domain cyberspace. To achieve these results, we pose finding attack paths as a Reinforcement Learning (RL) problem and train an agent to find multiple-domain attack paths. To enable our RL policy to find more hidden attack paths, we introduce a multiple-domain action selection module into the RL framework. We verify our method in a simulated cyberspace experimental environment. Our objective is to find more hidden attack paths in order to analyze the weaknesses of cyberspace configurations. The experimental results show that our method finds more hidden multiple-domain attack paths than existing baseline methods.
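Posing attack-path discovery as an RL problem can be illustrated with tabular Q-learning on a tiny attack graph, where each action exploits an edge labelled with the domain it crosses. This is a minimal Python sketch: the graph, the domain labels (`network`, `social`, `physical`), the reward of +1 at the target and -0.1 per other step, and all hyperparameters are hypothetical, and the paper's multiple-domain action selection module is not reproduced here.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical attack graph: host -> list of (domain, next host).
Graph = Dict[str, List[Tuple[str, str]]]
GRAPH: Graph = {
    "internet": [("network", "web"), ("social", "helpdesk")],
    "web": [("network", "db")],
    "helpdesk": [("physical", "server_room")],
    "server_room": [("network", "db")],
    "db": [],
}


def q_learning(graph: Graph, start: str, target: str, episodes: int = 500,
               alpha: float = 0.5, gamma: float = 0.9, epsilon: float = 0.2,
               max_steps: int = 10, seed: int = 0) -> Dict[Tuple[str, int], float]:
    rng = random.Random(seed)
    q = {(s, i): 0.0 for s, acts in graph.items() for i in range(len(acts))}
    for _episode in range(episodes):
        state = start
        for _step in range(max_steps):
            actions = graph[state]
            if not actions:
                break
            # Epsilon-greedy choice among the exploits available at this host.
            if rng.random() < epsilon:
                a = rng.randrange(len(actions))
            else:
                a = max(range(len(actions)), key=lambda i: q[(state, i)])
            nxt = actions[a][1]
            reward = 1.0 if nxt == target else -0.1  # hypothetical reward
            terminal = nxt == target or not graph[nxt]
            future = 0.0 if terminal else max(q[(nxt, i)] for i in range(len(graph[nxt])))
            q[(state, a)] += alpha * (reward + gamma * future - q[(state, a)])
            state = nxt
            if terminal:
                break
    return q


def greedy_path(graph: Graph, q: Dict[Tuple[str, int], float], start: str,
                target: str, max_len: int = 10) -> List[Tuple[str, str]]:
    """Follow the learned greedy policy, recording (domain, host) hops."""
    hops: List[Tuple[str, str]] = []
    state = start
    while state != target and graph[state] and len(hops) < max_len:
        a = max(range(len(graph[state])), key=lambda i: q[(state, i)])
        domain, state = graph[state][a]
        hops.append((domain, state))
    return hops


q = q_learning(GRAPH, "internet", "db")
print(greedy_path(GRAPH, q, "internet", "db"))
```

Here the greedy policy settles on the two-hop network route, because every extra hop costs -0.1 and the social-then-physical route needs three hops.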
Learning to Prune Deep Neural Networks via Reinforcement Learning
Gupta, Manas, Aravindan, Siddharth, Kalisz, Aleksandra, Chandrasekhar, Vijay, Jie, Lin
This paper proposes PuRL, a deep reinforcement learning (RL)-based algorithm for pruning neural networks. Unlike current RL-based model compression approaches, where the agent receives feedback only at the end of each episode, PuRL provides rewards at every pruning step. This enables PuRL to achieve sparsity and accuracy comparable to current state-of-the-art methods while having a much shorter training cycle. PuRL achieves more than 80% sparsity on the ResNet-50 model while retaining a Top-1 accuracy of 75.37% on the ImageNet dataset. Through our experiments, we show that PuRL is also able to sparsify already efficient architectures such as MobileNet-V2. In addition to performance characterisation experiments, we also provide a discussion and analysis of the various RL design choices that went into tuning the Markov Decision Process underlying PuRL. Lastly, we point out that PuRL is simple to use and can be easily adapted to various architectures.
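The distinguishing design choice in PuRL is a reward at every pruning step rather than only at episode end. This minimal Python sketch shows that structure under stated assumptions: each step prunes one layer by magnitude at a sparsity chosen by the agent, and the per-step reward trades off a hypothetical accuracy proxy (fraction of L1 weight norm retained) against distance from a target sparsity. The proxy, the penalty weight, and the fixed action sequence standing in for a learned RL policy are all illustrative, not PuRL's actual reward or agent.

```python
from typing import List, Sequence, Tuple


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(round(sparsity * len(weights)))
    smallest = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in smallest else w for i, w in enumerate(weights)]


def retained_norm(original: Sequence[float], pruned: Sequence[float]) -> float:
    """Hypothetical accuracy proxy: share of the L1 norm that survives pruning."""
    total = sum(abs(w) for w in original)
    return sum(abs(w) for w in pruned) / total if total else 1.0


def step_reward(original: Sequence[float], pruned: Sequence[float], sparsity: float,
                target_sparsity: float, penalty: float = 1.0) -> float:
    """Dense reward given right after each pruning step (illustrative only)."""
    return retained_norm(original, pruned) - penalty * abs(target_sparsity - sparsity)


def run_episode(layers: List[List[float]], actions: List[float],
                target_sparsity: float = 0.8) -> Tuple[List[List[float]], List[float]]:
    """One episode: step t prunes layer t with sparsity actions[t]."""
    pruned_layers, rewards = [], []
    for weights, sparsity in zip(layers, actions):
        pruned = magnitude_prune(weights, sparsity)
        pruned_layers.append(pruned)
        # Feedback arrives immediately, not only after the last layer.
        rewards.append(step_reward(weights, pruned, sparsity, target_sparsity))
    return pruned_layers, rewards


# Hypothetical two-layer "network" and a fixed action sequence.
layers = [[0.5, -0.1, 0.3, -0.9], [0.05, 0.2, -0.7, 0.01, 0.4]]
pruned, rewards = run_episode(layers, actions=[0.5, 0.8])
print(pruned[0])  # [0.5, 0.0, 0.0, -0.9]
print(len(rewards), "rewards, one per pruning step")
```

An RL agent would replace the fixed `actions` list and learn from `rewards` step by step; the point of the dense signal is that each layer's choice is scored without waiting for the whole network to be pruned.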
A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces
Domingues, Omar Darwiche, Ménard, Pierre, Pirotta, Matteo, Kaufmann, Emilie, Valko, Michal
In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with time-dependent kernels, we prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time, which quantifies its level of non-stationarity. Our method generalizes previous approaches based on sliding windows and exponential discounting used to handle changing environments. We further propose a practical implementation of KeRNS, analyze its regret, and validate it experimentally.
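A time-dependent kernel model can be sketched as a kernel-weighted average of past rewards, where each sample's weight is the product of a spatial kernel on the metric distance and a temporal kernel on the sample's age. This is a minimal Python sketch under stated assumptions: the Gaussian spatial kernel, the bandwidth `sigma`, the regularizer `beta`, and the toy samples are illustrative, and KeRNS's exploration bonuses and planning step are omitted. It does show how the two earlier approaches named in the abstract fit one template: a sliding window is a 0/1 temporal kernel, and exponential discounting is a geometric one.

```python
import math
from functools import partial
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int, float]  # (state-action point, time step, reward)


def gaussian(distance: float, sigma: float) -> float:
    return math.exp(-0.5 * (distance / sigma) ** 2)


def sliding_window(age: int, window: int) -> float:
    return 1.0 if age < window else 0.0


def exponential_discount(age: int, eta: float) -> float:
    return eta ** age


def kernel_reward_estimate(query: Sequence[float], now: int, samples: List[Sample],
                           temporal: Callable[[int], float],
                           sigma: float = 1.0, beta: float = 0.01) -> float:
    """Weighted mean of past rewards; beta keeps the estimate near 0 when
    no sample is close in space or recent in time."""
    numerator, denominator = 0.0, beta
    for point, time, reward in samples:
        weight = temporal(now - time) * gaussian(math.dist(query, point), sigma)
        numerator += weight * reward
        denominator += weight
    return numerator / denominator


# Hypothetical drift: the same state-action pair paid 0 at t=0 but 1 at t=9.
samples = [((0.0, 0.0), 0, 0.0), ((0.0, 0.0), 9, 1.0)]
query, now = (0.0, 0.0), 10
for name, temporal in [("stationary", lambda age: 1.0),
                       ("window=5", partial(sliding_window, window=5)),
                       ("eta=0.5", partial(exponential_discount, eta=0.5))]:
    print(name, round(kernel_reward_estimate(query, now, samples, temporal), 3))
```

With the stationary kernel the stale reward drags the estimate to about 0.5, while both non-stationary temporal kernels track the recent reward of 1 closely.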
Provably-Efficient Double Q-Learning
Weng, Wentao, Gupta, Harsh, He, Niao, Ying, Lei, Srikant, R.
In this paper, we establish a theoretical comparison between the asymptotic mean-squared errors of Double Q-learning and Q-learning. Our result builds on an analysis of linear stochastic approximation based on Lyapunov equations and applies to both the tabular setting and linear function approximation, provided that the optimal policy is unique and the algorithms converge. We show that the asymptotic mean-squared error of Double Q-learning is exactly equal to that of Q-learning if Double Q-learning uses twice the learning rate of Q-learning and outputs the average of its two estimators. We also present some practical implications of this theoretical observation using simulations.
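The configuration compared in the abstract can be set up in a few lines. This minimal Python sketch runs tabular Q-learning with step size `alpha` next to Double Q-learning with step size `2 * alpha` whose output is the average of its two estimators. The single-state, two-action MDP with deterministic rewards is hypothetical and only checks that both procedures reach the same fixed point; it does not measure asymptotic mean-squared error, which would need noisy rewards and many independent runs.

```python
import random
from typing import List

GAMMA = 0.9
REWARDS = [1.0, 0.0]  # hypothetical MDP: one state, two actions, self-loop
Q_STAR = [1.0 / (1 - GAMMA), 0.0 + GAMMA / (1 - GAMMA)]  # optimal values: [10, 9]


def argmax(values: List[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def q_learning(alpha: float, steps: int, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    q = [0.0, 0.0]
    for _ in range(steps):
        a = rng.randrange(2)  # uniform behavior policy
        target = REWARDS[a] + GAMMA * max(q)
        q[a] += alpha * (target - q[a])
    return q


def double_q_learning(alpha: float, steps: int, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    qa, qb = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        a = rng.randrange(2)
        # Flip a coin for which estimator to update; the other estimator
        # evaluates the action that the updated one currently considers best.
        if rng.random() < 0.5:
            target = REWARDS[a] + GAMMA * qb[argmax(qa)]
            qa[a] += alpha * (target - qa[a])
        else:
            target = REWARDS[a] + GAMMA * qa[argmax(qb)]
            qb[a] += alpha * (target - qb[a])
    # Output the average of the two estimators.
    return [(x + y) / 2 for x, y in zip(qa, qb)]


alpha = 0.05
q_single = q_learning(alpha, steps=20_000)
q_double = double_q_learning(2 * alpha, steps=20_000)
print([round(x, 3) for x in q_single], [round(x, 3) for x in q_double])
```

Both runs settle at approximately `[10, 9]`, the fixed point for this toy MDP. Because each Double Q-learning estimator is updated on only about half of the steps, doubling its step size roughly matches the per-estimator progress of single Q-learning, which is one intuition for the learning-rate condition in the result.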