Reinforcement Learning
Momentum Q-learning with Finite-Sample Convergence Guarantee
Weng, Bowen, Xiong, Huaqing, Zhao, Lin, Liang, Yingbin, Zhang, Wei
Existing studies indicate that momentum ideas in conventional optimization can be used to improve the performance of Q-learning algorithms. However, the finite-sample analysis for momentum-based Q-learning algorithms is only available for the tabular case without function approximations. This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantees. Specifically, we propose the MomentumQ algorithm, which integrates Nesterov's and Polyak's momentum schemes and generalizes the existing momentum-based Q-learning algorithms. For the infinite state-action space case, we establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling. In particular, we characterize the finite-sample convergence rate, which is provably faster than that of vanilla Q-learning. This is the first finite-sample analysis for momentum-based Q-learning algorithms with function approximations. For the tabular case under synchronous sampling, we also obtain a finite-sample convergence rate that is slightly better than that of SpeedyQ \citep{azar2011speedy} when choosing a special family of step sizes. Finally, we demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
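To make the idea of adding momentum to Q-learning concrete, here is a minimal sketch of synchronous tabular Q-iteration with a heavy-ball (Polyak-style) momentum term. It is not the MomentumQ algorithm from the paper, whose exact update is not given in the abstract. The two-state MDP, the step size `ALPHA`, and the momentum weight `BETA` are all assumptions chosen for illustration.

```python
# Heavy-ball (Polyak-style) momentum on synchronous tabular Q-iteration.
# Illustrative sketch only: NOT the paper's MomentumQ update.
# The toy MDP, ALPHA and BETA below are assumed values.

GAMMA = 0.9  # discount factor
ALPHA = 0.5  # step size (assumed)
BETA = 0.2   # momentum weight (assumed)

# Deterministic 2-state MDP: action 0 stays, action 1 moves to the other state.
# Only "stay in state 1" pays a reward of 1.
NEXT_STATE = [[0, 1], [1, 0]]
REWARD = [[0.0, 0.0], [1.0, 0.0]]


def bellman(q):
    """Apply the Bellman optimality operator T to a Q-table."""
    return [
        [REWARD[s][a] + GAMMA * max(q[NEXT_STATE[s][a]]) for a in range(2)]
        for s in range(2)
    ]


def momentum_q(iterations=1000):
    """Iterate Q <- Q + ALPHA * (TQ - Q) + BETA * (Q - Q_prev)."""
    q_prev = [[0.0, 0.0], [0.0, 0.0]]
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iterations):
        target = bellman(q)
        q_next = [
            [
                q[s][a]
                + ALPHA * (target[s][a] - q[s][a])  # plain Q-iteration step
                + BETA * (q[s][a] - q_prev[s][a])   # momentum from last change
                for a in range(2)
            ]
            for s in range(2)
        ]
        q_prev, q = q, q_next
    return q


if __name__ == "__main__":
    for row in momentum_q():
        print([round(v, 4) for v in row])
```

Setting `BETA = 0` recovers the plain damped iteration, which gives a simple baseline for comparing how quickly each variant approaches the fixed point on this toy problem.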
PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards
Goyal, Prasoon, Niekum, Scott, Mooney, Raymond J.
Reinforcement learning (RL), particularly in sparse reward settings, often requires prohibitively large numbers of interactions with the environment, thereby limiting its applicability to complex problems. To address this, several prior approaches have used natural language to guide the agent's exploration. However, these approaches typically operate on structured representations of the environment, and/or assume some structure in the natural language commands. In this work, we propose a model that directly maps pixels to rewards, given a free-form natural language description of the task, which can then be used for policy learning. Our experiments on the Meta-World robot manipulation domain show that language-based rewards significantly improve the sample efficiency of policy learning, in both sparse and dense reward settings.
Data Science 2020: Complete Data Science & Machine Learning
Udemy online course covering Machine Learning A-Z, Data Science, Python for Machine Learning, Math for Machine Learning, and Statistics for Data Science. Created by Jitesh Khurkhuriya and Jitesh's Data Science & Machine Learning A-Z Team. Data Science and Machine Learning are the hottest skills in demand but challenging to learn. Did you wish that there was one course for Data Science and Machine Learning that covers everything from Math for Machine Learning, Advanced Statistics for Data Science, Data Processing, Machine Learning A-Z, Deep Learning and more? Well, you have come to the right place. This Data Science and Machine Learning course has 250 lectures, more than 25 hours of content, 11 projects including one Kaggle competition with top 1 percentile score, code templates and various quizzes. Today, Data Science and Machine Learning are used in almost all industries, including automobile, banking, healthcare, media, telecom and others.
Cutting-Edge AI: Deep Reinforcement Learning in Python
Udemy online course: apply deep learning to artificial intelligence and reinforcement learning using evolution strategies, A2C, and DDPG. Created by Lazy Programmer Inc. Welcome to Cutting-Edge AI! This is technically Deep Learning in Python part 11 of my deep learning series, and my 3rd reinforcement learning course. Deep Reinforcement Learning is actually the combination of 2 topics: Reinforcement Learning and Deep Learning (Neural Networks). While both of these have been around for quite some time, it's only been recently that Deep Learning has really taken off, and along with it, Reinforcement Learning. The maturation of deep learning has propelled advances in reinforcement learning, which has been around since the 1980s, although some aspects of it, such as the Bellman equation, have been around for much longer.
DeepMind's Newest AI Programs Itself to Make All the Right Decisions
Now, Alphabet's DeepMind is taking this automation further by developing deep learning algorithms that can handle programming tasks which have been, to date, the sole domain of the world's top computer scientists (and take them years to write). In a paper recently published on the pre-print server arXiv, the DeepMind team described a new deep reinforcement learning algorithm that was able to discover its own value function--a critical programming rule in deep reinforcement learning--from scratch. Surprisingly, the algorithm was also effective beyond the simple environments it trained in, going on to play Atari games--a different, more complicated task, achieving superhuman levels of play in 14 games. DeepMind says the approach could accelerate the development of reinforcement learning algorithms and even lead to a shift in focus, where instead of spending years writing the algorithms themselves, researchers work to perfect the environments in which they train. Move by move, game by game, an algorithm combines experience and value function to learn which actions bring greater rewards and improves its play, until eventually, engineers may shift from manually developing the algorithms themselves to building the environments where they learn.
Facebook develops AI algorithm that learns to play poker on the fly
Facebook researchers have developed a general AI framework called Recursive Belief-based Learning (ReBeL) that they say achieves better-than-human performance in heads-up, no-limit Texas hold'em poker while using less domain knowledge than any prior poker AI. They assert that ReBeL is a step toward developing universal techniques for multi-agent interactions -- in other words, general algorithms that can be deployed in large-scale, multi-agent settings. Potential applications run the gamut from auctions, negotiations, and cybersecurity to self-driving cars and trucks. Combining reinforcement learning with search at AI model training and test time has led to a number of advances. Reinforcement learning is where agents learn to achieve goals by maximizing rewards, while search is the process of navigating from a start to a goal state.
Adversarial Robustness for Machine Learning Cyber Defenses Using Log Data
Steverson, Kai, Mullin, Jonathan, Ahiskali, Metin
There has been considerable and growing interest in applying machine learning for cyber defenses. One promising approach has been to apply natural language processing techniques to analyze log data for suspicious behavior. A natural question arises as to how robust these systems are to adversarial attacks. Defense against sophisticated attacks is of particular concern for cyber defenses. In this paper, we develop a testing framework to evaluate adversarial robustness of machine learning cyber defenses, particularly those focused on log data. Our framework uses techniques from deep reinforcement learning and adversarial natural language processing. We validate our framework using a publicly available dataset and demonstrate that our adversarial attack does succeed against the target systems, revealing a potential vulnerability. We apply our framework to analyze the influence of different levels of dropout regularization and find that higher dropout levels increase robustness. Moreover, a 90% dropout probability exhibited the highest level of robustness by a significant margin, which suggests unusually high dropout may be necessary to properly protect against adversarial attacks.
Explainable robotic systems: Understanding goal-driven actions in a reinforcement learning scenario
Cruz, Francisco, Dazeley, Richard, Vamplew, Peter
Robotic systems are more present in our society every day. In human-robot environments, it is crucial that end-users correctly understand their robotic team-partners in order to collaboratively complete a task. To increase action understanding, users demand more explainability about the decisions made by the robot in particular situations. Recently, explainable robotic systems have emerged as an alternative focused not only on completing a task satisfactorily, but also on justifying, in a human-like manner, the reasons that lead to making a decision. In reinforcement learning scenarios, a great effort has been focused on providing explanations using data-driven approaches, particularly from the visual input modality in deep learning-based systems. In this work, we focus on the decision-making process of a reinforcement learning agent performing a simple navigation task in a robotic scenario. As a way to explain the goal-driven robot's actions, we use the probability of success computed by three different proposed approaches: memory-based, learning-based, and introspection-based. The difference between these approaches is the amount of memory required to compute or estimate the probability of success as well as the kind of reinforcement learning representation where they could be used. In this regard, we use the memory-based approach as a baseline since it is obtained directly from the agent's observations. When comparing the learning-based and the introspection-based approaches to this baseline, both are found to be suitable alternatives to compute the probability of success, obtaining high levels of similarity when compared using both Pearson's correlation and the mean squared error.
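The comparison described above, scoring an estimated probability-of-success curve against a baseline with Pearson's correlation and mean squared error, can be sketched as follows. The function names and the two short series are hypothetical, made-up values for illustration, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mse(xs, ys):
    """Mean squared error between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical probability-of-success values (made up, not from the paper).
baseline = [0.10, 0.30, 0.50, 0.70, 0.90]  # e.g. a memory-based baseline
estimate = [0.12, 0.28, 0.53, 0.69, 0.88]  # e.g. a learned estimate

if __name__ == "__main__":
    print("Pearson correlation:", pearson(baseline, estimate))
    print("Mean squared error:", mse(baseline, estimate))
```

A correlation close to 1 together with an MSE close to 0 indicates that the estimate tracks the baseline in both shape and magnitude. The two metrics are complementary, since correlation alone ignores a constant offset or scale difference.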
Low Dimensional State Representation Learning with Reward-shaped Priors
Botteghi, Nicolò, Obbink, Ruben, Geijs, Daan, Poel, Mannes, Sirmacek, Beril, Brune, Christoph, Mersha, Abeje, Stramigioli, Stefano
Reinforcement Learning has been able to solve many complicated robotics tasks without any need for feature engineering in an end-to-end fashion. However, learning the optimal policy directly from the sensory inputs, i.e., the observations, often requires processing and storage of a huge amount of data. In the context of robotics, the cost of data from real robotics hardware is usually very high, so solutions that achieve high sample-efficiency are needed. We propose a method that aims at learning a mapping from the observations into a lower-dimensional state space. This mapping is learned with unsupervised learning using loss functions shaped to incorporate prior knowledge of the environment and the task. Using the samples from the state space, the optimal policy is quickly and efficiently learned. We test the method on several mobile robot navigation tasks in a simulation environment and also on a real robot.