Evolving Rewards to Automate Reinforcement Learning

arXiv.org Machine Learning

Many continuous control tasks have easily formulated objectives, yet using them directly as a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many classical control tasks guide RL training using complex rewards, which require tedious hand-tuning. We automate the reward search with AutoRL, an evolutionary layer over standard RL that treats reward tuning as hyperparameter optimization and trains a population of RL agents to find a reward that maximizes the task objective. AutoRL, evaluated on four MuJoCo continuous control tasks with two RL algorithms, shows improvements over baselines, with the biggest uplift for the more complex tasks. The video can be found at: https://youtu.be/svdaOFfQyC8.
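To make the idea concrete, here is a minimal sketch of the evolutionary reward-search loop described above: a population of reward parameterizations is proposed, each is used to train an agent, and the candidates are selected and mutated according to the true task objective. The functions train_policy and task_objective are hypothetical stubs standing in for a full RL training run and the task's real objective; they are not taken from the paper's implementation.

```python
# Sketch of evolutionary reward search in the spirit of AutoRL (assumptions noted above).
import random

def train_policy(reward_params):
    """Hypothetical: train an RL agent with a reward parameterized by
    reward_params and return the trained policy. Stubbed out here."""
    return {"reward_params": reward_params}

def task_objective(policy):
    """Hypothetical: evaluate the trained policy on the true task objective
    (e.g., distance walked), independent of the shaped reward. Toy score."""
    return -sum((w - 0.5) ** 2 for w in policy["reward_params"])

def mutate(params, scale=0.1):
    # Perturb each reward weight with Gaussian noise.
    return [w + random.gauss(0.0, scale) for w in params]

def evolve_rewards(pop_size=8, generations=10, n_params=4):
    # Each individual is a vector of reward-shaping weights.
    population = [[random.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(task_objective(train_policy(p)), p) for p in population]
        scored.sort(key=lambda s: s[0], reverse=True)
        parents = [p for _, p in scored[: pop_size // 2]]  # keep the top half
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return max(population, key=lambda p: task_objective(train_policy(p)))

if __name__ == "__main__":
    print("best reward weights:", evolve_rewards())
```

In practice each inner call to train_policy would be a full RL run, so the outer loop is typically parallelized across a population of workers.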


nikbearbrown/INFO_7375

#artificialintelligence

In this seminar we do research in Computational Skepticism, that is, building systems to answer the question "Why should I trust an algorithm's predictions?" As a group, students and any collaborators will write a book called "Computational Skepticism," with small groups of students each collaborating on a chapter. Two students have already started their chapter on model interpretability, so you can see what the beginning of this process looks like here: https://maheshwarappa-a.gitbook.io/ads/ Once completed, the Computational Skepticism book will be available for free online and published with an ISBN through the Banataba project via a publishing site such as https://www.Blurb.com.


Nik Bear Brown posted on LinkedIn

#artificialintelligence

INFO 7375 - Special Topics in Artificial Intelligence Engineering and Applications - Computational Skepticism is looking for experts to speak online this summer on a variety of subjects. The Computational Skepticism class is starting today! I'd like to thank Kinesso, H2O.ai, Squark Ai, ArrowDx, and the Computational Radiology Laboratory at Harvard/BCH for expressing an interest in speaking with the class. These are all online talks and can be with just a small group of around 20, or we can invite the thousands of Master's students on MGEN's Boston, Silicon Valley, and Seattle campuses. The subjects include data quality and completeness, bias and fairness, AutoML, model interpretability, causal inference, counterfactual models, deep learning pipelines (AutoDL), time-series pipelines (AutoTS), feature engineering pipelines (AutoFE), automated visualization (AutoViz), reinforcement learning pipelines (AutoRL), and evidence knowledge graphs (EKG). We are looking for more companies and research groups that may be willing to share data and present how they are using machine learning.


Automated Reinforcement Learning: An Overview

arXiv.org Artificial Intelligence

Reinforcement Learning and, more recently, Deep Reinforcement Learning are popular methods for solving sequential decision-making problems modeled as Markov Decision Processes (MDPs). Modeling a problem as an RL task and selecting algorithms and hyper-parameters require careful consideration, as different configurations may yield completely different performance. These considerations are mainly the task of RL experts; however, RL is progressively becoming popular in other fields where the researchers and system designers are not RL experts. In addition, many modeling decisions, such as defining the state and action spaces, the batch size and frequency of batch updates, and the number of timesteps, are typically made manually. For these reasons, automating the different components of the RL framework is of great importance, and it has attracted much attention in recent years. Automated RL provides a framework in which the different components of RL, including MDP modeling, algorithm selection, and hyper-parameter optimization, are modeled and defined automatically. In this article, we explore the literature and present recent work that can be used in automated RL. Moreover, we discuss the challenges, open questions, and research directions in AutoRL.
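One way to picture the framework the survey describes is as an outer search over RL configurations (algorithm choice plus hyper-parameters) wrapped around an inner training loop. The sketch below is purely illustrative: the search space, the random-search strategy, and the stub train_and_evaluate are assumptions for exposition, not a method from the article.

```python
# Illustrative sketch: AutoRL as an outer configuration search around an inner RL run.
import random

# Hypothetical search space: algorithm selection plus a few hyper-parameters.
SEARCH_SPACE = {
    "algorithm": ["dqn", "ppo", "td3"],
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [32, 64, 128],
}

def sample_config():
    # Draw one configuration uniformly at random from the search space.
    return {name: random.choice(choices) for name, choices in SEARCH_SPACE.items()}

def train_and_evaluate(config):
    """Hypothetical inner loop: train the chosen algorithm with the given
    hyper-parameters and return its evaluation return. Stubbed with a toy score."""
    return random.random()

def auto_rl(n_trials=20):
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config()
        score = train_and_evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score

if __name__ == "__main__":
    config, score = auto_rl()
    print("selected configuration:", config, "score:", round(score, 3))
```

Random search is only the simplest instantiation of the outer loop; the literature the article surveys replaces it with Bayesian optimization, evolutionary methods, or population-based schedules.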


Sample-Efficient Automated Deep Reinforcement Learning

arXiv.org Machine Learning

Despite significant progress on challenging problems across various domains, applying state-of-the-art deep reinforcement learning (RL) algorithms remains challenging due to their sensitivity to the choice of hyperparameters. This sensitivity can partly be attributed to the non-stationarity of the RL problem, potentially requiring different hyperparameter settings at various stages of the learning process. Additionally, in the RL setting, hyperparameter optimization (HPO) requires a large number of environment interactions, hindering the transfer of RL's successes to real-world applications. In this work, we tackle the issues of sample-efficient and dynamic HPO in RL. We propose a population-based automated RL (AutoRL) framework to meta-optimize arbitrary off-policy RL algorithms. By sharing the collected experience across the population, we substantially increase the sample efficiency of the meta-optimization. We demonstrate the capabilities of our sample-efficient AutoRL approach in a case study with the popular TD3 algorithm in the MuJoCo benchmark suite, where we reduce the number of environment interactions needed for meta-optimization by up to an order of magnitude compared to population-based training.

Deep RL algorithms are often sensitive to the choice of internal hyperparameters (Jaderberg et al., 2017; Mahmood et al., 2018) and to the hyperparameters of the neural network architecture (Islam et al., 2017; Henderson et al., 2018), hindering them from being applied out-of-the-box to new environments. Tuning the hyperparameters of RL algorithms can quickly become very expensive, both in terms of high computational costs and the large number of required environment interactions. Especially in real-world applications, sample efficiency is crucial (Lee et al., 2019). Hyperparameter optimization (HPO; Snoek et al., 2012; Feurer & Hutter, 2019) approaches often treat the algorithm under optimization as a black box, which in the RL setting requires a full training run every time a configuration is evaluated, leading to suboptimal sample efficiency in terms of environment interactions.
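The key mechanism here is that population members with different hyperparameters all read from and write to one shared replay buffer, so experience collected by any worker benefits every worker. The sketch below illustrates that idea under stated assumptions: the environment, the "update" rule, and the exploit-and-explore step are toy stand-ins, not the paper's TD3 setup or its actual meta-optimization procedure.

```python
# Minimal sketch of population-based meta-optimization with shared experience.
import random
from collections import deque

class SharedReplayBuffer:
    """One buffer shared by the whole population (assumed mechanism for sample efficiency)."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

class Worker:
    """One population member: its own hyperparameters, but shared experience."""
    def __init__(self, hyperparams, buffer):
        self.hyperparams = hyperparams
        self.buffer = buffer
        self.score = 0.0

    def collect(self, n_steps=100):
        # Toy environment interaction: store placeholder transitions with random rewards.
        for _ in range(n_steps):
            self.buffer.add(("state", "action", random.random(), "next_state"))

    def update(self, batch_size=32):
        # Toy "learning" step: score tracks the mean reward of sampled transitions.
        batch = self.buffer.sample(batch_size)
        if batch:
            self.score = sum(t[2] for t in batch) / len(batch)

def exploit_and_explore(workers):
    """PBT-style step: copy the best worker's hyperparameters into the worst, then perturb."""
    workers.sort(key=lambda w: w.score, reverse=True)
    best, worst = workers[0], workers[-1]
    worst.hyperparams = {k: v * random.choice([0.8, 1.2]) for k, v in best.hyperparams.items()}

if __name__ == "__main__":
    shared = SharedReplayBuffer()
    population = [Worker({"lr": 3e-4, "noise": 0.1}, shared) for _ in range(4)]
    for _ in range(10):  # meta-optimization iterations
        for w in population:
            w.collect()
            w.update()
        exploit_and_explore(population)
    print("best hyperparameters:", max(population, key=lambda w: w.score).hyperparams)
```

Because every worker draws minibatches from the same buffer, each environment step is reused across all hyperparameter configurations, which is the intuition behind the reported order-of-magnitude reduction in interactions relative to standard population-based training.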