rl agent
Evolution-Guided Policy Gradient in Reinforcement Learning
Deep Reinforcement Learning (DRL) algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically suffer from three core difficulties: temporal credit assignment with sparse rewards, lack of effective exploration, and brittle convergence properties that are extremely sensitive to hyperparameters. Collectively, these challenges severely limit the applicability of these approaches to real-world problems. Evolutionary Algorithms (EAs), a class of black-box optimization techniques inspired by natural evolution, are well suited to address each of these three challenges. However, EAs typically suffer from high sample complexity and struggle to solve problems that require optimization of a large number of parameters. In this paper, we introduce Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and periodically reinserts the RL agent into the EA population to inject gradient information into the EA. ERL inherits the EA's ability to perform temporal credit assignment with a fitness metric, its effective exploration with a diverse set of policies, and the stability of a population-based approach, and complements these with off-policy DRL's ability to leverage gradients for higher sample efficiency and faster learning. Experiments on a range of challenging continuous control benchmarks demonstrate that ERL significantly outperforms prior DRL and EA methods.
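The loop described in the abstract (population rollouts feed a shared buffer, a gradient learner trains on that buffer, and the learner is periodically copied back into the population) can be sketched on a toy problem. This is an illustrative sketch only, not the authors' implementation: the one-step task, the linear policies, the least-squares quadratic critic standing in for the off-policy DRL learner, and all names and parameters (`rollout`, `gradient_step`, `erl`, `sync_every`, `sigma`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, buffer, n_steps=20):
    """Run a toy episode, store every transition, return the fitness."""
    fitness = 0.0
    for _ in range(n_steps):
        s = rng.uniform(-1.0, 1.0)
        a = theta[0] * s + theta[1]      # linear deterministic policy
        r = -(a - 2.0 * s) ** 2          # toy reward: best action is 2*s
        buffer.append((s, a, r))         # experience shared with the learner
        fitness += r
    return fitness

def features(s, a):
    """Quadratic features for the stand-in critic Q(s, a)."""
    return np.stack([np.ones_like(s), s, a, s * s, s * a, a * a], axis=-1)

def gradient_step(theta, buffer, lr=0.5):
    """Fit the critic on recent buffer data, then move the policy uphill."""
    data = np.array(buffer[-2000:])
    s, a, r = data[:, 0], data[:, 1], data[:, 2]
    c, *_ = np.linalg.lstsq(features(s, a), r, rcond=None)
    a_pi = theta[0] * s + theta[1]
    dq_da = c[2] + c[4] * s + 2.0 * c[5] * a_pi   # dQ/da at the policy's action
    grad = np.array([np.mean(dq_da * s), np.mean(dq_da)])
    return theta + lr * grad

def erl(pop_size=10, generations=30, sync_every=5, sigma=0.1):
    population = [rng.normal(size=2) for _ in range(pop_size)]
    rl_theta = rng.normal(size=2)
    buffer = []
    for gen in range(1, generations + 1):
        # EA side: evaluate every individual; all rollouts fill the buffer.
        fitness = [rollout(th, buffer) for th in population]
        rollout(rl_theta, buffer)
        # Selection and mutation: keep the top half, refill with noisy copies.
        order = np.argsort(fitness)[::-1]
        elites = [population[i] for i in order[: pop_size // 2]]
        children = [elites[i % len(elites)] + sigma * rng.normal(size=2)
                    for i in range(pop_size - len(elites))]
        population = elites + children
        # RL side: one gradient update from the shared, diverse data.
        rl_theta = gradient_step(rl_theta, buffer)
        # Periodically copy the gradient learner into the last child slot.
        if gen % sync_every == 0:
            population[-1] = rl_theta.copy()
    return population, rl_theta, buffer
```

Calling `erl()` returns the final population, the gradient learner's parameters, and the shared buffer. On this toy task the learner's parameters should approach slope 2 and offset 0, because the reward is exactly quadratic and the critic can represent it; real continuous-control tasks would need neural-network policies and a full off-policy learner in place of the stand-ins used here.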
Park: An Open Platform for Learning-Augmented Computer Systems
Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani Shirkoohi, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska, Dr. Mohammad Alizadeh
Robust Deep Reinforcement Learning through Adversarial Loss
Our RADIAL-RL agents consistently outperform prior methods when tested against attacks of varying strength and are more computationally efficient to train. In addition, we propose a new evaluation method called Greedy Worst-Case Reward (GWC) to measure the attack-agnostic robustness of deep RL agents. We show that GWC can be evaluated efficiently and is a good estimate of the reward under the worst possible sequence of adversarial attacks.