Reinforcement Learning
Linear Representation Meta-Reinforcement Learning for Instant Adaptation
Peng, Matt, Zhu, Banghua, Jiao, Jiantao
This paper introduces Fast Linearized Adaptive Policy (FLAP), a new meta-reinforcement learning (meta-RL) method that is able to extrapolate well to out-of-distribution tasks without the need to reuse data from training, and adapt almost instantaneously with the need of only a few samples during testing. FLAP builds upon the idea of learning a shared linear representation of the policy so that when adapting to a new task, it suffices to predict a set of linear weights. A separate adapter network is trained simultaneously with the policy such that during adaptation, we can directly use the adapter network to predict these linear weights instead of updating a meta-policy via gradient descent, such as in prior meta-RL methods like MAML, to obtain the new policy. The application of the separate feed-forward network not only speeds up the adaptation run-time significantly, but also generalizes extremely well to very different tasks that prior Meta-RL methods fail to generalize to. Experiments on standard continuous-control meta-RL benchmarks show FLAP presenting significantly stronger performance on out-of-distribution tasks with up to double the average return and up to 8X faster adaptation run-time speeds when compared to prior methods.
Solving Common-Payoff Games with Approximate Policy Iteration
Sokota, Samuel, Lockhart, Edward, Timbers, Finbarr, Davoodi, Elnaz, D'Orazio, Ryan, Burch, Neil, Schmid, Martin, Bowling, Michael, Lanctot, Marc
For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is difficult -- computing even an epsilon-optimal joint policy is a NEXP complete problem. Nevertheless, a recently rediscovered insight -- that a team of agents can coordinate via common knowledge -- has given rise to algorithms capable of finding optimal joint policies in small common-payoff games. The Bayesian action decoder (BAD) leverages this insight and deep reinforcement learning to scale to games as large as two-player Hanabi. However, the approximations it uses to do so prevent it from discovering optimal joint policies even in games small enough to brute force optimal solutions. This work proposes CAPI, a novel algorithm which, like BAD, combines common knowledge with deep reinforcement learning. However, unlike BAD, CAPI prioritizes the propensity to discover optimal joint policies over scalability. While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so. Code is available at https://github.com/ssokota/capi .
First-Order Problem Solving through Neural MCTS based Reinforcement Learning
Xu, Ruiyang, Kadam, Prashank, Lieberherr, Karl
The formal semantics of an interpreted first-order logic (FOL) statement can be given in Tarskian Semantics or a basically equivalent Game Semantics. The latter maps the statement and the interpretation into a two-player semantic game. Many combinatorial problems can be described using interpreted FOL statements and can be mapped into a semantic game. Therefore, learning to play a semantic game perfectly leads to the solution of a specific instance of a combinatorial problem. We adapt the AlphaZero algorithm so that it becomes better at learning to play semantic games that have different characteristics than Go and Chess. We propose a general framework, Persephone, to map the FOL description of a combinatorial problem to a semantic game so that it can be solved through a neural MCTS based reinforcement learning algorithm. Our goal for Persephone is to make it tabula-rasa, mapping a problem stated in interpreted FOL to a solution without human intervention.
Reinforcement Learning under Model Risk for Biomanufacturing Fermentation Control
Wang, Bo, Xie, Wei, Martagan, Tugce, Akcay, Alp
In the biopharmaceutical manufacturing, fermentation process plays a critical role impacting on productivity and profit. Since biotherapeutics are manufactured in living cells whose biological mechanisms are complex and have highly variable outputs, in this paper, we introduce a model-based reinforcement learning framework accounting for model risk to support bioprocess online learning and guide the optimal and robust customized stopping policy for fermentation process. Specifically, built on the dynamic mechanisms of protein and impurity generation, we first construct a probabilistic model characterizing the impact of underlying bioprocess stochastic uncertainty on impurity and protein growth rates. Since biopharmaceutical manufacturing often has very limited data during the development and early stage of production, we derive the posterior distribution quantifying the process model risk, and further develop the Bayesian rule based knowledge update to support the online learning on underlying stochastic process. With the prediction risk accounting for both bioprocess stochastic uncertainty and model risk, the proposed reinforcement learning framework can proactively hedge all sources of uncertainties and support the optimal and robust customized decision making. We conduct the structural analysis of optimal policy and study the impact of model risk on the policy selection. We can show that it asymptotically converges to the optimal policy obtained under perfect information of underlying stochastic process. Our case studies demonstrate that the proposed framework can greatly improve the biomanufacturing industrial practice.
Learning in PyTorch Modern Reinforcement Learning: Deep Q
You will then learn how to implement these in pythonic and concise PyTorch code, that can be extended to include any future deep Q learning algorithms. These algorithms will be used to solve a variety of environments from the Open AI gym's Atari library, including Pong, Breakout, and Bankheist. You will learn the key to making these Deep Q Learning algorithms work, which is how to modify the Open AI Gym's Atari library to meet the specifications of the original Deep Q Learning papers. Also included is a mini course in deep learning using the PyTorch framework. This is geared for students who are familiar with the basic concepts of deep learning, but not the specifics, or those who are comfortable with deep learning in another framework, such as Tensorflow or Keras.
Machine Learning for Electronic Design Automation: A Survey
Huang, Guyue, Hu, Jingbo, He, Yifan, Liu, Jialong, Ma, Mingyuan, Shen, Zhaoyang, Wu, Juejian, Xu, Yuanfan, Zhang, Hengrui, Zhong, Kai, Ning, Xuefei, Ma, Yuzhe, Yang, Haoyu, Yu, Bei, Yang, Huazhong, Wang, Yu
In recent years, with the development of semiconductor technology, the scale of integrated circuit (IC) has grown exponentially, challenging the scalability and reliability of the circuit design flow. Therefore, EDA algorithms and software are required to be more effective and efficient to deal with extremely large search space with low latency. Machine learning (ML) is taking an important role in our lives these days, which has been widely used in many scenarios. ML methods, including traditional and deep learning algorithms, achieve amazing performance in solving classification, detection, and design space exploration problems. Additionally, ML methods show great potential to generate high-quality solutions for many NP-complete (NPC) problems, which are common in the EDA field, while traditional methods lead to huge time and resource consumption to solve these problems. Traditional methods usually solve every problem from the beginning, with a lack of knowledge accumulation. Instead, ML algorithms focus on extracting high-level features or patterns that can be reused in other related or similar situations, avoiding repeated complicated analysis. Therefore, applying machine learning methods is a promising direction to accelerate the solving of EDA problems. These authors are ordered alphabetically.
A Reinforcement Learning Based Encoder-Decoder Framework for Learning Stock Trading Rules
Taghian, Mehran, Asadi, Ahmad, Safabakhsh, Reza
A wide variety of deep reinforcement learning (DRL) models have recently been proposed to learn profitable investment strategies. The rules learned by these models outperform the previous strategies specially in high frequency trading environments. However, it is shown that the quality of the extracted features from a long-term sequence of raw prices of the instruments greatly affects the performance of the trading rules learned by these models. Employing a neural encoder-decoder structure to extract informative features from complex input time-series has proved very effective in other popular tasks like neural machine translation and video captioning in which the models face a similar problem. The encoder-decoder framework extracts highly informative features from a long sequence of prices along with learning how to generate outputs based on the extracted features. In this paper, a novel end-to-end model based on the neural encoder-decoder framework combined with DRL is proposed to learn single instrument trading strategies from a long sequence of raw prices of the instrument. The proposed model consists of an encoder which is a neural structure responsible for learning informative features from the input sequence, and a decoder which is a DRL model responsible for learning profitable strategies based on the features extracted by the encoder. The parameters of the encoder and the decoder structures are learned jointly, which enables the encoder to extract features fitted to the task of the decoder DRL. In addition, the effects of different structures for the encoder and various forms of the input sequences on the performance of the learned strategies are investigated. Experimental results showed that the proposed model outperforms other state-of-the-art models in highly dynamic environments.
Evolving Reinforcement Learning Algorithms
Co-Reyes, John D., Miao, Yingjie, Peng, Daiyi, Real, Esteban, Levine, Sergey, Le, Quoc V., Lee, Honglak, Faust, Aleksandra
We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Bootstrapped from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games. The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods. Designing new deep reinforcement learning algorithms that can efficiently solve across a wide variety of problems generally requires a tremendous amount of manual effort. Learning to design reinforcement learning algorithms or even small sub-components of algorithms would help ease this burden and could result in better algorithms than researchers could design manually. Our work might then shift from designing these algorithms manually into designing the language and optimization methods for developing these algorithms automatically. Reinforcement learning algorithms can be viewed as a procedure that maps an agent's experience to a policy that obtains high cumulative reward over the course of training. We formulate the problem of training an agent as one of meta-learning: an outer loop searches over the space of computational graphs or programs that compute the objective function for the agent to minimize and an inner loop performs the updates using the learned loss function. The objective of the outer loop is to maximize the training return of the inner loop algorithm.
Robust and Scalable Routing with Multi-Agent Deep Reinforcement Learning for MANETs
Kaviani, Saeed, Ryu, Bo, Ahmed, Ejaz, Larson, Kevin A., Le, Anh, Yahja, Alex, Kim, Jae H.
We address the packet routing problem in highly dynamic mobile ad-hoc networks (MANETs). In the network routing problem each router chooses the next-hop(s) of each packet to deliver the packet to a destination with lower delay, higher reliability, and less overhead in the network. In this paper, we present a novel framework and routing policies, DeepCQ+ routing, using multi-agent deep reinforcement learning (MADRL) which is designed to be robust and scalable for MANETs. Unlike other deep reinforcement learning (DRL)-based routing solutions in the literature, our approach has enabled us to train over a limited range of network parameters and conditions, but achieve realistic routing policies for a much wider range of conditions including a variable number of nodes, different data flows with varying data rates and source/destination pairs, diverse mobility levels, and other dynamic topology of networks. We demonstrate the scalability, robustness, and performance enhancements obtained by DeepCQ+ routing over a recently proposed model-free and non-neural robust and reliable routing technique (i.e. CQ+ routing). DeepCQ+ routing outperforms non-DRL-based CQ+ routing in terms of overhead while maintains same goodput rate. Under a wide range of network sizes and mobility conditions, we have observed the reduction in normalized overhead of 10-15%, indicating that the DeepCQ+ routing policy delivers more packets end-to-end with less overhead used. To the best of our knowledge, this is the first successful application of MADRL for the MANET routing problem that simultaneously achieves scalability and robustness under dynamic conditions while outperforming its non-neural counterpart. More importantly, we provide a framework to design scalable and robust routing policy with any desired network performance metric of interest.
Reinforced Imitative Graph Representation Learning for Mobile User Profiling: An Adversarial Training Perspective
Wang, Dongjie, Wang, Pengyang, Liu, Kunpeng, Zhou, Yuanchun, Hughes, Charles, Fu, Yanjie
In this paper, we study the problem of mobile user profiling, which is a critical component for quantifying users' characteristics in the human mobility modeling pipeline. Human mobility is a sequential decision-making process dependent on the users' dynamic interests. With accurate user profiles, the predictive model can perfectly reproduce users' mobility trajectories. In the reverse direction, once the predictive model can imitate users' mobility patterns, the learned user profiles are also optimal. Such intuition motivates us to propose an imitation-based mobile user profiling framework by exploiting reinforcement learning, in which the agent is trained to precisely imitate users' mobility patterns for optimal user profiles. Specifically, the proposed framework includes two modules: (1) representation module, which produces state combining user profiles and spatio-temporal context in real-time; (2) imitation module, where Deep Q-network (DQN) imitates the user behavior (action) based on the state that is produced by the representation module. However, there are two challenges in running the framework effectively. First, epsilon-greedy strategy in DQN makes use of the exploration-exploitation trade-off by randomly pick actions with the epsilon probability. Such randomness feeds back to the representation module, causing the learned user profiles unstable. To solve the problem, we propose an adversarial training strategy to guarantee the robustness of the representation module. Second, the representation module updates users' profiles in an incremental manner, requiring integrating the temporal effects of user profiles. Inspired by Long-short Term Memory (LSTM), we introduce a gated mechanism to incorporate new and old user characteristics into the user profile.