Markov Models
Meta-Model-Based Meta-Policy Optimization
Hiraoka, Takuya, Imagawa, Takahisa, Tangkaratt, Voot, Osa, Takayuki, Onishi, Takashi, Tsuruoka, Yoshimasa
Model-based reinforcement learning (MBRL) has been applied to meta-learning settings and has demonstrated its high sample efficiency. However, in previous MBRL for meta-learning settings, policies are optimized via rollouts that fully rely on a predictive model of an environment. Thus, its performance in a real environment tends to degrade when the predictive model is inaccurate. In this paper, we prove that performance degradation can be suppressed by using branched meta-rollouts. On the basis of this theoretical analysis, we propose Meta-Model-based Meta-Policy Optimization (M3PO), in which the branched meta-rollouts are used for policy optimization. We demonstrate that M3PO outperforms existing meta reinforcement learning methods in continuous-control benchmarks.
Bridging the Gaps in Statistical Models of Protein Alignment
Sumanaweera, Dinithi, Allison, Lloyd, Konagurthu, Arun S.
This work demonstrates how a complete statistical model quantifying the evolution of pairs of aligned proteins can be constructed from a time-parameterised substitution matrix and a time-parameterised 3-state alignment machine. All parameters of such a model can be inferred from any benchmark data-set of aligned protein sequences. This allows us to examine nine well-known substitution matrices on six benchmarks curated using various structural alignment methods; any matrix that does not explicitly model a "time"-dependent Markov process is converted to a corresponding base-matrix that does. In addition, a new optimal matrix is inferred for each of the six benchmarks. Using Minimum Message Length (MML) inference, all 15 matrices are compared in terms of measuring the Shannon information content of each benchmark. This has resulted in a new and clear overall best performed time-dependent Markov matrix, MMLSUM, and its associated 3-state machine, whose properties we have analysed in this work. For standard use, the MMLSUM series of (log-odds) \textit{scoring} matrices derived from the above Markov matrix, are available at https://lcb.infotech.monash.edu.au/mmlsum.
Deliberative Acting, Online Planning and Learning with Hierarchical Operational Models
Patra, Sunandita, Mason, James, Ghallab, Malik, Nau, Dana, Traverso, Paolo
The most common representation formalisms for automated planning are descriptive models that abstractly describe what the actions do and are tailored for effciently computing the next state(s) in a state-transition system. However, real-world acting requires operational models that describe how to do things, with rich control structures for closed-loop online decision-making in a dynamic environment. To use a different action model for planning than the one used for acting causes problems with combining acting and planning, in particular for the development and consistency verification of the different models. As an alternative, we define and implement an integrated acting-and-planning system in which both planning and acting use the same operational models, which are written in a general-purpose hierarchical task-oriented language offering rich control structures. The acting component, called Reactive Acting Engine (RAE), is inspired by the well-known PRS system, except that instead of being purely reactive, it can get advice from the planner. Our planner uses a UCT-like Monte Carlo Tree Search procedure, called UPOM (UCT Procedure for Operational Models), whose rollouts are simulations of the actor's operational models. We also present learning strategies for use with RAE and UPOM that acquire, from online acting experiences and/or simulated planning results, a mapping from decision contexts to method instances as well as a heuristic function to guide UPOM. Our experimental results show that UPOM and our learning strategies significantly improve the acting efficiency and robustness of RAE. We discuss the asymptotic convergence of UPOM by mapping its search space to an MDP.
Reinforcement Learning of Simple Indirect Mechanisms
Brero, Gianluca, Eden, Alon, Gerstgrasser, Matthias, Parkes, David C., Rheingans-Yoo, Duncan
Over the last fifty years, a large body of research in microeconomics has introduced many different mechanisms for resource allocation. Despite the wide variety of available options, "simple" mechanisms such as posted price and serial dictatorship are often preferred for practical applications, including housing allocation [Abdulkadiroฤlu and Sรถnmez, 1998], online procurement [Badanidiyuru et al., 2012], or allocation of medical appointments [Klaus and Nichifor, 2019]. There has been considerable interest in formalizing different notions of simplicity. Li [2017] identifies mechanisms that are particularly simple from a strategic perspective, introducing the concept of obviously strategyproof mechanisms; under obviously strategyproof mechanisms, it is obvious that an agent cannot profit by trying to game the system, as even the worst possible final outcome from behaving truthfully is at least as good as the best possible outcome from any other strategy. Pycia and Troyan [2019] introduce the still stronger concept of strongly obviously strategyproof (SOSP) mechanisms, and show that this class can essentially be identified with sequential price mechanisms, where agents are visited in turn and offered a choice from a menu of options (which may or may not include transfers). SOSP mechanisms are ones in which an agent is not even required to consider her future (truthful) actions to understand that the mechanism is obviously strategyproof.
Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning
Zintgraf, Luisa, Feng, Leo, Igl, Maximilian, Hartikainen, Kristian, Hofmann, Katja, Whiteson, Shimon
Meta-learning is a powerful tool for learning policies that can adapt efficiently when deployed in new tasks. If however the meta-training tasks have sparse rewards, the need for exploration during meta-training is exacerbated given that the agent has to explore and learn across many tasks. We show that current meta-learning methods can fail catastrophically in such environments. To address this problem, we propose HyperX, a novel method for meta-learning in sparse reward tasks. Using novel reward bonuses for meta-training, we incentivise the agent to explore in approximate hyper-state space, i.e., the joint state and approximate belief space, where the beliefs are over tasks. We show empirically that these bonuses allow an agent to successfully learn to solve sparse reward tasks where existing meta-learning methods fail.
VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models
Xiao, Zhisheng, Kreis, Karsten, Kautz, Jan, Vahdat, Arash
Energy-based models (EBMs) have recently been successful in representing complex distributions of small images. However, sampling from them requires expensive Markov chain Monte Carlo (MCMC) iterations that mix slowly in high dimensional pixel space. Unlike EBMs, variational autoencoders (VAEs) generate samples quickly and are equipped with a latent space that enables fast traversal of the data manifold. However, VAEs tend to assign high probability density to regions in data space outside the actual data distribution and often fail at generating sharp images. In this paper, we propose VAEBM, a symbiotic composition of a VAE and an EBM that offers the best of both worlds. VAEBM captures the overall mode structure of the data distribution using a state-of-the-art VAE and it relies on its EBM component to explicitly exclude non-data-like regions from the model and refine the image samples. Moreover, the VAE component in VAEBM allows us to speed up MCMC updates by reparameterizing them in the VAE's latent space. Our experimental results show that VAEBM outperforms state-of-the-art VAEs and EBMs in generative quality on several benchmark image datasets by a large margin. It can generate high-quality images as large as 256$\times$256 pixels with short MCMC chains. We also demonstrate that VAEBM provides complete mode coverage and performs well in out-of-distribution detection.
Minimax Optimal Reinforcement Learning for Discounted MDPs
He, Jiafan, Zhou, Dongruo, Gu, Quanquan
The goal of reinforcement learning is designing algorithms to learn the optimal policy through interactions with the unknown dynamic environment. Markov decision processes (MDPs) plays a central role in reinforcement learning due to their ability to describe the time-independent state transition property. In specific, the discounted MDP is one of the standard MDPs in reinforcement learning to describe sequential tasks without interruption or restart. Various reinforcement learning algorithms have been proposed for discounted MDPs. In specific, Azar et al. (2013) proposed an Empirical QVI algorithm which achieves the optimal sample complexity to find the optimal value function. Sidford et al. (2018a) proposed a sublinear randomized value iteration algorithm that achieves a near-optimal sample complexity to find the optimal policy, and Sidford et al. (2018b) further improved it to reach the optimal sample complexity.
Bayesian Policy Search for Stochastic Domains
Tolpin, David, Zhou, Yuan, Yang, Hongseok
AI planning can be cast as inference in probabilistic models, and probabilistic programming was shown to be capable of policy search in partially observable domains. Prior work introduces policy search through Markov chain Monte Carlo in deterministic domains, as well as adapts black-box variational inference to stochastic domains, however not in the strictly Bayesian sense. In this work, we cast policy search in stochastic domains as a Bayesian inference problem and provide a scheme for encoding such problems as nested probabilistic programs. We argue that probabilistic programs for policy search in stochastic domains should involve nested conditioning, and provide an adaption of Lightweight Metropolis-Hastings (LMH) for robust inference in such programs. We apply the proposed scheme to stochastic domains and show that policies of similar quality are learned, despite a simpler and more general inference algorithm. We believe that the proposed variant of LMH is novel and applicable to a wider class of probabilistic programs with nested conditioning.
Deep learning for time series classification
Time series analysis is a field of data science which is interested in analyzing sequences of numerical values ordered in time. Time series are particularly interesting because they allow us to visualize and understand the evolution of a process over time. Their analysis can reveal trends, relationships and similarities across the data. There exists numerous fields containing data in the form of time series: health care (electrocardiogram, blood sugar, etc.), activity recognition, remote sensing, finance (stock market price), industry (sensors), etc. Time series classification consists of constructing algorithms dedicated to automatically label time series data. The sequential aspect of time series data requires the development of algorithms that are able to harness this temporal property, thus making the existing off-the-shelf machine learning models for traditional tabular data suboptimal for solving the underlying task. In this context, deep learning has emerged in recent years as one of the most effective methods for tackling the supervised classification task, particularly in the field of computer vision. The main objective of this thesis was to study and develop deep neural networks specifically constructed for the classification of time series data. We thus carried out the first large scale experimental study allowing us to compare the existing deep methods and to position them compared other non-deep learning based state-of-the-art methods. Subsequently, we made numerous contributions in this area, notably in the context of transfer learning, data augmentation, ensembling and adversarial attacks. Finally, we have also proposed a novel architecture, based on the famous Inception network (Google), which ranks among the most efficient to date.
Python Data Science with Pandas: Master 12 Advanced Projects
Online Courses Udemy - Python Data Science with Pandas: Master 12 Advanced Projects, Work with Pandas, SQL Databases, JSON, Web APIs & more to master your real-world Machine Learning & Finance Projects Bestseller Created by Alexander Hagmann English [Auto] Students also bought Machine Learning and AI: Support Vector Machines in Python Unsupervised Machine Learning Hidden Markov Models in Python Natural Language Processing with Deep Learning in Python Advanced AI: Deep Reinforcement Learning in Python Deep Learning: Advanced Computer Vision (GANs, SSD, More!) Cutting-Edge AI: Deep Reinforcement Learning in Python Preview this course GET COUPON CODE Description Welcome to the first advanced and project-based Pandas Data Science Course! This Course starts where many other courses end: You can write some Pandas code but you are still struggling with real-world Projects because Real-World Data is typically not provided in a single or a few text/excel files - more advanced Data Importing Techniques are required Real-World Data is large, unstructured, nested and unclean - more advanced Data Manipulation and Data Analysis/Visualization Techniques are required many easy-to-use Pandas methods work best with relatively small and clean Datasets - real-world Datasets require more General Code (incorporating other Libraries/Modules) No matter if you need excellent Pandas skills for Data Analysis, Machine Learning or Finance purposes, this is the right Course for you to get your skills to Expert Level! This Course covers the full Data Workflow A-Z: Import (complex and nested) Data from JSON files. Efficiently import and merge Data from many text/CSV files. Clean, handle and flatten nested and stringified Data in DataFrames.