Markov Models
Temporal Parallelization of Inference in Hidden Markov Models
Hassan, Sakira, Sรคrkkรค, Simo, Garcรญa-Fernรกndez, รngel F.
This paper presents algorithms for parallelization of inference in hidden Markov models (HMMs). In particular, we propose parallel backward-forward type of filtering and smoothing algorithm as well as parallel Viterbi-type maximum-a-posteriori (MAP) algorithm. We define associative elements and operators to pose these inference problems as parallel-prefix-sum computations in sum-product and max-product algorithms and parallelize them using parallel-scan algorithms. The advantage of the proposed algorithms is that they are computationally efficient in HMM inference problems with long time horizons. We empirically compare the performance of the proposed methods to classical methods on a highly parallel graphical processing unit (GPU).
Risk-Averse Bayes-Adaptive Reinforcement Learning
Rigter, Marc, Lacerda, Bruno, Hawes, Nick
In this work, we address risk-averse Bayesadaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the parametric uncertainty due to the prior distribution over MDPs, and the internal uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.
Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach
We propose a black-box reduction that turns a certain reinforcement learning algorithm with optimal regret in a (near-)stationary environment into another algorithm with optimal dynamic regret in a non-stationary environment, importantly without any prior knowledge on the degree of non-stationarity. By plugging different algorithms into our black-box, we provide a list of examples showing that our approach not only recovers recent results for (contextual) multi-armed bandits achieved by very specialized algorithms, but also significantly improves the state of the art for linear bandits, episodic MDPs, and infinite-horizon MDPs in various ways. Specifically, in most cases our algorithm achieves the optimal dynamic regret $\widetilde{\mathcal{O}}(\min\{\sqrt{LT}, \Delta^{1/3}T^{2/3}\})$ where $T$ is the number of rounds and $L$ and $\Delta$ are the number and amount of changes of the world respectively, while previous works only obtain suboptimal bounds and/or require the knowledge of $L$ and $\Delta$.
Patterns, predictions, and actions: A story about machine learning
Hardt, Moritz, Recht, Benjamin
This graduate textbook on machine learning tells a story of how patterns in data support predictions and consequential actions. Starting with the foundations of decision making, we cover representation, optimization, and generalization as the constituents of supervised learning. A chapter on datasets as benchmarks examines their histories and scientific bases. Self-contained introductions to causality, the practice of causal inference, sequential decision making, and reinforcement learning equip the reader with concepts and tools to reason about actions and their consequences. Throughout, the text discusses historical context and societal impact. We invite readers from all backgrounds; some experience with probability, calculus, and linear algebra suffices.
Uncertainty Quantification and Propagation for Airline Disruption Management
Ogunsina, Kolawole, Papamichalis, Marios, DeLaurentis, Daniel
Disruption management during the airline scheduling process can be compartmentalized into proactive and reactive processes depending upon the time of schedule execution. The state of the art for decision-making in airline disruption management involves a heuristic human-centric approach that does not categorically study uncertainty in proactive and reactive processes for managing airline schedule disruptions. Hence, this paper introduces an uncertainty transfer function model (UTFM) framework that characterizes uncertainty for proactive airline disruption management before schedule execution, reactive airline disruption management during schedule execution, and proactive airline disruption management after schedule execution to enable the construction of quantitative tools that can allow an intelligent agent to rationalize complex interactions and procedures for robust airline disruption management. Specifically, we use historical scheduling and operations data from a major U.S. airline to facilitate the development and assessment of the UTFM, defined by hidden Markov models (a special class of probabilistic graphical models) that can efficiently perform pattern learning and inference on portions of large data sets.
Neurogenetic Programming Framework for Explainable Reinforcement Learning
Liventsev, Vadim, Hรคrmรค, Aki, Petkoviฤ, Milan
Automatic programming, the task of generating computer programs compliant with a specification without a human developer, is usually tackled either via genetic programming methods based on mutation and recombination of programs, or via neural language models. We propose a novel method that combines both approaches using a concept of a virtual neuro-genetic programmer: using evolutionary methods as an alternative to gradient descent for neural network training}, or scrum team. We demonstrate its ability to provide performant and explainable solutions for various OpenAI Gym tasks, as well as inject expert knowledge into the otherwise data-driven search for solutions.
Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning
Lyu, Xueguang, Xiao, Yuchen, Daley, Brett, Amato, Christopher
Centralized Training for Decentralized Execution, where agents are trained offline using centralized information but execute in a decentralized manner online, has gained popularity in the multi-agent reinforcement learning community. In particular, actor-critic methods with a centralized critic and decentralized actors are a common instance of this idea. However, the implications of using a centralized critic in this context are not fully discussed and understood even though it is the standard choice of many algorithms. We therefore formally analyze centralized and decentralized critic approaches, providing a deeper understanding of the implications of critic choice. Because our theory makes unrealistic assumptions, we also empirically compare the centralized and decentralized critic methods over a wide set of environments to validate our theories and to provide practical advice. We show that there exist misconceptions regarding centralized critics in the current literature and show that the centralized critic design is not strictly beneficial, but rather both centralized and decentralized critics have different pros and cons that should be taken into account by algorithm designers.
Determinantal consensus clustering
Vicente, Serge, Murua, Alejandro
Random restart of a given algorithm produces many partitions to yield a consensus clustering. Ensemble methods such as consensus clustering have been recognized as more robust approaches for data clustering than single clustering algorithms. We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms based on initial sets of center points, such as k-medoids or k-means. The relation between DPP and kernel-based methods makes DPPs suitable to describe and quantify similarity between objects. DPPs favor diversity of the center points within subsets. So, subsets with more similar points have less chances of being generated than subsets with very distinct points. The current and most popular sampling technique is sampling center points uniformly at random. We show through extensive simulations that, contrary to DPP, this technique fails both to ensure diversity, and to obtain a good coverage of all data facets. These two properties of DPP are key to make DPPs achieve good performance with small ensembles. Simulations with artificial datasets and applications to real datasets show that determinantal consensus clustering outperform classical algorithms such as k-medoids and k-means consensus clusterings which are based on uniform random sampling of center points.
UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers
Hu, Siyi, Zhu, Fengda, Chang, Xiaojun, Liang, Xiaodan
Recent advances in multi-agent reinforcement learning have been largely limited in training one model from scratch for every new task. The limitation is due to the restricted model architecture related to fixed input and output dimensions. This hinders the experience accumulation and transfer of the learned agent over tasks with diverse levels of difficulty (e.g. 3 vs 3 or 5 vs 6 multi-agent games). In this paper, we make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing one single architecture to fit tasks with the requirement of different observation and action configurations. Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy by decoupling the policy distribution from the intertwined input observation with an importance weight measured by the merits of the self-attention mechanism. Compared to a standard transformer block, the proposed model, named as Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable. UPDeT is general enough to be plugged into any multi-agent reinforcement learning pipeline and equip them with strong generalization abilities that enables the handling of multiple tasks at a time. Extensive experiments on large-scale SMAC multi-agent competitive games demonstrate that the proposed UPDeT-based multi-agent reinforcement learning achieves significant results relative to state-of-the-art approaches, demonstrating advantageous transfer capability in terms of both performance and training speed (10 times faster).
Scalable Inference of Sparsely-changing Markov Random Fields with Strong Statistical Guarantees
In this paper, we study the problem of inferring time-varying Markov random fields (MRF), where the underlying graphical model is both sparse and changes sparsely over time. Most of the existing methods for the inference of time-varying MRFs rely on the regularized maximum likelihood estimation (MLE), that typically suffer from weak statistical guarantees and high computational time. Instead, we introduce a new class of constrained optimization problems for the inference of sparsely-changing MRFs. The proposed optimization problem is formulated based on the exact $\ell_0$ regularization, and can be solved in near-linear time and memory. Moreover, we show that the proposed estimator enjoys a provably small estimation error. As a special case, we derive sharp statistical guarantees for the inference of sparsely-changing Gaussian MRFs (GMRF) in the high-dimensional regime, showing that such problems can be learned with as few as one sample per time. Our proposed method is extremely efficient in practice: it can accurately estimate sparsely-changing graphical models with more than 500 million variables in less than one hour.