Undirected Networks
Mutual-Information Regularization in Markov Decision Processes and Actor-Critic Learning
Leibfried, Felix, Grau-Moya, Jordi
Cumulative entropy regularization introduces a regulatory signal to the reinforcement learning (RL) problem that encourages policies with high-entropy actions, which is equivalent to enforcing small deviations from a uniform reference marginal policy. This has been shown to improve exploration and robustness, and it tackles the value overestimation problem. It also leads to a significant performance increase in tabular and high-dimensional settings, as demonstrated via algorithms such as soft Q-learning (SQL) and soft actor-critic (SAC). Cumulative entropy regularization has been extended to optimize over the reference marginal policy instead of keeping it fixed, yielding a regularization that minimizes the mutual information between states and actions. While this was initially proposed for Markov Decision Processes (MDPs) in tabular settings, it was recently shown that a similar principle leads to significant improvements over vanilla SQL in RL for high-dimensional domains with discrete actions and function approximators. Here, we follow the motivation of mutual-information regularization from an inference perspective and theoretically analyze the corresponding Bellman operator. Inspired by this Bellman operator, we devise a novel mutual-information regularized actor-critic learning (MIRACLE) algorithm for continuous action spaces that optimizes over the reference marginal policy. We empirically validate MIRACLE in the MuJoCo robotics simulator, where we demonstrate that it can compete with contemporary RL methods. Most notably, it can improve over the model-free state-of-the-art SAC algorithm, which implicitly assumes a fixed reference policy.
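The link between a fixed uniform reference policy and an optimized one can be illustrated in a small tabular setting. The sketch below is not the MIRACLE algorithm; it only assumes a discrete state distribution `state_dist` and a tabular policy `policy[s][a]` (both hypothetical names), and compares the expected KL divergence to a uniform reference with the mutual information I(S;A), which is the expected KL divergence to the marginal policy rho(a) = sum_s mu(s) pi(a|s).

```python
import math


def marginal_policy(state_dist, policy):
    """Marginal action distribution rho(a) = sum_s mu(s) * pi(a|s)."""
    n_actions = len(policy[0])
    return [
        sum(mu * row[a] for mu, row in zip(state_dist, policy))
        for a in range(n_actions)
    ]


def expected_kl(state_dist, policy, reference):
    """E_s[ KL(pi(.|s) || reference) ] in nats, skipping zero-probability actions."""
    total = 0.0
    for mu, row in zip(state_dist, policy):
        for p, r in zip(row, reference):
            if p > 0.0:
                total += mu * p * math.log(p / r)
    return total


def mutual_information(state_dist, policy):
    """I(S;A): the expected KL to the marginal policy, the minimizing reference."""
    return expected_kl(state_dist, policy, marginal_policy(state_dist, policy))


def kl_to_uniform(state_dist, policy):
    """Expected KL to a fixed uniform reference (entropy regularization up to a constant)."""
    n_actions = len(policy[0])
    return expected_kl(state_dist, policy, [1.0 / n_actions] * n_actions)
```

For any tabular policy, `mutual_information` is never larger than `kl_to_uniform`, and it is zero whenever the policy ignores the state, which is why optimizing the reference penalizes only state-dependent deviations.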
Interactive Fiction Games: A Colossal Adventure
Hausknecht, Matthew, Ammanabrolu, Prithviraj, Côté, Marc-Alexandre, Yuan, Xingdi
A hallmark of human intelligence is the ability to understand and communicate with language. Interactive Fiction games are fully text-based simulation environments where a player issues text commands to effect change in the environment and progress through the story. We argue that IF games are an excellent testbed for studying language-based autonomous agents. In particular, IF games combine challenges of combinatorial action spaces, language understanding, and commonsense reasoning. To facilitate rapid development of language-based agents, we introduce Jericho, a learning environment for man-made IF games and conduct a comprehensive study of text-agents across a rich set of games, highlighting directions in which agents can improve.
Correlation Priors for Reinforcement Learning
Alt, Bastian, Šošić, Adrian, Koeppl, Heinz
Many decision-making problems naturally exhibit pronounced structures inherited from the underlying characteristics of the environment. In a Markov decision process model, for example, two distinct states can have inherently related semantics or encode resembling physical state configurations, often implying locally correlated transition dynamics among the states. In order to complete a certain task, an agent acting in such environments needs to execute a series of temporally and spatially correlated actions. Though there exists a variety of approaches to account for correlations in continuous state-action domains, a principled solution for discrete environments is missing. In this work, we present a Bayesian learning framework based on Pólya-Gamma augmentation that enables analogous reasoning in such cases. We demonstrate the framework on a number of common decision-making related tasks, such as reinforcement learning, imitation learning and system identification. By explicitly modeling the underlying correlation structures, the proposed approach yields superior predictive performance compared to correlation-agnostic models, even when trained on data sets that are up to an order of magnitude smaller in size.
Conditional Random Fields Explained
Conditional Random Fields (CRFs) are a class of discriminative models best suited to prediction tasks where contextual information or the state of the neighbors affects the current prediction. CRFs find applications in named entity recognition, part-of-speech tagging, gene prediction, noise reduction, and object detection problems, to name a few. In this article, I will first introduce the basic math and jargon related to Markov Random Fields, the abstraction on which CRFs are built. I will then introduce and explain a simple Conditional Random Fields model in detail, which will show why they are well suited to sequential prediction problems. After that, I will go over the likelihood maximization problem and related derivations in the context of that CRF model.
Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations
Pandey, Venktesh, Wang, Evana, Boyles, Stephen D.
This article develops a deep reinforcement learning (Deep-RL) framework for dynamic pricing on managed lanes with multiple access locations and heterogeneity in travelers' value of time, origin, and destination. This framework relaxes assumptions in the literature by considering multiple origins and destinations, multiple access locations to the managed lane, en route diversion of travelers, partial observability of the sensor readings, and stochastic demand and observations. The problem is formulated as a partially observable Markov decision process (POMDP) and policy gradient methods are used to determine tolls as a function of real-time observations. Tolls are modeled as continuous and stochastic variables, and are determined using a feedforward neural network. The method is compared against a feedback control method used for dynamic pricing. We show that Deep-RL is effective in learning toll policies for maximizing revenue, minimizing total system travel time, and other joint weighted objectives, when tested on real-world transportation networks. The Deep-RL toll policies outperform the feedback control heuristic for the revenue maximization objective by generating revenues up to 9.5% higher than the heuristic and for the objective minimizing total system travel time (TSTT) by generating TSTT up to 10.4% lower than the heuristic. We also propose reward shaping methods for the POMDP to overcome the undesired behavior of toll policies, like the jam-and-harvest behavior of revenue-maximizing policies. Additionally, we test transferability of the algorithm trained on one set of inputs for new input distributions and offer recommendations on real-time implementations of Deep-RL algorithms. The source code for our experiments is available online at https://github.com/venktesh22/ExpressLanes_Deep-RL
Machine Learning for Stochastic Parameterization: Generative Adversarial Networks in the Lorenz '96 Model
Gagne, David John II, Christensen, Hannah M., Subramanian, Aneesh C., Monahan, Adam H.
Stochastic parameterizations account for uncertainty in the representation of unresolved sub-grid processes by sampling from the distribution of possible sub-grid forcings. Some existing stochastic parameterizations utilize data-driven approaches to characterize uncertainty, but these approaches require significant structural assumptions that can limit their scalability. Machine learning models, including neural networks, are able to represent a wide range of distributions and build optimized mappings between a large number of inputs and sub-grid forcings. Recent research on machine learning parameterizations has focused only on deterministic parameterizations. In this study, we develop a stochastic parameterization using the generative adversarial network (GAN) machine learning framework. The GAN stochastic parameterization is trained and evaluated on output from the Lorenz '96 model, which is a common baseline model for evaluating both parameterization and data assimilation techniques. We evaluate different ways of characterizing the input noise for the model and perform model runs with the GAN parameterization at weather and climate timescales. Some of the GAN configurations perform better than a baseline bespoke parameterization at both timescales, and the networks closely reproduce the spatio-temporal correlations and regimes of the Lorenz '96 system. We also find that in general those models which produce skillful forecasts are also associated with the best climate simulations.
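For orientation, the Lorenz '96 dynamics can be written in a few lines. The sketch below assumes the commonly used single-scale form dX_k/dt = (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F with cyclic indices and a forcing of F = 8; the exact configuration used in the study (for example a two-scale variant or different constants) is not stated in the abstract, and the function names are illustrative.

```python
def l96_tendency(x, forcing=8.0):
    """Time derivative of the single-scale Lorenz '96 state with cyclic indexing."""
    n = len(x)
    # Negative indices wrap around in Python, so only k + 1 needs the modulo.
    return [
        (x[(k + 1) % n] - x[k - 2]) * x[k - 1] - x[k] + forcing
        for k in range(n)
    ]


def rk4_step(x, dt, forcing=8.0):
    """Advance the state by one classical fourth-order Runge-Kutta step."""

    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]

    k1 = l96_tendency(x, forcing)
    k2 = l96_tendency(shifted(x, k1, dt / 2), forcing)
    k3 = l96_tendency(shifted(x, k2, dt / 2), forcing)
    k4 = l96_tendency(shifted(x, k3, dt), forcing)
    return [
        xi + dt / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]
```

The uniform state X_k = F is a fixed point of these equations; in practice a small perturbation is added to one variable before integrating so that the trajectory leaves it.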
Q-Learning Based Aerial Base Station Placement for Fairness Enhancement in Mobile Networks
Ghanavi, Rozhina, Sabbaghian, Maryam, Yanikomeroglu, Halim
In this paper, we use an aerial base station (aerial-BS) to enhance fairness in a dynamic environment with user mobility. The problem of optimally placing the aerial-BS is a non-deterministic polynomial-time hard (NP-hard) problem. Moreover, the network topology is subject to continuous changes due to the user mobility. These issues intensify the quest to develop an adaptive and fast algorithm for 3D placement of the aerial-BS. To this end, we propose a method based on reinforcement learning to achieve these goals. Simulation results show that our method increases fairness among users in a reasonable computing time, while the solution is comparatively close to the optimal solution obtained by exhaustive search.
Static Analysis for Probabilistic Programs
Probabilistic programming is a powerful abstraction for statistical machine learning. Applying static analysis methods to probabilistic programs could serve to optimize the learning process, automatically verify properties of models, and improve the programming interface for users. This field of static analysis for probabilistic programming (SAPP) is young and unorganized, consisting of a constellation of techniques with various goals and limitations. The primary aim of this work is to synthesize the major contributions of the SAPP field within an organizing structure and context. We provide technical background for static analysis and probabilistic programming, suggest a functional taxonomy for probabilistic programming languages, and analyze the applicability of major ideas in the SAPP field. We conclude that, while current static analysis techniques for probabilistic programs have practical limitations, there are a number of future directions with high potential to improve the state of statistical machine learning.
Boltzmann machine learning and regularization methods for inferring evolutionary fields and couplings from a multiple sequence alignment
The inverse Potts problem, which infers the Boltzmann distribution for homologous protein sequences from their single-site and pairwise frequencies, has recently attracted a great deal of attention due to its capacity to accurately predict residue-residue contacts in a 3D protein complex. A Boltzmann machine for the accurate estimation of the field and coupling interactions, which is required for other studies in protein evolution and folding, is examined with respect to learning methods, regularization models, and a method for tuning regularization parameters, in order to infer interactions with reasonable characteristics. Using $L_2$ regularization for fields, group $L_1$ regularization for couplings is shown to be very effective for sparse couplings in comparison with $L_2$ and $L_1$. The two regularization parameters for fields and couplings are tuned to yield equal values for the sample average and the ensemble average of the evolutionary energies of natural proteins. Both averages change smoothly and converge over the course of learning, but their profiles differ markedly between learning methods. Most per-parameter adaptive learning methods invented for machine learning cannot learn reasonable parameters for sparse-interaction systems. A modified Adam (ModAdam) method is devised to make the step size proportional to the partial derivative for sparse couplings and to use a soft-thresholding function for group $L_1$. It is shown, by first inferring interactions from protein sequences and then from Monte Carlo samples, that the fields and couplings can be well recovered by the group $L_1$ and ModAdam methods. However, the distribution of evolutionary energies over natural proteins is shifted towards lower energies relative to that of Monte Carlo samples, indicating that there may be higher-order interactions that favor natural sequences.
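The soft-thresholding function for group $L_1$ mentioned above is commonly realized as block soft-thresholding, the proximal operator of the group norm. The sketch below shows that standard operator only; it is an assumption that this is the form intended, it is not the ModAdam update itself, and `group_soft_threshold` and `threshold` are illustrative names.

```python
import math


def group_soft_threshold(group, threshold):
    """Shrink a whole group of couplings toward zero by `threshold` in Euclidean norm.

    Groups whose norm does not exceed the threshold are set exactly to zero,
    which is what produces sparsity at the level of residue pairs.
    """
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= threshold:
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [scale * v for v in group]
```

Applied to the couplings of one residue pair, the operator either zeroes all of them together or rescales them by a common factor, in contrast to plain $L_1$, which thresholds each element independently.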
Inverse Ising inference from high-temperature re-weighting of observations
Jo, Junghyo, Hoang, Danh-Tai, Periwal, Vipul
Maximum Likelihood Estimation (MLE) is the bread and butter of system inference for stochastic systems. In some generality, MLE will converge to the correct model in the infinite data limit. In the context of physical approaches to system inference, such as Boltzmann machines, MLE requires the arduous computation of partition functions summing over all configurations, both observed and unobserved. We present here a conceptually and computationally transparent data-driven approach to system inference that is based on the simple question: How should the Boltzmann weights of observed configurations be modified to make the probability distribution of observed configurations close to a flat distribution? This algorithm gives accurate inference by using only observed configurations for systems with a large number of degrees of freedom where other approaches are intractable.
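To make the cost of the partition function concrete, the sketch below enumerates all 2^N spin configurations of a small Ising model. It assumes the convention E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j with P(s) proportional to exp(-E(s)); the function names are illustrative, and this is the brute-force computation that the abstract describes as arduous, not the re-weighting algorithm it proposes.

```python
import itertools
import math


def energy(spins, fields, couplings):
    """Ising energy with spins in {-1, +1}; only couplings[i][j] with i < j are used."""
    n = len(spins)
    e = -sum(h * s for h, s in zip(fields, spins))
    for i in range(n):
        for j in range(i + 1, n):
            e -= couplings[i][j] * spins[i] * spins[j]
    return e


def partition_function(fields, couplings):
    """Sum of Boltzmann weights over all 2**N configurations (exponential in N)."""
    n = len(fields)
    return sum(
        math.exp(-energy(spins, fields, couplings))
        for spins in itertools.product((-1, 1), repeat=n)
    )


def probability(spins, fields, couplings):
    """Boltzmann probability of a single configuration."""
    weight = math.exp(-energy(spins, fields, couplings))
    return weight / partition_function(fields, couplings)
```

Each additional spin doubles the number of terms in the sum, so this direct evaluation is only feasible for small systems.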