Markov Models
Unsupervised Machine Learning Hidden Markov Models in Python
The Hidden Markov Model or HMM is all about learning sequences. A lot of the data that would be very useful for us to model is in sequences. Stock prices are sequences of prices. Language is a sequence of words. Credit scoring involves sequences of borrowing and repaying money, and we can use those sequences to predict whether or not you're going to default.
Deep Learning: Recurrent Neural Networks in Python
Deep Learning: Recurrent Neural Networks in Python, GRU, LSTM, more modern deep learning, machine learning, and data science for sequences Created by Lazy Programmer Inc. English [Auto], Indonesian [Auto], 5 more Preview this Course - GET COUPON CODE Description Like the course I just released on Hidden Markov Models, Recurrent Neural Networks are all about learning sequences - but whereas Markov Models are limited by the Markov assumption, Recurrent Neural Networks are not - and as a result, they are more expressive, and more powerful than anything we've seen on tasks that we haven't made progress on in decades. So what's going to be in this course and how will it build on the previous neural network courses and Hidden Markov Models? In the first section of the course we are going to add the concept of time to our neural networks. I'll introduce you to the Simple Recurrent Unit, also known as the Elman unit. We are going to revisit the XOR problem, but we're going to extend it so that it becomes the parity problem - you'll see that regular feedforward neural networks will have trouble solving this problem but recurrent networks will work because the key is to treat the input as a sequence.
Asynchronous and Distributed Data Augmentation for Massive Data Settings
Zhou, Jiayuan, Khare, Kshitij, Srivastava, Sanvesh
Data augmentation (DA) algorithms are widely used for Bayesian inference due to their simplicity. In massive data settings, however, DA algorithms are prohibitively slow because they pass through the full data in any iteration, imposing serious restrictions on their usage despite the advantages. Addressing this problem, we develop a framework for extending any DA that exploits asynchronous and distributed computing. The extended DA algorithm is indexed by a parameter $r \in (0, 1)$ and is called Asynchronous and Distributed (AD) DA with the original DA as its parent. Any ADDA starts by dividing the full data into $k$ smaller disjoint subsets and storing them on $k$ processes, which could be machines or processors. Every iteration of ADDA augments only an $r$-fraction of the $k$ data subsets with some positive probability and leaves the remaining $(1-r)$-fraction of the augmented data unchanged. The parameter draws are obtained using the $r$-fraction of new and $(1-r)$-fraction of old augmented data. For many choices of $k$ and $r$, the fractional updates of ADDA lead to a significant speed-up over the parent DA in massive data settings, and it reduces to the distributed version of its parent DA when $r=1$. We show that the ADDA Markov chain is Harris ergodic with the desired stationary distribution under mild conditions on the parent DA algorithm. We demonstrate the numerical advantages of the ADDA in three representative examples corresponding to different kinds of massive data settings encountered in applications. In all these examples, our DA generalization is significantly faster than its parent DA algorithm for all the choices of $k$ and $r$. We also establish geometric ergodicity of the ADDA Markov chain for all three examples, which in turn yields asymptotically valid standard errors for estimates of desired posterior quantities.
Solving infinite-horizon Dec-POMDPs using Finite State Controllers within JESP
You, Yang, Thomas, Vincent, Colas, Francis, Buffet, Olivier
This paper looks at solving collaborative planning problems formalized as Decentralized POMDPs (Dec-POMDPs) by searching for Nash equilibria, i.e., situations where each agent's policy is a best response to the other agents' (fixed) policies. While the Joint Equilibrium-based Search for Policies (JESP) algorithm does this in the finite-horizon setting relying on policy trees, we propose here to adapt it to infinite-horizon Dec-POMDPs by using finite state controller (FSC) policy representations. In this article, we (1) explain how to turn a Dec-POMDP with $N-1$ fixed FSCs into an infinite-horizon POMDP whose solution is an $N^\text{th}$ agent best response; (2) propose a JESP variant, called \infJESP, using this to solve infinite-horizon Dec-POMDPs; (3) introduce heuristic initializations for JESP aiming at leading to good solutions; and (4) conduct experiments on state-of-the-art benchmark problems to evaluate our approach.
A Comprehensive Overview of Recommender System and Sentiment Analysis
AL-Ghuribi, Sumaia Mohammed, Noah, Shahrul Azman Mohd
Recommender system has been proven to be significantly crucial in many fields and is widely used by various domains. Most of the conventional recommender systems rely on the numeric rating given by a user to reflect his opinion about a consumed item; however, these ratings are not available in many domains. As a result, a new source of information represented by the user-generated reviews is incorporated in the recommendation process to compensate for the lack of these ratings. The reviews contain prosperous and numerous information related to the whole item or a specific feature that can be extracted using the sentiment analysis field. This paper gives a comprehensive overview to help researchers who aim to work with recommender system and sentiment analysis. It includes a background of the recommender system concept, including phases, approaches, and performance metrics used in recommender systems. Then, it discusses the sentiment analysis concept and highlights the main points in the sentiment analysis, including level, approaches, and focuses on aspect-based sentiment analysis.
Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under Delayed Feedback
Tahir, Anam, Alt, Bastian, Rizk, Amr, Koeppl, Heinz
Scheduling decisions in parallel queuing systems arise as a fundamental problem, underlying the dimensioning and operation of many computing and communication systems, such as job routing in data center clusters, multipath communication, and Big Data systems. In essence, the scheduler maps each arriving job to one of the possibly heterogeneous servers while aiming at an optimization goal such as load balancing, low average delay or low loss rate. One main difficulty in finding optimal scheduling decisions here is that the scheduler only partially observes the impact of its decisions, e.g., through the delayed acknowledgements of the served jobs. In this paper, we provide a partially observable (PO) model that captures the scheduling decisions in parallel queuing systems under limited information of delayed acknowledgements. We present a simulation model for this PO system to find a near-optimal scheduling policy in real-time using a scalable Monte Carlo tree search algorithm. We numerically show that the resulting policy outperforms other limited information scheduling strategies such as variants of Join-the-Most-Observations and has comparable performance to full information strategies like: Join-the-Shortest-Queue, Join-the- Shortest-Queue(d) and Shortest-Expected-Delay. Finally, we show how our approach can optimise the real-time parallel processing by using network data provided by Kaggle.
Interpretable Local Tree Surrogate Policies
Mern, John, Krishnan, Sidhart, Yildiz, Anil, Hatch, Kyle, Kochenderfer, Mykel J.
High-dimensional policies, such as those represented by neural networks, cannot be reasonably interpreted by humans. This lack of interpretability reduces the trust users have in policy behavior, limiting their use to low-impact tasks such as video games. Unfortunately, many methods rely on neural network representations for effective learning. In this work, we propose a method to build predictable policy trees as surrogates for policies such as neural networks. The policy trees are easily human interpretable and provide quantitative predictions of future behavior. We demonstrate the performance of this approach on several simulated tasks.
Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning
Rathnam, Sarah, Murphy, Susan A., Doshi-Velez, Finale
In batch reinforcement learning, there can be poorly explored state-action pairs resulting in poorly learned, inaccurate models and poorly performing associated policies. Various regularization methods can mitigate the problem of learning overly-complex models in Markov decision processes (MDPs), however they operate in technically and intuitively distinct ways and lack a common form in which to compare them. This paper unifies three regularization methods in a common framework -- a weighted average transition matrix. Considering regularization methods in this common form illuminates how the MDP structure and the state-action pair distribution of the batch data set influence the relative performance of regularization methods. We confirm intuitions generated from the common framework by empirical evaluation across a range of MDPs and data collection policies.
On Solving a Stochastic Shortest-Path Markov Decision Process as Probabilistic Inference
We propose solving the general Stochastic Shortest-Path Markov Decision Process (SSP MDP) as probabilistic inference. Furthermore, we discuss online and offline methods for planning under uncertainty. In an SSP MDP, the horizon is indefinite and unknown a priori. SSP MDPs generalize finite and infinite horizon MDPs and are widely used in the artificial intelligence community. Additionally, we highlight some of the differences between solving an MDP using dynamic programming approaches widely used in the artificial intelligence community and approaches used in the active inference community.
Evolutionary Reinforcement Learning Dynamics with Irreducible Environmental Uncertainty
Barfuss, Wolfram, Mann, Richard P.
In this work we derive and present evolutionary reinforcement learning dynamics in which the agents are irreducibly uncertain about the current state of the environment. We evaluate the dynamics across different classes of partially observable agent-environment systems and find that irreducible environmental uncertainty can lead to better learning outcomes faster, stabilize the learning process and overcome social dilemmas. However, as expected, we do also find that partial observability may cause worse learning outcomes, for example, in the form of a catastrophic limit cycle. Compared to fully observant agents, learning with irreducible environmental uncertainty often requires more exploration and less weight on future rewards to obtain the best learning outcomes. Furthermore, we find a range of dynamical effects induced by partial observability, e.g., a critical slowing down of the learning processes between reward regimes and the separation of the learning dynamics into fast and slow directions. The presented dynamics are a practical tool for researchers in biology, social science and machine learning to systematically investigate the evolutionary effects of environmental uncertainty.