Undirected Networks
Structured Black Box Variational Inference for Latent Time Series Models
Bamler, Robert, Mandt, Stephan
Continuous latent time series models are prevalent in Bayesian modeling; examples include the Kalman filter, dynamic collaborative filtering, or dynamic topic models. These models often benefit from structured, non mean field variational approximations that capture correlations between time steps. Black box variational inference with reparameterization gradients (BBVI) allows us to explore a rich new class of Bayesian non-conjugate latent time series models; however, a naive application of BBVI to such structured variational models would scale quadratically in the number of time steps. We describe a BBVI algorithm analogous to the forward-backward algorithm which instead scales linearly in time. It allows us to efficiently sample from the variational distribution and estimate the gradients of the ELBO. Finally, we show results on the recently proposed dynamic word embedding model, which was trained using our method.
Survey on Models and Techniques for Root-Cause Analysis
Solรฉ, Marc, Muntรฉs-Mulero, Victor, Rana, Annie Ibrahim, Estrada, Giovani
Automation and computer intelligence to support complex human decisions becomes essential to manage large and distributed systems in the Cloud and IoT era. Understanding the root cause of an observed symptom in a complex system has been a major problem for decades. As industry dives into the IoT world and the amount of data generated per year grows at an amazing speed, an important question is how to find appropriate mechanisms to determine root causes that can handle huge amounts of data or may provide valuable feedback in real-time. While many survey papers aim at summarizing the landscape of techniques for modelling system behavior and infering the root cause of a problem based in the resulting models, none of those focuses on analyzing how the different techniques in the literature fit growing requirements in terms of performance and scalability. In this survey, we provide a review of root-cause analysis, focusing on these particular aspects. We also provide guidance to choose the best root-cause analysis strategy depending on the requirements of a particular system and application.
Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)
Ahmed, Asrar, Varakantham, Pradeep, Lowalekar, Meghna, Adulyasak, Yossiri, Jaillet, Patrick
Markov Decision Processes (MDPs) are an effective model to represent decision processes in the presence of transitional uncertainty and reward tradeoffs. However, due to the difficulty in exactly specifying the transition and reward functions in MDPs, researchers have proposed uncertain MDP models and robustness objectives in solving those models. Most approaches for computing robust policies have focused on the computation of maximin policies which maximize the value in the worst case amongst all realisations of uncertainty. Given the overly conservative nature of maximin policies, recent work has proposed minimax regret as an ideal alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only and they are also limited in their scalability. Therefore, we provide a general model of uncertain MDPs that considers uncertainty over both transition and reward functions. Furthermore, we also consider dependence of the uncertainty across different states and decision epochs. We also provide a mixed integer linear program formulation for minimizing regret given a set of samples of the transition and reward functions in the uncertain MDP. In addition, we provide two myopic variants of regret, namely Cumulative Expected Myopic Regret (CEMR) and One Step Regret (OSR) that can be optimized in a scalable manner. Specifically, we provide dynamic programming and policy iteration based algorithms to optimize CEMR and OSR respectively. Finally, to demonstrate the effectiveness of our approaches, we provide comparisons on two benchmark problems from literature. We observe that optimizing the myopic variants of regret, OSR and CEMR are better than directly optimizing the regret.
Energy-Based Sequence GANs for Recommendation and Their Connection to Imitation Learning
Yoo, Jaeyoon, Ha, Heonseok, Yi, Jihun, Ryu, Jongha, Kim, Chanju, Ha, Jung-Woo, Kim, Young-Han, Yoon, Sungroh
Recommender systems aim to find an accurate and efficient mapping from historic data of user-preferred items to a new item that is to be liked by a user. Towards this goal, energy-based sequence generative adversarial nets (EB-SeqGANs) are adopted for recommendation by learning a generative model for the time series of user-preferred items. By recasting the energy function as the feature function, the proposed EB-SeqGANs is interpreted as an instance of maximum-entropy imitation learning.
Population-Contrastive-Divergence: Does Consistency help with RBM training?
Krause, Oswin, Fischer, Asja, Igel, Christian
Estimating the log-likelihood gradient with respect to the parameters of a Restricted Boltzmann Machine (RBM) typically requires sampling using Markov Chain Monte Carlo (MCMC) techniques. To save computation time, the Markov chains are only run for a small number of steps, which leads to a biased estimate. This bias can cause RBM training algorithms such as Contrastive Divergence (CD) learning to deteriorate. We adopt the idea behind Population Monte Carlo (PMC) methods to devise a new RBM training algorithm termed Population-Contrastive-Divergence (pop-CD). Compared to CD, it leads to a consistent estimate and may have a significantly lower bias. Its computational overhead is negligible compared to CD. However, the variance of the gradient estimate increases. We experimentally show that pop-CD can significantly outperform CD. In many cases, we observed a smaller bias and achieved higher log-likelihood values. However, when the RBM distribution has many hidden neurons, the consistent estimate of pop-CD may still have a considerable bias and the variance of the gradient estimate requires a smaller learning rate. Thus, despite its superior theoretical properties, it is not advisable to use pop-CD in its current form on large problems.
Is deep learning a Markov chain in disguise?
Andrej Karpathy's post "The Unreasonable Effectiveness of Recurrent Neural Networks" made splashes last year. The basic premise is that you can create a recurrent neural network to learn language features character-by-character. But is the resultant model any different from a Markov chain built for the same purpose? I implemented a character-by-character Markov chain in R to find out. First, let's play a variation of the Imitation Game with generated text from Karpathy's tinyshakespeare dataset.
Open Source Toolkits for Speech Recognition
As members of the deep learning R&D team at SVDS, we are interested in comparing Recurrent Neural Network (RNN) and other approaches to speech recognition. Until a few years ago, the state-of-the-art for speech recognition was a phonetic-based approach including separate components for pronunciation, acoustic, and language models. Typically, this consists of n-gram language models combined with Hidden Markov models (HMM). We wanted to start with this as a baseline model, and then explore ways to combine it with newer approaches such as Baidu's Deep Speech. While summaries exist explaining these baseline phonetic models, there do not appear to be any easily-digestible blog posts or papers that compare the tradeoffs of the different freely available tools.
Temporal-related Convolutional-Restricted-Boltzmann-Machine capable of learning relational order via reinforcement learning procedure?
In this article, we extend the conventional framework of convolutional-Restricted-Boltzmann-Machine to learn highly abstract features among abitrary number of time related input maps by constructing a layer of multiplicative units, which capture the relations among inputs. In many cases, more than two maps are strongly related, so it is wise to make multiplicative unit learn relations among more input maps, in other words, to find the optimal relational-order of each unit. In order to enable our machine to learn relational order, we developed a reinforcement-learning method whose optimality is proven to train the network.
A Semi-Supervised Classification Algorithm using Markov Chain and Random Walk in R
In this article, a semi-supervised classification algorithm implementation will be described using Markov Chains and Random Walks. We have the following 2D circles dataset (with 1000 points) with only 2 points labeled (as shown in the figure, colored red and blue respectively, for all others the labels are unknown, indicated by the color black). Now the task is to predict the labels of the other (unlabeled) points. From each of the unlabeled points (Markov states) a random walk with Markov transition matrix (computed from the row-stochastic kernelized distance matrix) will be started that will end in one labeled state, which will be an absorbing state in the Markov Chain. This problem was discussed as an application of Markov Chain in a lecture from the edX course ColumbiaX: CSMM.102x
Ensembles of Models and Metrics for Robust Ranking of Homologous Proteins
Tomal, Jabed H, Welch, William J, Zamar, Ruben H
An ensemble of models (EM), where each model is constructed on a diverse subset of feature variables, is proposed to rank rare class items ahead of majority class items in a highly unbalanced two class problem. The proposed ensemble relies on an algorithm to group the feature variables into subsets where the variables in a subset work better together in a model and the variables in different subsets work better in separate models. The strength of the EM depends on the algorithm's ability to identify strong and diverse subsets of feature variables. A second phase of ensembling is achieved by aggregating several EMs each optimized on a diverse evaluation metric. The resulting ensemble is called ensemble of models and metrics (EMM). Here, the diverse/complementary evaluation metrics ensure increased diversity among EMs to aggregate. The ensembles are applied to the protein homology data, downloaded from the 2004 KDD cup competition website, to rank proteins in such a way that the rare homologous proteins are found ahead of the majority non-homologous proteins. The ensembles are constructed using feature variables which are various scores from sequence alignments of amino acids in a candidate protein and three dimensional descriptions of a native protein representing functional and structural similarity of proteins. While prediction performances of the EMs are better than the contemporary state-of-the-art ensembles and competitive to the winning procedures of the $2004$ KDD cup competition, the performances of the EMM are found on the top of all. In this application, we have two diverse EMs constructed on two complementary evaluation metrics average precision and rank last, where the former is robust against ranking close homologs and the latter is robust against ranking distant homologs. The advantage of using EMM is that it is robust against both close and distant homologs.