Learning Graphical Models
Markov Switching Regimes say... bear or bullish? - Quantdare
We are going to introduce the Markov Switching Regimes (MSR) model which, as its name indicates, tries to capture when one regime has changed to another. This could be a change between opposite trends, or it could consist of passing from "being in trend" to "not being in trend" and vice versa. The name Markov may sound familiar to some of you, as j3 introduced Markov chains a couple of years ago. The main characteristic of this stochastic process is that, at stage t, the probability of each outcome depends only on what happened at the immediately previous stage, t-1. In our post we will assume that the trend of an index today depends only on the trend it was in yesterday; that is, the index will be governed by a Markov chain.
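To make the Markov assumption concrete, here is a minimal sketch of a two-regime chain in which tomorrow's regime is drawn using only today's regime. The function name `simulate_regimes`, the "bull"/"bear" labels and the transition probabilities are hypothetical choices for illustration, not values taken from the post.

```python
import numpy as np

def simulate_regimes(transition, n_steps, start=0, seed=0):
    """Simulate a first-order Markov chain over integer regime labels."""
    rng = np.random.default_rng(seed)
    transition = np.asarray(transition, dtype=float)
    # Each row must be a probability distribution over the next regime.
    if not np.allclose(transition.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to 1")
    states = np.empty(n_steps, dtype=int)
    states[0] = start
    for t in range(1, n_steps):
        # The regime at t depends only on the regime at t-1 (row states[t-1]).
        states[t] = rng.choice(len(transition), p=transition[states[t - 1]])
    return states

# Hypothetical transition matrix: state 0 = "bull", state 1 = "bear".
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
path = simulate_regimes(P, n_steps=1000)
```

Large diagonal entries make regimes persistent, which is the behaviour a switching model is meant to capture; a full MSR model would additionally attach a return distribution to each regime and estimate the matrix from data.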
Graph based manifold regularized deep neural networks for automatic speech recognition
Tomar, Vikrant Singh, Rose, Richard C.
ABSTRACT
Deep neural networks (DNNs) have been successfully applied to a wide variety of acoustic modeling tasks in recent years. These include the applications of DNNs either in a discriminative feature extraction or in a hybrid acoustic modeling scenario. Despite the rapid progress in this area, a number of challenges remain in training DNNs. This paper presents an effective way of training DNNs using a manifold learning based regularization framework. In this framework, the parameters of the network are optimized to preserve underlying manifold based relationships between speech feature vectors while minimizing a measure of loss between network outputs and targets. This is achieved by incorporating manifold based locality constraints in the objective criterion of DNNs. Empirical evidence is provided to demonstrate that training a network with manifold constraints preserves structural compactness in the hidden layers of the network. Manifold regularization is applied to train bottleneck DNNs for feature extraction in hidden Markov model (HMM) based speech recognition. The experiments in this work are conducted on the Aurora-2 spoken digits and the Aurora-4 read news large vocabulary continuous speech recognition tasks. The performance is measured in terms of word error rate (WER) on these tasks. It is shown that the manifold regularized DNNs result in up to 37% reduction in WER relative to standard DNNs.
Index Terms: manifold learning, deep neural networks, manifold regularization, manifold regularized deep neural networks, speech recognition
1. INTRODUCTION
Recently there has been a resurgence of research in the area of deep neural networks (DNNs) for acoustic modeling in automatic speech recognition (ASR) [1-6]. Much of this research has been concentrated on techniques for regularization of the algorithms used for DNN parameter estimation [7-9].
At the same time, there has also been a great deal of research on graph based techniques that facilitate the preservation of local neighborhood relationships among feature vectors for parameter estimation in a number of application areas [10-13]. Algorithms that preserve these local relationships are often referred to as having the effect of applying manifold based constraints.
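As a rough illustration of what a graph based locality constraint can look like, the sketch below computes a generic neighbourhood penalty: outputs of feature vectors that are close in the input space are penalized for being far apart. This is an assumed, textbook-style form, not the paper's exact objective; the function name `manifold_penalty`, the k-nearest-neighbour graph and the unit-bandwidth Gaussian weights are all illustrative choices.

```python
import numpy as np

def manifold_penalty(outputs, features, k=2):
    """Average over points of sum_j w_ij * ||y_i - y_j||^2 over k nearest neighbours."""
    outputs = np.asarray(outputs, dtype=float)
    features = np.asarray(features, dtype=float)
    # Pairwise squared Euclidean distances between input feature vectors.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    n = len(features)
    penalty = 0.0
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]       # indices of the k nearest inputs
        w = np.exp(-d2[i, nbrs])           # Gaussian weights, bandwidth 1 (assumed)
        diff2 = ((outputs[i] - outputs[nbrs]) ** 2).sum(axis=1)
        penalty += float((w * diff2).sum())
    return penalty / n
```

In a regularized training setup, a term like this would be scaled by a weight and added to the usual loss between network outputs and targets, so that the optimizer trades off fit against preservation of local neighbourhoods.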
Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation
Srivastava, Akash, Zou, James, Adams, Ryan P., Sutton, Charles
A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria. These criteria can be difficult to formalize, even when it is easy for an analyst to know a good clustering when they see one. We present a new approach to interactive clustering for data exploration called TINDER, based on a particularly simple feedback mechanism, in which an analyst can reject a given clustering and request a new one, which is chosen to be different from the previous clustering while fitting the data well. We formalize this interaction in a Bayesian framework as a method for prior elicitation, in which each different clustering is produced by a prior distribution that is modified to discourage previously rejected clusterings. We show that TINDER successfully produces a diverse set of clusterings, each of equivalent quality, that are much more diverse than would be obtained by randomized restarts.
Discovery and Visualization of Nonstationary Causal Models
Zhang, Kun, Huang, Biwei, Zhang, Jiji, Schölkopf, Bernhard, Glymour, Clark
It is commonplace to encounter nonstationary data, of which the underlying generating process may change over time or across domains. The nonstationarity presents both challenges and opportunities for causal discovery. In this paper we propose a principled framework to handle nonstationarity, and develop some methods to address three important questions. First, we propose an enhanced constraint-based method to detect variables whose local mechanisms are nonstationary and recover the skeleton of the causal structure over observed variables. Second, we present a way to determine some causal directions by taking advantage of information carried by changing distributions. Third, we develop a method for visualizing the nonstationarity of causal modules. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.
Principled Approaches for Learning Latent Variable Models
In any learning task, it is natural to incorporate latent or hidden variables which are not directly observed. For instance, in a social network, we can observe interactions among the actors, but not their hidden interests/intents; in gene networks, we can measure gene expression levels but not the detailed regulatory mechanisms; and so on. I will present a broad framework for unsupervised learning of latent variable models, addressing both statistical and computational concerns. We show that higher order relationships among observed variables have a low rank representation under natural statistical constraints such as conditional-independence relationships. These findings have implications in a number of settings such as finding hidden communities in networks, discovering topics in text documents and learning about gene regulation in computational biology.
Expectation Maximization Algorithm
Goal: In today's summary we look at the expectation maximization algorithm, which allows us to optimize latent variable models when analytic inference of the posterior probability of latent variables is intractable.
Motivation: Latent variable models are interesting in themselves, because they are related to variational autoencoders and encoder-decoder frameworks that are popular in unsupervised and semi-supervised learning. They allow us to sample from the data distribution and are believed to enhance the expressiveness of hierarchical recurrent encoder-decoder models. We can think of them as memorizing more abstract information, such as emotional states, that allows sentimental utterances to be generated in the encoder.
Steps: In general we are concerned with finding good models, which means determining the model parameters that can explain the data.
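A minimal sketch of the alternating E-step and M-step, assuming a toy two-component one-dimensional Gaussian mixture rather than the models discussed in the summary. In this toy case the posterior over the latent component is available in closed form, so the E-step is exact. The function name `em_gmm_1d` and the initialisation from the data range are illustrative assumptions.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Fit a two-component 1D Gaussian mixture with plain EM."""
    x = np.asarray(x, dtype=float)
    # Crude initialisation (assumption): means at the data extremes.
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted data.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6  # floor avoids collapse
        pi = nk / len(x)
    return pi, mu, var
```

Calling `em_gmm_1d(x)` on a one-dimensional sample returns the mixing weights, means and variances; each iteration does not decrease the data log-likelihood, which is the property that makes EM attractive for latent variable models.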
What are the Best Machine Learning Packages in R? R-bloggers
The most common question asked by prospective data scientists is: "What is the best programming language for Machine Learning?" The answer to this question always results in a debate over whether to choose R, Python or MATLAB for Machine Learning. Nobody can, in reality, say whether Python or R is the best language for Machine Learning. However, the programming language one should choose for machine learning depends directly on the requirements of a given data problem, the likes and preferences of the data scientist, and the context of the machine learning activities they want to perform. According to a survey on Kaggler's Favourite Tools, the open source R programming language turned out to be the favourite among 543 of the 1714 Kagglers who listed their data science tools.
Using Machine Learning to Name Malware
Malware naming conventions are currently in disarray. Different antivirus vendors use different naming conventions, and sometimes they don't follow their own standards. Let's look at a few results for a random virus. These are the results from VirusTotal, a meta-antivirus scanning service. We can see that it is a Trojan malware, with some vendors (Dr.Web and TrendMicro) setting the family as StartPage, some saying it's in the Agent family, some saying it is in the FakeAV family and some saying it is Generic "KR" malware.
Unsupervised Risk Estimation Using Only Conditional Independence Structure
Steinhardt, Jacob, Liang, Percy
We show how to estimate a model's test error from unlabeled data, on distributions very different from the training distribution, while assuming only that certain conditional independencies are preserved between train and test. We do not need to assume that the optimal predictor is the same between train and test, or that the true distribution lies in any parametric family. We can also efficiently differentiate the error estimate to perform unsupervised discriminative learning. Our technical tool is the method of moments, which allows us to exploit conditional independencies in the absence of a fully-specified model. Our framework encompasses a large family of losses including the log and exponential loss, and extends to structured output settings such as hidden Markov models.