Goto

Collaborating Authors

 Directed Networks


Learning Sparse Structural Changes in High-dimensional Markov Networks: A Review on Methodologies and Theories

arXiv.org Machine Learning

For example, genes may regulate each other in different ways when external conditions are changed; the number of daily flu-like symptom reports in nearby hospitals may become correlated when a major epidemic disease breaks out; EEG signals from different regions of the brain may be synchronized/desynchronized when the subject is performing different activities. Spotting such changes in interactions may provide key insights into the underlying system. The interactions among random variables can be formulated as undirected probabilistic graphical models, or Markov Networks (MNs) [Koller and Friedman, 2009], expressing the interactions via the conditional independence. We consider a simple model: the pairwise MNs where the links are only encoded for single or pairs of random variables. Due to the Hammersley-Clifford theorem [Hammersley and Clifford, 1971], the underlying joint probability density function can be represented as the product of univariate and bivariate factors.


Bayesian model selection consistency and oracle inequality with intractable marginal likelihood

arXiv.org Machine Learning

In this article, we investigate large sample properties of model selection procedures in a general Bayesian framework when a closed form expression of the marginal likelihood function is not available or a local asymptotic quadratic approximation of the log-likelihood function does not exist. Under appropriate identifiability assumptions on the true model, we provide sufficient conditions for a Bayesian model selection procedure to be consistent and obey the Occam's razor phenomenon, i.e., the probability of selecting the "smallest" model that contains the truth tends to one as the sample size goes to infinity. In order to show that a Bayesian model selection procedure selects the smallest model containing the truth, we impose a prior anti-concentration condition, requiring the prior mass assigned by large models to a neighborhood of the truth to be sufficiently small. In a more general setting where the strong model identifiability assumption may not hold, we introduce the notion of local Bayesian complexity and develop oracle inequalities for Bayesian model selection procedures. Our Bayesian oracle inequality characterizes a trade-off between the approximation error and a Bayesian characterization of the local complexity of the model, illustrating the adaptive nature of averaging-based Bayesian procedures towards achieving an optimal rate of posterior convergence. Specific applications of the model selection theory are discussed in the context of high-dimensional nonparametric regression and density regression where the regression function or the conditional density is assumed to depend on a fixed subset of predictors. As a result of independent interest, we propose a general technique for obtaining upper bounds of certain small ball probability of stationary Gaussian processes.


Information Pursuit: A Bayesian Framework for Sequential Scene Parsing

arXiv.org Machine Learning

Despite enormous progress in object detection and classification, the problem of incorporating expected contextual relationships among object instances into modern recognition systems remains a key challenge. In this work we propose Information Pursuit, a Bayesian framework for scene parsing that combines prior models for the geometry of the scene and the spatial arrangement of objects instances with a data model for the output of high-level image classifiers trained to answer specific questions about the scene. In the proposed framework, the scene interpretation is progressively refined as evidence accumulates from the answers to a sequence of questions. At each step, we choose the question to maximize the mutual information between the new answer and the full interpretation given the current evidence obtained from previous inquiries. We also propose a method for learning the parameters of the model from synthesized, annotated scenes obtained by top-down sampling from an easy-to-learn generative scene model. Finally, we introduce a database of annotated indoor scenes of dining room tables, which we use to evaluate the proposed approach.


Variational Bayesian Inference of Line Spectra

arXiv.org Machine Learning

In this paper, we address the fundamental problem of line spectral estimation in a Bayesian framework. We target model order and parameter estimation via variational inference in a probabilistic model in which the frequencies are continuous-valued, i.e., not restricted to a grid; and the coefficients are governed by a Bernoulli-Gaussian prior model turning model order selection into binary sequence detection. Unlike earlier works which retain only point estimates of the frequencies, we undertake a more complete Bayesian treatment by estimating the posterior probability density functions (pdfs) of the frequencies and computing expectations over them. Thus, we additionally capture and operate with the uncertainty of the frequency estimates. Aiming to maximize the model evidence, variational optimization provides analytic approximations of the posterior pdfs and also gives estimates of the additional parameters. We propose an accurate representation of the pdfs of the frequencies by mixtures of von Mises pdfs, which yields closed-form expectations. We define the algorithm VALSE in which the estimates of the pdfs and parameters are iteratively updated. VALSE is a gridless, convergent method, does not require parameter tuning, can easily include prior knowledge about the frequencies and provides approximate posterior pdfs based on which the uncertainty in line spectral estimation can be quantified. Simulation results show that accounting for the uncertainty of frequency estimates, rather than computing just point estimates, significantly improves the performance. The performance of VALSE is superior to that of state-of-the-art methods and closely approaches the Cram\'er-Rao bound computed for the true model order.


Correlation vs. causation

@machinelearnbot

David Freedman is the author of an excellent book: "Statistical Models: Theory and Practice" which discusses the issue of causation. It's a very unique stat book in that it really gets into the issue of model assumptions. It claims to be introductory but I believe that a semester or two of math stat as a pre-req would be helpful. In the time series context, you can run a VAR and then do tests for Granger Causality to see if one variable is really "causing" the other where "causing" is defined by Granger. R has a nice package called vars which makes building VAR models and doing testing extremely straightforward.


Data Programming: Creating Large Training Sets, Quickly

arXiv.org Artificial Intelligence

Large labeled training sets are the critical building blocks of supervised learning methods and are key enablers of deep learning techniques. For some applications, creating labeled training sets is the most time-consuming and expensive part of applying machine learning. We therefore propose a paradigm for the programmatic creation of training sets called data programming in which users express weak supervision strategies or domain heuristics as labeling functions, which are programs that label subsets of the data, but that are noisy and may conflict. We show that by explicitly representing this training set labeling process as a generative model, we can "denoise" the generated training set, and establish theoretically that we can recover the parameters of these generative models in a handful of settings. We then show how to modify a discriminative loss function to make it noise-aware, and demonstrate our method over a range of discriminative models including logistic regression and LSTMs. Experimentally, on the 2014 TAC-KBP Slot Filling challenge, we show that data programming would have led to a new winning score, and also show that applying data programming to an LSTM model leads to a TAC-KBP score almost 6 F1 points over a state-of-the-art LSTM baseline (and into second place in the competition). Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable.


The Perceptron Algorithm explained with Python code

@machinelearnbot

Most tasks in Machine Learning can be reduced to classification tasks. For example, we have a medical dataset and we want to classify who has diabetes (positive class) and who doesn't (negative class). We have a dataset from the financial world and want to know which customers will default on their credit (positive class) and which customers will not (negative class). To do this, we can train a Classifier with a'training dataset' and after such a Classifier is trained (we have determined its model parameters) and can accurately classify the training set, we can use it to classify new data (test set). If the training is done properly, the Classifier should predict the class probabilities of the new data with a similar accuracy.


Machine learning and what it means for marketing

#artificialintelligence

Machine learning has a high profile currently and is riding a wave of exposure in the media that includes articles about subjects from self-driving cars and self-landing rockets, to computers beating the world's best players at Go, the most computationally complex board game in the world. Is there an opportunity for your organisation, and the marketers within it, to make use of this "new" technology? Machine learning techniques were developed as long ago as the 1950s, but with the advent of big data and large analytical engines, the prevalence and the ease of applying the techniques has increased. Additionally, organisations now understand the value that analytics can bring, so are willing to place it front and center in their plans and invest more time and resources in exploring new and better techniques. Segmentation and predictive models, for instance, have proven themselves time and again in the marketing world, but to a certain extent, they require a higher degree of knowledge to understand. In some cases, a machine learning technique unburdens the user of the statistical work, but provides just as good an answer as a traditional technique.


NLP: Classification using a Naive Bayes classifier

#artificialintelligence

Here is possible to find the application of the Naive Bayes approach to a specific problem: the classification of SMS into spam ("an undesired messages, e.g. The supporting code can be found here. The data used for such playground activity is the SMS Spam Collection v. 1, a public set of SMS messages that have been collected for mobile phone spam research where each message has been properly labeled as spam or ham. 'In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. An example would be assigning a given email into "spam" or "non-spam" classes or assigning a diagnosis to a given patient as described by observed characteristics of the patient (gender, blood pressure, presence or absence of certain symptoms, etc.).


Graph Structure Learning from Unlabeled Data for Event Detection

arXiv.org Machine Learning

Processes such as disease propagation and information diffusion often spread over some latent network structure which must be learned from observation. Given a set of unlabeled training examples representing occurrences of an event type of interest (e.g., a disease outbreak), our goal is to learn a graph structure that can be used to accurately detect future events of that type. Motivated by new theoretical results on the consistency of constrained and unconstrained subset scans, we propose a novel framework for learning graph structure from unlabeled data by comparing the most anomalous subsets detected with and without the graph constraints. Our framework uses the mean normalized log-likelihood ratio score to measure the quality of a graph structure, and efficiently searches for the highest-scoring graph structure. Using simulated disease outbreaks injected into real-world Emergency Department data from Allegheny County, we show that our method learns a structure similar to the true underlying graph, but enables faster and more accurate detection.