AITopics | Statistical Learning

This work introduces a novel nonparametric density index defined on graphs, the Sum-over-Forests (SoF) density index. It is based on a clear and intuitive idea: high-density regions in a graph contain a large number of low-cost trees with high outdegrees, while low-density regions contain few. Therefore, a Boltzmann probability distribution on the countable set of forests in the graph is defined so that large (high-cost) forests occur with a low probability while short (low-cost) forests occur with a high probability. Then, the SoF density index of a node is defined as the expected outdegree of this node in a non-trivial tree of the forest, thus providing a measure of density around that node. Following the matrix-forest theorem and a statistical physics framework, it is shown that the SoF density index can be computed in closed form through a simple matrix inversion. Experiments on artificial and real data sets show that the proposed index performs well at finding dense regions in graphs of various origins.
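The closed-form computation mentioned above rests on the matrix-forest theorem. As a minimal sketch (not the SoF index itself, whose exact formula is not given in this abstract), the following computes the classical forest matrix (I + L)^{-1} of an undirected weighted graph, where L is the graph Laplacian. The function name and the example graph are illustrative assumptions.

```python
import numpy as np

def forest_matrix(adjacency):
    """Return (Q, f) with Q = (I + L)^{-1} and f = det(I + L)."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    M = np.eye(A.shape[0]) + L
    return np.linalg.inv(M), np.linalg.det(M)

# Hypothetical example: the path graph 0 - 1 - 2 with unit weights.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
Q, f = forest_matrix(A)
print(f)   # total weight of the spanning rooted forests
print(Q)   # Q[i, j]: share of forests in which j lies in the tree rooted at i
```

Each row of Q sums to one, so a row can be read as a distribution over the roots that a node is attached to across all spanning rooted forests.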

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

1301.0725

Country: Europe (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

A New Geometric Approach to Latent Topic Modeling and Discovery

Ding, Weicong, Rohban, Mohammad H., Ishwar, Prakash, Saligrama, Venkatesh

arXiv.org Machine Learning Jan-4-2013

ABSTRACT A new geometrically-motivated algorithm for nonnegative matrix factorization is developed and applied to the discovery of latent "topics" for text and image "document" corpora. The algorithm is based on robustly finding and clustering extreme points of empirical cross-document word frequencies that correspond to novel "words" unique to each topic. In contrast to related approaches that are based on solving non-convex optimization problems using suboptimal approximations, locally-optimal methods, or heuristics, the new algorithm is convex, has polynomial complexity, and has competitive qualitative and quantitative performance compared to the current state-of-the-art approaches on synthetic and real-world datasets. Index Terms: Topic modeling, nonnegative matrix factorization (NMF), extreme points, subspace clustering. 1. INTRODUCTION Topic modeling is a statistical tool for the automatic discovery and comprehension of latent thematic structure or topics, assumed to pervade a corpus of documents. Suppose that we have a corpus of M documents composed of words from a vocabulary of W distinct words indexed by w = 1, ..., W.
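The geometric idea of novel words as extreme points can be illustrated on a toy model. This is a sketch under stated assumptions, not the authors' algorithm: the matrices B and Theta, their sizes, and the helper row_normalize are hypothetical, and the noiseless expected frequencies are used in place of empirical counts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: 3 words, 2 topics. Word 0 occurs only in topic 0
# and word 1 only in topic 1 (the "novel" words); word 2 is shared.
B = np.array([[0.6, 0.0],
              [0.0, 0.5],
              [0.4, 0.5]])                    # columns: word distribution per topic
Theta = rng.dirichlet([1.0, 1.0], size=5).T   # 2 topics x 5 documents
X = B @ Theta                                 # expected word frequencies per document

def row_normalize(M):
    return M / M.sum(axis=1, keepdims=True)

X_bar = row_normalize(X)                      # cross-document frequencies per word
Theta_bar = row_normalize(Theta)
B_bar = row_normalize(B * Theta.sum(axis=1))  # convex weights for each word

# Every row of X_bar is a convex combination of the rows of Theta_bar;
# the novel-word rows coincide with those rows, so they are extreme points.
print(np.allclose(X_bar, B_bar @ Theta_bar))
print(B_bar)
```

In this idealized setting, locating the extreme points of the rows of X_bar identifies the novel words; with real counts the rows are noisy, which is why the abstract stresses robust extreme-point finding.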

data mining, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/ICASSP.2013.6638729

1301.0858

Country:

North America > United States > Massachusetts (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)
Information Technology > Data Science > Data Mining (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Semi-Supervised Domain Adaptation with Non-Parametric Copulas

Lopez-Paz, David, Hernández-Lobato, José Miguel, Schölkopf, Bernhard

arXiv.org Machine Learning Jan-1-2013

A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a nonparametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques. 1 Introduction When humans address a new learning problem, they often use knowledge acquired while learning different but related tasks in the past. For example, when learning a second language, people rely on grammar rules and word derivations from their mother tongue. This is called language transfer [19]. However, in machine learning, most of the traditional methods are not able to exploit similarities between different learning tasks.
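The separation of marginals from dependence structure can be seen with rank-based pseudo-observations, a standard first step in copula modelling. The sketch below is illustrative only and does not implement the vine copula model of the paper; the function name and the simulated data are assumptions.

```python
import numpy as np

def pseudo_observations(X):
    """Map each column of X to (0, 1) through its normalized ranks."""
    X = np.asarray(X, dtype=float)
    ranks = X.argsort(axis=0).argsort(axis=0) + 1   # ranks 1..n, assuming no ties
    return ranks / (X.shape[0] + 1)

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
Z[:, 1] += 0.8 * Z[:, 0]                 # hypothetical dependent pair

U = pseudo_observations(Z)
# A monotone change of the marginals leaves the copula sample unchanged.
V = pseudo_observations(np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3]))
print(np.allclose(U, V))
```

Because the pseudo-observations ignore the marginals, a shift in a marginal distribution between domains and a shift in the dependence structure can be examined separately.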

artificial intelligence, copula, machine learning, (13 more...)

arXiv.org Machine Learning

1301.0142

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.82)

Machine Learning for Personalized Medicine: Predicting Primary Myocardial Infarction from Electronic Health Records

Weiss, Jeremy C. (University of Wisconsin-Madison) | Natarajan, Sriraam (Wake Forest University) | Peissig, Peggy L. (Marshfield Clinic Research Foundation) | McCarty, Catherine A. (Essentia Institute of Rural Health) | Page, David (University of Wisconsin-Madison)

AI Magazine Dec-31-2012

Electronic health records (EHRs) are an emerging relational domain with large potential to improve clinical outcomes. We apply two statistical relational learning (SRL) algorithms to the task of predicting primary myocardial infarction. We show that one SRL algorithm, relational functional gradient boosting, outperforms propositional learners particularly in the medically-relevant high recall region. We observe that both SRL algorithms predict outcomes better than their propositional analogs and suggest how our methods can augment current epidemiological practices.

health safety security environment and social responsibility, Health & Medicine, primary myocardial infarction, (9 more...)

AI Magazine

Industry: Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.79)

Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data

Glazer, Assaf, Lindenbaum, Michael, Markovitch, Shaul

Neural Information Processing Systems Dec-31-2012

We propose an efficient, generalized, nonparametric, statistical Kolmogorov-Smirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods.
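For orientation, the classical one-dimensional two-sample Kolmogorov-Smirnov statistic that this work generalizes can be sketched as follows. This is the textbook statistic, not the high-dimensional test proposed in the paper; the function name and the simulated streams are assumptions.

```python
import numpy as np

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(0)
before = rng.normal(0.0, 1.0, size=500)
after = rng.normal(1.0, 1.0, size=500)   # hypothetical shifted stream
print(ks_statistic(before, before), ks_statistic(before, after))
```

The statistic relies on the ordering of the real line, which is what makes a high-dimensional generalization non-trivial.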

artificial intelligence, ocsvm, upstream oil & gas, (18 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel (0.15)
North America > United States (0.14)

Genre: Research Report (0.69)

Industry: Energy > Oil & Gas > Upstream (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Spectral learning of linear dynamics from generalised-linear observations with application to neural population data

Buesing, Lars, Macke, Jakob H., Sahani, Maneesh

Neural Information Processing Systems Dec-31-2012

Latent linear dynamical systems with generalised-linear observation models arise in a variety of applications, for example when modelling the spiking activity of populations of neurons. Here, we show how spectral learning methods for linear systems with Gaussian observations (usually called subspace identification in this context) can be extended to estimate the parameters of dynamical system models observed through non-Gaussian noise models. We use this approach to obtain estimates of parameters for a dynamical model of neural population data, where the observed spike-counts are Poisson-distributed with log-rates determined by the latent dynamical process, possibly driven by external inputs. We show that the extended system identification algorithm is consistent and accurately recovers the correct parameters on large simulated data sets with much smaller computational cost than approximate expectation-maximisation (EM) due to the non-iterative nature of subspace identification. Even on smaller data sets, it provides an effective initialization for EM, leading to more robust performance and faster convergence. These benefits are shown to extend to real neural data.
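The generative model described above, a latent linear dynamical system whose state sets the log-rates of Poisson spike counts, can be simulated in a few lines. This sketch covers only the model, not the spectral estimator, and omits external inputs; all parameter names and values are hypothetical.

```python
import numpy as np

def simulate_poisson_lds(A, C, d, T, noise_std=0.1, seed=0):
    """Simulate x[t+1] = A x[t] + noise and y[t] ~ Poisson(exp(C x[t] + d))."""
    rng = np.random.default_rng(seed)
    n_latent, n_neurons = A.shape[0], C.shape[0]
    x = np.zeros((T, n_latent))
    y = np.zeros((T, n_neurons), dtype=int)
    for t in range(T):
        if t > 0:
            x[t] = A @ x[t - 1] + noise_std * rng.normal(size=n_latent)
        y[t] = rng.poisson(np.exp(C @ x[t] + d))   # spike counts per neuron
    return x, y

# Hypothetical parameters: a slowly rotating 2-D latent state and 5 neurons.
theta = 0.1
A = 0.98 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
C = np.random.default_rng(1).normal(size=(5, 2))
d = np.full(5, -1.0)
x, y = simulate_poisson_lds(A, C, d, T=200)
print(x.shape, y.shape, y.mean(axis=0))
```

Data simulated this way is the kind of input on which a parameter-recovery experiment, such as the one the abstract reports, could be run.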

artificial data, health & medicine, upstream oil & gas, (21 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > United Kingdom > England (0.14)

Industry:

Energy > Oil & Gas > Upstream (0.54)
Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Machine Learning for Personalized Medicine: Predicting Primary Myocardial Infarction from Electronic Health Records

Weiss, Jeremy C. (University of Wisconsin-Madison) | Natarajan, Sriraam (Wake Forest University) | Peissig, Peggy L. (Marshfield Clinic Research Foundation) | McCarty, Catherine A. (Essentia Institute of Rural Health) | Page, David (University of Wisconsin-Madison)

AI Magazine Dec-31-2012

Electronic health records (EHRs) are an emerging relational domain with large potential to improve clinical outcomes. We apply two statistical relational learning (SRL) algorithms to the task of predicting primary myocardial infarction. We show that one SRL algorithm, relational functional gradient boosting, outperforms propositional learners particularly in the medically-relevant high recall region. We observe that both SRL algorithms predict outcomes better than their propositional analogs and suggest how our methods can augment current epidemiological practices.

artificial intelligence, machine learning, natural language, (17 more...)

AI Magazine

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Greenland (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(10 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Fully Bayesian inference for neural models with negative-binomial spiking

Pillow, Jonathan W., Scott, James

Neural Information Processing Systems Dec-31-2012

Characterizing the information carried by neural populations in the brain requires accurate statistical models of neural spike responses. The negative-binomial distribution provides a convenient model for over-dispersed spike counts, that is, responses with greater-than-Poisson variability. Here we describe a powerful data-augmentation framework for fully Bayesian inference in neural models with negative-binomial spiking. Our approach relies on a recently described latent-variable representation of the negative-binomial distribution, which equates it to a Polya-gamma mixture of normals. This framework provides a tractable, conditionally Gaussian representation of the posterior that can be used to design efficient EM and Gibbs sampling based algorithms for inference in regression and dynamic factor models. We apply the model to neural data from primate retina and show that it substantially outperforms Poisson regression on held-out data, and reveals latent structure underlying spike count correlations in simultaneously recorded spike trains.
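Over-dispersion, meaning greater-than-Poisson variability, is easy to see numerically. The sketch below draws negative-binomial counts through the standard gamma-Poisson mixture; this is a different representation from the Polya-gamma augmentation used in the paper, and the function name and parameter values are assumptions.

```python
import numpy as np

def sample_negative_binomial(mean, r, size, rng):
    """Draw counts with mean `mean` and variance mean + mean**2 / r."""
    rates = rng.gamma(shape=r, scale=mean / r, size=size)  # per-trial Poisson rate
    return rng.poisson(rates)

rng = np.random.default_rng(0)
poisson_counts = rng.poisson(5.0, size=20000)
nb_counts = sample_negative_binomial(mean=5.0, r=2.0, size=20000, rng=rng)

# Fano factor (variance / mean): about 1 for Poisson, above 1 when over-dispersed.
for name, c in [("Poisson", poisson_counts), ("negative binomial", nb_counts)]:
    print(name, c.var() / c.mean())
```

Smaller values of r give heavier over-dispersion, and the Poisson distribution is recovered as r grows large.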

artificial intelligence, bayesian inference, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification

Noh, Yung-kyun, Park, Frank, Lee, Daniel D.

Neural Information Processing Systems Dec-31-2012

We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria.
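The idea of acquiring neighbors sequentially until enough evidence has accumulated can be sketched with a simple stopping rule. This is a stand-in, not one of the five criteria proposed in the paper: each neighbor contributes a fixed vote of +1 or -1, and the search stops once the running sum reaches a threshold. The function name, the threshold, and k_max are assumptions.

```python
import numpy as np

def adaptive_knn_predict(X_train, y_train, x, threshold=3, k_max=15):
    """Classify x with labels in {-1, +1}; return (label, neighbors used)."""
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))
    evidence = 0
    for k, idx in enumerate(order[:k_max], start=1):
        evidence += y_train[idx]           # each neighbor casts one vote
        if abs(evidence) >= threshold:     # enough evidence: stop early
            break
    return (1 if evidence >= 0 else -1), k

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))])
y_train = np.array([-1] * 50 + [1] * 50)
print(adaptive_knn_predict(X_train, y_train, np.array([2.0, 2.0])))
print(adaptive_knn_predict(X_train, y_train, np.array([0.0, 0.0])))
```

Queries deep inside one class stop after a few neighbors, while queries near the class boundary tend to consume more, so the effective k adapts to the local difficulty.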

artificial intelligence, machine learning, nearest neighbor, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Transelliptical Graphical Models

Liu, Han, Han, Fang, Zhang, Cun-hui

Neural Information Processing Systems Dec-31-2012

We advocate the use of a new distribution family--the transelliptical--for robust inference of high dimensional graphical models. The transelliptical family is an extension of the nonparanormal family proposed by Liu et al. (2009). Just as the nonparanormal extends the normal by transforming the variables using univariate functions, the transelliptical extends the elliptical family in the same way. We propose a nonparametric rank-based regularization estimator which achieves the parametric rates of convergence for both graph recovery and parameter estimation. Such a result suggests that the extra robustness and flexibility obtained by the semiparametric transelliptical modeling incurs almost no efficiency loss. We also discuss the relationship between this work and the transelliptical component analysis proposed by Han and Liu (2012).
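A common rank-based construction in this setting estimates the latent correlation matrix from Kendall's tau through the map sin(pi/2 * tau). The sketch below shows only that step, without the regularization used for graph recovery; whether it matches the paper's estimator in every detail is an assumption, as are the function names and the simulated data.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau for samples without ties (O(n^2) pairwise version)."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = x.size
    return (dx * dy).sum() / (n * (n - 1))

def rank_correlation_matrix(X):
    """Entry (j, k) is sin(pi/2 * tau_jk); the diagonal is 1."""
    p = X.shape[1]
    S = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            S[j, k] = S[k, j] = np.sin(0.5 * np.pi * kendall_tau(X[:, j], X[:, k]))
    return S

rng = np.random.default_rng(0)
Z = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=500)
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3])   # monotone distortions
print(rank_correlation_matrix(X))
```

Since the estimate depends on the data only through ranks, it is unchanged by the monotone univariate transformations that define the family.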

estimation, graphical model, kendall, (13 more...)

Neural Information Processing Systems

Country: