AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Bayesian Compressed Regression

arXiv.org Machine LearningMar-22-2013

As an alternative to variable selection or shrinkage in high dimensional regression, we propose to randomly compress the predictors prior to analysis. This dramatically reduces storage and computational bottlenecks, performing well when the predictors can be projected to a low dimensional linear subspace with minimal loss of information about the response. As opposed to existing Bayesian dimensionality reduction approaches, the exact posterior distribution conditional on the compressed data is available analytically, speeding up computation by many orders of magnitude while also bypassing robustness issues due to convergence and mixing problems with MCMC. Model averaging is used to reduce sensitivity to the random projection matrix, while accommodating uncertainty in the subspace dimension. Strong theoretical support is provided for the approach by showing near parametric convergence rates for the predictive density in the large p small n asymptotic paradigm. Practical performance relative to competitors is illustrated in simulations and real data applications.

artificial intelligence, machine learning, regression, (18 more...)

arXiv.org Machine Learning

1303.0642

Genre: Research Report (1.00)

Industry: Health & Medicine > Epidemiology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Add feedback

Deep Gaussian Processes

Damianou, Andreas C., Lawrence, Neil D.

arXiv.org Machine LearningMar-22-2013

In this paper we introduce deep Gaussian process (GP) models. Deep GPs are a deep belief network based on Gaussian process mappings. The data is modeled as the output of a multivariate GP. The inputs to that Gaussian process are then governed by another GP. A single layer model is equivalent to a standard GP or the GP latent variable model (GP-LVM). We perform inference in the model by approximate variational marginalization. This results in a strict lower bound on the marginal likelihood of the model which we use for model selection (number of layers and nodes per layer). Deep belief networks are typically applied to relatively large data sets using stochastic gradient descent for optimization. Our fully Bayesian treatment allows for the application of deep models even when data is scarce. Model selection by our variational bound shows that a five layer hierarchy is justified even when modelling a digit data set containing only 150 examples.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Machine Learning

1211.0358

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Dynamic Microcluster Chains in Microtext

Robinson, Jason R. (The MITRE Corporation) | Condon, Sherri Lee (The MITRE Corporation)

AAAI ConferencesMar-21-2013

Two features of microtext that challenge language processing tools are addressed in the context of linking messages in the emergency response domain. First, the effect of very short texts on several classifiers is estimated by comparing the results when classifiers are applied to the full text of news reports vs. only the headlines. These experiments demonstrate a decrease of 5 - 20% in accuracy. A second challenging feature of microtexts is their accumulation in real time, which can be massive for sources such as Twitter. A dynamic hierarchical clustering algorithm that clusters messages as they accumulate is described, and a preliminary experiment in clustering tweets is demonstrated.

dynamic microcluster chain, microtext

AAAI Conferences

2013 AAAI Spring Symposium Series

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.53)

Add feedback

Unsupervised Modeling of Patient-Level Disease Dynamics

Tamang, Suzanne (City University of New York, The Graduate Center) | Parsons, Simon (City University of New York, Brooklyn College and The Graduate Center )

AAAI ConferencesMar-21-2013

To provide insight into patient-level disease dynamics from data collected at irregular time intervals, this work extends applications of semi-parametric clustering for temporal mining. In the semi-parametric clustering framework, Markovian models provide useful parametric assumptions for modeling temporal dynamics, and a non-parametric method isused to cluster the temporal abstractions instead operating on the original data. Our contribution extends abstraction to continuous-time Markov models and the clustering componentto the non-parametric Bayesian setting, which does not require the number of clusters to be indicated a priori.

artificial intelligence, machine learning, patient-level disease dynamic, (1 more...)

AAAI Conferences

2013 AAAI Spring Symposium Series

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Warped Mixtures for Nonparametric Cluster Shapes

Iwata, Tomoharu, Duvenaud, David, Ghahramani, Zoubin

arXiv.org Machine LearningMar-21-2013

A mixture of Gaussians fit to a single curved or heavy-tailed cluster will report that the data contains many clusters. To produce more appropriate clusterings, we introduce a model which warps a latent mixture of Gaussians to produce nonparametric cluster shapes. The possibly low-dimensional latent mixture model allows us to summarize the properties of the high-dimensional clusters (or density manifolds) describing the data. The number of manifolds, as well as the shape and dimension of each manifold is automatically inferred. We derive a simple inference scheme for this model which analytically integrates out both the mixture parameters and the warping function. We show that our model is effective for density estimation, performs better than infinite Gaussian mixture models at recovering the true number of clusters, and produces interpretable summaries of high-dimensional datasets.

artificial intelligence, latent coordinate, machine learning, (16 more...)

arXiv.org Machine Learning

1206.1846

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Add feedback

A Method for Comparing Hedge Funds

Kartoun, Uri

arXiv.org Machine LearningMar-19-2013

The paper presents new machine learning methods: signal composition, which classifies time-series regardless of length, type, and quantity; and self-labeling, a supervised-learning enhancement. The paper describes further the implementation of the methods on a financial search engine system to identify behavioral similarities among time-series representing monthly returns of 11,312 hedge funds operated during approximately one decade (2000 - 2010). The presented approach of cross-category and cross-location classification assists the investor to identify alternative investments.

artificial intelligence, hedge fund, machine learning, (17 more...)

arXiv.org Machine Learning

1303.0073

Country: North America > United States (0.94)

Genre: Research Report (0.50)

Industry:

Banking & Finance > Trading (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

On the convergence of maximum variance unfolding

Arias-Castro, Ery, Pelletier, Bruno

arXiv.org Machine LearningMar-19-2013

Maximum Variance Unfolding is one of the main methods for (nonlinear) dimensionality reduction. We study its large sample limit, providing specific rates of convergence under standard assumptions. We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent.

artificial intelligence, machine learning, optimization problem, (15 more...)

arXiv.org Machine Learning

1209.0016

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Better subset regression

Xiong, Shifeng

arXiv.org Machine LearningMar-18-2013

To find efficient screening methods for high dimensional linear regression models, this paper studies the relationship between model fitting and screening performance. Under a sparsity assumption, we show that a subset that includes the true submodel always yields smaller residual sum of squares (i.e., has better model fitting) than all that do not in a general asymptotic setting. This indicates that, for screening important variables, we could follow a "better fitting, better screening" rule, i.e., pick a "better" subset that has better model fitting. To seek such a better subset, we consider the optimization problem associated with best subset regression. An EM algorithm, called orthogonalizing subset screening, and its accelerating version are proposed for searching for the best subset. Although the two algorithms cannot guarantee that a subset they yield is the best, their monotonicity property makes the subset have better model fitting than initial subsets generated by popular screening methods, and thus the subset can have better screening performance asymptotically. Simulation results show that our methods are very competitive in high dimensional variable screening even for finite sample sizes.

artificial intelligence, machine learning, subset, (18 more...)

arXiv.org Machine Learning

1212.0634

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.86)

Add feedback

Topic Discovery through Data Dependent and Random Projections

Ding, Weicong, Rohban, Mohammad H., Ishwar, Prakash, Saligrama, Venkatesh

arXiv.org Machine LearningMar-18-2013

We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so called separability condition. This is a condition on existence of novel-words that are unique to each topic. We present a suite of highly efficient algorithms based on data-dependent and random projections of word-frequency patterns to identify novel words and associated topics. We will also discuss the statistical guarantees of the data-dependent projections method based on two mild assumptions on the prior density of topic document matrix. Our key insight here is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state-of-art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate qualitative and quantitative merits of our scheme.

machine learning, natural language, novel word, (16 more...)

arXiv.org Machine Learning

1303.3664

Country:

Asia > Middle East (0.67)
Europe (0.67)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

Add feedback

Generalization Bounds for Metric and Similarity Learning

Cao, Qiong, Guo, Zheng-Chu, Ying, Yiming

arXiv.org Machine LearningMar-17-2013

Recently, metric learning and similarity learning have attracted a large amount of interest. Many models and optimisation algorithms have been proposed. However, there is relatively little work on the generalization analysis of such methods. In this paper, we derive novel generalization bounds of metric and similarity learning. In particular, we first show that the generalization analysis reduces to the estimation of the Rademacher average over "sums-of-i.i.d." sample-blocks related to the specific matrix norm. Then, we derive generalization bounds for metric/similarity learning with different matrix-norm regularisers by estimating their specific Rademacher complexities. Our analysis indicates that sparse metric/similarity learning with $L^1$-norm regularisation could lead to significantly better bounds than those with Frobenius-norm regularisation. Our novel generalization analysis develops and refines the techniques of U-statistics and Rademacher complexity analysis.

artificial intelligence, machine learning, similarity, (17 more...)

arXiv.org Machine Learning

1207.5437

Country: Europe > United Kingdom (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback