AITopics | Statistical Learning

Statistical Learning

Estimation of scale functions to model heteroscedasticity by support vector machines

arXiv.org Machine Learning, Nov-8-2011

A main goal of regression is to derive statistical conclusions on the conditional distribution of the output variable Y given the input values x. Two of the most important characteristics of a single distribution are location and scale. Support vector machines (SVMs) are well established for estimating location functions such as the conditional median or the conditional mean. We investigate the estimation of scale functions by SVMs when the conditional median is also unknown. Estimating scale functions is important, e.g., for estimating volatility in finance. We consider the median absolute deviation (MAD) and the interquantile range (IQR) as measures of scale. Our main result shows the consistency of MAD-type SVMs.
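As a rough illustration of the two scale measures named above, the sketch below computes their plain sample versions. It is not the paper's SVM estimator, which targets the conditional MAD and IQR as functions of x; it only shows why MAD needs a location estimate (the median) before the scale can be computed, which is why an unknown conditional median matters. Taking the quartiles as the two quantiles of the IQR is an assumption, and the function names are hypothetical.

```python
import statistics


def mad(values):
    """Median absolute deviation: the median of |v - median(values)|."""
    med = statistics.median(values)  # a location estimate is needed first
    return statistics.median(abs(v - med) for v in values)


def iqr(values):
    """Interquartile range: upper quartile minus lower quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(mad(sample), iqr(sample))
```

Replacing the 9 with 1000 leaves both values unchanged, which is the robustness that makes these measures attractive for heavy-tailed data such as financial returns.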

artificial intelligence, machine learning, svm

arXiv.org Machine Learning

1111.183

Country:

Europe (0.15)
North America > United States (0.14)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Spectral Methods for Learning Multivariate Latent Tree Structure

Anandkumar, Animashree, Chaudhuri, Kamalika, Hsu, Daniel, Kakade, Sham M., Song, Le, Zhang, Tong

arXiv.org Machine Learning, Nov-8-2011

This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite-sample bounds for exact recovery of the tree structure reveal natural dependencies on statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics.
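The quartet idea can be illustrated in its simplest form, with scalar observed variables, where the singular value of a 1-by-1 cross-covariance is just |cov|. Assuming a linear latent tree in which variables a and b hang off one hidden node and c and d off another, every covariance across the hidden edge picks up an extra factor of that edge's correlation rho, so |cov(a,b) * cov(c,d)| exceeds the products for the other two pairings by a factor 1/rho^2. The sketch below simply picks the pairing with the largest product; it is a simplified stand-in, not the paper's multivariate test, and the example covariance (observed-to-hidden correlations 0.8, hidden-to-hidden 0.5) is made up.

```python
def quartet_pairing(cov):
    """Return the split ((a, b), (c, d)) of variables 0..3 whose product
    |cov[a][b] * cov[c][d]| is largest (scalar version of a quartet test)."""
    splits = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

    def score(split):
        (a, b), (c, d) = split
        return abs(cov[a][b] * cov[c][d])

    return max(splits, key=score)


# Hypothetical population covariance: variables 0 and 2 share one hidden
# parent, 1 and 3 share another (0.8 observed-hidden, 0.5 hidden-hidden),
# so within-pair covariance is 0.8 * 0.8 and cross covariance 0.8 * 0.5 * 0.8.
example_cov = [
    [1.00, 0.32, 0.64, 0.32],
    [0.32, 1.00, 0.32, 0.64],
    [0.64, 0.32, 1.00, 0.32],
    [0.32, 0.64, 0.32, 1.00],
]
print(quartet_pairing(example_cov))
```

With sample covariances the same comparison applies, but the margin shrinks as rho approaches 1, which is one reason the sample-size bounds depend on properties of the joint distribution.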

artificial intelligence, leaf component, machine learning

arXiv.org Machine Learning

1107.1283

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Robust PCA as Bilinear Decomposition with Outlier-Sparsity Regularization

Mateos, Gonzalo, Giannakis, Georgios B.

arXiv.org Machine Learning, Nov-7-2011

Principal component analysis (PCA) is widely used for dimensionality reduction, with well-documented merits in various applications involving high-dimensional data, including computer vision, preference measurement, and bioinformatics. In this context, the fresh look advocated here draws on variable selection and compressive sampling to robustify PCA against outliers. A least-trimmed squares estimator of a low-rank bilinear factor analysis model is shown to be closely related to the estimator obtained from an $\ell_0$-(pseudo)norm-regularized criterion that encourages sparsity in a matrix explicitly modeling the outliers. This connection suggests robust PCA schemes based on convex relaxation, which lead naturally to a family of robust estimators encompassing Huber's optimal M-class as a special case. Outliers are identified by tuning a regularization parameter, which amounts to controlling the sparsity of the outlier matrix along the whole robustification path of (group) least-absolute shrinkage and selection operator (Lasso) solutions. Beyond its neat ties to robust statistics, the developed outlier-aware PCA framework is versatile enough to accommodate novel and scalable algorithms to: i) track the low-rank signal subspace robustly as new data are acquired in real time; and ii) determine principal components robustly in (possibly) infinite-dimensional feature spaces. Synthetic and real data tests corroborate the effectiveness of the proposed robust PCA schemes when used to identify aberrant responses in personality assessment surveys, as well as to unveil communities in social networks and intruders from video surveillance data.
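The link between an l1 penalty on an explicit outlier variable and Huber's M-estimators can be checked in one dimension: minimizing 0.5*(r - o)^2 + lam*|o| over o soft-thresholds the residual r, and the minimum value is exactly Huber's loss with threshold lam. The sketch below verifies that identity and then uses it in a deliberately stripped-down, location-only model (a single mean m instead of the paper's low-rank bilinear factors, and a scalar rather than group Lasso), alternating between m and the outlier terms o_i. All names and the value of lam are illustrative assumptions.

```python
def soft_threshold(r, lam):
    """argmin over o of 0.5 * (r - o)**2 + lam * |o| (the Lasso proximal step)."""
    if r > lam:
        return r - lam
    if r < -lam:
        return r + lam
    return 0.0


def huber(r, lam):
    """Huber loss: quadratic for |r| <= lam, linear beyond."""
    return 0.5 * r * r if abs(r) <= lam else lam * abs(r) - 0.5 * lam * lam


def robust_mean(y, lam, iters=200):
    """Block coordinate descent on sum_i 0.5*(y_i - m - o_i)**2 + lam*|o_i|.
    Each nonzero o_i absorbs one outlier; m plays the role of the clean model."""
    o = [0.0] * len(y)
    m = 0.0
    for _ in range(iters):
        m = sum(yi - oi for yi, oi in zip(y, o)) / len(y)  # fit model to cleaned data
        o = [soft_threshold(yi - m, lam) for yi in y]       # re-estimate outliers
    return m, o


print(robust_mean([0.9, 1.0, 1.1, 1.0, 10.0], lam=0.5))
```

With lam = 0.5 the four inliers get o_i = 0 and the point at 10.0 is absorbed by its o_i, giving m = 1.125 instead of the sample mean 2.8; a smaller lam flags more points as outliers, which is the sparsity control the abstract describes.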

artificial intelligence, data mining, machine learning

arXiv.org Machine Learning

doi: 10.1109/TSP.2012.2204986

1111.1788

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Qualitative Robustness of Support Vector Machines

Hable, Robert, Christmann, Andreas

arXiv.org Machine Learning, Nov-3-2011

Support vector machines have attracted much attention in theoretical and applied statistics. The main topics of recent interest are consistency, learning rates, and robustness. In this article, it is shown that support vector machines are qualitatively robust. Since support vector machines can be represented by a functional on the set of all probability measures, qualitative robustness is proven by showing that this functional is continuous with respect to the topology generated by weak convergence of probability measures. Combined with the existence and uniqueness of support vector machines, our results show that support vector machines are the solutions of a well-posed mathematical problem in Hadamard's sense.

artificial intelligence, machine learning, support vector machine

arXiv.org Machine Learning

0912.0874

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Model Selection in Undirected Graphical Models with the Elastic Net

Cucuringu, Mihai, Puente, Jesus, Shue, David

arXiv.org Machine Learning, Nov-2-2011

Structure learning in random fields has attracted considerable attention due to its difficulty and importance in areas such as remote sensing, computational biology, natural language processing, protein networks, and social network analysis. We consider the problem of estimating the probabilistic graph structure associated with a Gaussian Markov Random Field (GMRF), the Ising model, and the Potts model by extending previous work on $\ell_1$-regularized neighborhood estimation to include the elastic net $\ell_1+\ell_2$ penalty. Additionally, we present numerical evidence that edge density plays a role in the graph recovery process. Finally, we introduce a novel method for augmenting neighborhood estimation by leveraging pairwise neighborhood union estimates.
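Neighborhood estimation with an elastic net penalty can be sketched for the Gaussian case: regress each node on all the others and read the estimated neighbors off the nonzero coefficients. The sketch below does this with plain coordinate descent on the glmnet-style objective (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2). It assumes centered, scaled columns, omits the paper's neighborhood-union step, and would need a logistic or multinomial loss for the Ising or Potts models; the function names and toy data are hypothetical.

```python
def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0)."""
    return max(z - t, 0.0) - max(-z - t, 0.0)


def elastic_net(X, y, lam, alpha, sweeps=100):
    """Coordinate descent for the elastic net; X is a list of rows."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, with b = 0 initially
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            # correlation of column j with the partial residual (j's term added back)
            rho = sum(c * r for c, r in zip(col, resid)) / n + z * b[j]
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            if new != b[j]:
                resid = [r - c * (new - b[j]) for r, c in zip(resid, col)]
                b[j] = new
    return b


def neighborhood(data, node, lam, alpha):
    """Regress column `node` on the other columns; nonzero weights are neighbors."""
    others = [j for j in range(len(data[0])) if j != node]
    X = [[row[j] for j in others] for row in data]
    y = [row[node] for row in data]
    b = elastic_net(X, y, lam, alpha)
    return {j for j, bj in zip(others, b) if bj != 0.0}


# Toy data with orthogonal, standardized columns 1-3; column 0 mostly follows column 1.
c1 = [1, -1, 1, -1, 1, -1, 1, -1]
c2 = [1, 1, -1, -1, 1, 1, -1, -1]
c3 = [1, 1, 1, 1, -1, -1, -1, -1]
data = [[0.8 * a + 0.05 * b, a, b, c] for a, b, c in zip(c1, c2, c3)]
print(neighborhood(data, 0, lam=0.2, alpha=0.5))
```

On this design only the column-1 coefficient survives the l1 threshold lam*alpha = 0.1; raising lam to 2.0 empties the neighborhood. Running the same regression for every node and combining the results gives an edge set.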

artificial intelligence, graph, machine learning

arXiv.org Machine Learning

1111.0559

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

Classifying Scientific Publications Using Abstract Features

Caragea, Cornelia (Pennsylvania State University) | Silvescu, Adrian (Naviance Inc.) | Kataria, Saurabh (Pennsylvania State University) | Caragea, Doina (Kansas State University) | Mitra, Prasenjit (Pennsylvania State University)

AAAI Conferences, Nov-1-2011

With the exponential increase in the number of documents available online (e.g., news articles, weblogs, and scientific documents), effective and efficient classification methods are required to deliver the appropriate information to specific users or groups. The performance of document classifiers critically depends, among other things, on the choice of the feature representation. The commonly used "bag of words" representation can result in a large number of features. Feature abstraction helps reduce a classifier's input size by learning an abstraction hierarchy over the set of words. A cut through the hierarchy specifies a compressed model, where the nodes on the cut represent abstract features. In this paper, we compare feature abstraction with two other methods for dimensionality reduction: feature selection and Latent Dirichlet Allocation (LDA). Experimental results on two data sets of scientific publications show that classifiers trained using abstract features significantly outperform those trained using the features with the highest average mutual information with the class, as well as those trained using the topic distribution and topic words output by LDA. Furthermore, we propose an approach to automatically identify a cut that trades off the complexity of classifiers against their performance. Our results demonstrate the feasibility of the proposed approach.
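Once an abstraction hierarchy is available, applying a cut is simple: each word's count is credited to its unique ancestor on the cut. The sketch below shows only that mapping step; learning the hierarchy and choosing the cut, which are the paper's contributions, are not shown, and the toy hierarchy with nodes A1, A2, and root is invented for illustration.

```python
from collections import Counter


def abstract_features(word_counts, cut, parent):
    """Credit each word's count to its unique ancestor (or itself) on the cut.

    parent maps every non-root node to its parent; words are the leaves.
    Assumes every path from a word to the root crosses the cut exactly once."""
    features = Counter()
    for word, count in word_counts.items():
        node = word
        while node not in cut:
            node = parent[node]  # climb toward the root until the cut is hit
        features[node] += count
    return features


# Toy hierarchy (hypothetical): two abstract nodes under a single root.
parent = {"svm": "A1", "kernel": "A1", "lda": "A2", "topic": "A2",
          "A1": "root", "A2": "root"}
counts = {"svm": 2, "kernel": 1, "topic": 3}
print(abstract_features(counts, {"A1", "A2"}, parent))
print(abstract_features(counts, {"root"}, parent))
```

A cut closer to the root yields fewer, coarser features (here a single root feature), while a cut at the leaves reproduces the original bag of words; that is the complexity/performance trade-off the automatic cut selection addresses.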

abstraction, accuracy, classifier

AAAI Conferences

Ninth Symposium of Abstraction, Reformulation, and Approximation

Country:

Asia > Middle East > Jordan (0.06)
North America > United States > Pennsylvania (0.04)
North America > United States > New York (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

Evaluating Questions in Context

Becker, Lee (University of Colorado Boulder) | Palmer, Martha S. (University of Colorado Boulder) | Vuuren, Sarel van (University of Colorado Boulder) | Ward, Wayne H. (Boulder Language Technologies)

AAAI Conferences, Nov-1-2011

We present an evaluation methodology and a system for ranking questions within the context of a multimodal tutorial dialogue. Such a framework has applications for automatic question selection and generation in intelligent tutoring systems. To create this ranking system, we manually author candidate questions for specific points in a dialogue and have raters assign scores to these questions. To explore the role of question type in scoring, we annotate dialogue turns with labels from the DISCUSS dialogue move taxonomy. Questions are ranked using an SVM regression model trained with features extracted from the dialogue context, the candidate question, and the human ratings. Evaluation shows that our system's rankings correlate with human judgments of question rankings.
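The abstract does not say which correlation measure was used, so the sketch below shows one common choice for comparing a system ranking with human ratings: Kendall's tau-a, computed over all pairs of candidate questions. The predicted scores are hypothetical stand-ins for SVM regression outputs; in practice `scipy.stats.kendalltau`, which computes the tie-adjusted tau-b by default, would do the same job.

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs over all n(n-1)/2 pairs.
    Pairs tied in either list count as neither concordant nor discordant."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical predicted scores for four candidate questions and human ratings.
predicted = [2.1, 3.4, 1.0, 2.8]
human = [3, 4, 1, 2]
# Rank questions by predicted score, best first.
ranking = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
print(ranking, kendall_tau(predicted, human))
```

Here five of the six question pairs are ordered the same way by the system and the raters, giving tau = 4/6.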

candidate question, machine learning, natural language

AAAI Conferences

2011 AAAI Fall Symposium Series

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
Europe > Finland > Paijanne Tavastia > Lahti (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Instructional Material > Course Syllabus & Notes (0.68)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Natural Language > Understanding (0.34)

Convergence Rates for Mixture-of-Experts

Mendes, Eduardo F., Jiang, Wenxin

arXiv.org Machine Learning, Nov-1-2011

In the mixture-of-experts (ME) model, where a number of submodels (experts) are combined, there have been two longstanding problems: (i) how many experts should be chosen, given the size of the training data? (ii) given the total number of parameters, is it better to use a few very complex experts or to combine many simple experts? In this paper, we try to provide some insight into these problems through a theoretical study of an ME structure in which $m$ experts are mixed, each expert being related to a polynomial regression model of order $k$. We study the convergence rate of the maximum likelihood estimator (MLE) as the sample size $n$ increases, in terms of how fast the Kullback-Leibler divergence between the estimated density and the true density goes to zero. The convergence rate is found to depend on both $m$ and $k$, and certain choices of $m$ and $k$ are found to produce optimal convergence rates. These results therefore shed light on the two problems above: how to choose $m$, and how $m$ and $k$ should be traded off to achieve good convergence rates.
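To make the roles of m and k concrete, the sketch below evaluates the conditional density of a mixture of m polynomial-regression experts. The Gaussian experts with a shared noise scale and the softmax gate that is linear in a scalar x are illustrative assumptions rather than the paper's exact specification, and the code does not touch the MLE or its convergence rate.

```python
import math


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def me_density(y, x, gates, experts, sigma):
    """Conditional density p(y | x) of a mixture of polynomial-regression experts.

    gates:   m pairs (a_j, b_j); gate weight j is softmax_j(a_j + b_j * x).
    experts: m coefficient lists [c_0, ..., c_k]; expert j predicts sum_i c_i * x**i.
    sigma:   noise scale shared by all Gaussian experts (illustrative assumption)."""
    weights = softmax([a + b * x for a, b in gates])
    norm = sigma * math.sqrt(2.0 * math.pi)
    density = 0.0
    for w, coefs in zip(weights, experts):
        mean = sum(c * x ** i for i, c in enumerate(coefs))
        density += w * math.exp(-0.5 * ((y - mean) / sigma) ** 2) / norm
    return density


# Two hypothetical experts: a linear one and a quadratic one.
gates = [(0.0, 1.0), (0.5, -1.0)]
experts = [[0.0, 1.0], [1.0, -1.0, 0.5]]
print(me_density(0.8, 1.0, gates, experts, sigma=1.0))
```

The experts alone carry m(k + 1) coefficients, so, for example, four linear experts (m = 4, k = 1) and two cubic experts (m = 2, k = 3) use the same coefficient budget; that is the trade-off question (ii) asks about.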

artificial intelligence, bayesian inference, machine learning

arXiv.org Machine Learning

1110.2058

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Bayesian Optimization for Adaptive MCMC

Mahendran, Nimalan, Wang, Ziyu, Hamze, Firas, de Freitas, Nando

arXiv.org Machine Learning, Oct-29-2011

This paper proposes a new randomized strategy for adaptive MCMC using Bayesian optimization. This approach applies to non-differentiable objective functions and trades off exploration and exploitation to reduce the number of potentially costly objective function evaluations. We demonstrate the strategy in the complex setting of sampling from constrained, discrete and densely connected probabilistic graphical models where, for each variation of the problem, one needs to adjust the parameters of the proposal mechanism automatically to ensure efficient mixing of the Markov chains.
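The exploration/exploitation loop can be sketched with a plain Gaussian-process upper-confidence-bound (GP-UCB) search over a grid of candidate proposal parameters. This is not the paper's randomized strategy; here the objective is a hypothetical, noise-free stand-in for a mixing score as a function of a proposal step size, and the kernel length scale, beta, and grid are arbitrary choices.

```python
import math


def rbf(a, b, length=0.5):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x, noise=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, math.sqrt(var)


def bayes_opt(objective, grid, n_iter=8, beta=2.0):
    """GP-UCB over a finite grid; returns the best (x, objective(x)) evaluated."""
    xs = [grid[0], grid[-1]]
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        candidates = [x for x in grid if x not in xs]
        if not candidates:
            break

        def ucb(x):
            mean, std = gp_posterior(xs, ys, x)
            return mean + beta * std  # optimism in uncertain regions drives exploration

        x_next = max(candidates, key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))  # one "costly" evaluation per iteration
    i_best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[i_best], ys[i_best]


def mixing_score(step):
    """Hypothetical stand-in for an MCMC mixing measure at a given step size."""
    return -(step - 1.5) ** 2


grid = [round(0.1 * i, 1) for i in range(1, 31)]
print(bayes_opt(mixing_score, grid))
```

Because the GP prior mean is 0 and the observed scores are negative, unexplored regions look optimistic, so the first pick lands far from the two endpoints; each call to the objective stands for one costly batch of MCMC runs.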

bayesian optimization, optimization problem, upstream oil & gas

arXiv.org Machine Learning

1110.6497

Country:

North America > Canada > Alberta (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.37)

Ordinal Risk-Group Classification

Toren, Yizhar

arXiv.org Machine Learning, Oct-27-2011

Most classification methods provide either a prediction of class membership or an assessment of class membership probability. In the case of two-group classification, the predicted probability can be described as the "risk" of belonging to a "special" class. When the required output is a set of ordinal risk groups, the continuous risk prediction is commonly discretized in one of two ways: by constructing a set of models that describe the conditional risk function at specific points (quantile regression), or by dividing the output of an "optimal" classification model into adjacent intervals that correspond to the desired risk groups. By defining a new error measure for the distribution of risk onto intervals, we are able to identify lower bounds on the accuracy of these methods, showing sub-optimality both in their distribution of risk and in the efficiency of their resulting partition into intervals. By adding a new form of constraint to the existing maximum likelihood optimization framework and by introducing a penalty function to avoid degenerate solutions, we show how existing methods can be augmented to solve the ordinal risk-group classification problem. We implement our method for logistic regression (LR) and present a numerical example.
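The second baseline described above, cutting the output of a probabilistic classifier into adjacent intervals, can be sketched in a few lines. The cutpoints are hypothetical, the risks would come from a fitted model such as logistic regression, and the paper's constrained, penalized maximum likelihood method is not shown; the per-group mean risk is included only as a simple summary of how risk is distributed over the groups.

```python
import bisect


def assign_risk_groups(risks, cutpoints):
    """Map predicted risks to ordinal groups 0..len(cutpoints).
    cutpoints must be sorted; group g covers [cutpoints[g-1], cutpoints[g])."""
    return [bisect.bisect_right(cutpoints, r) for r in risks]


def group_mean_risk(risks, groups, n_groups):
    """Average predicted risk within each group (None for an empty group)."""
    sums, counts = [0.0] * n_groups, [0] * n_groups
    for r, g in zip(risks, groups):
        sums[g] += r
        counts[g] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical predicted risks and two cutpoints giving low/medium/high groups.
risks = [0.05, 0.15, 0.3, 0.45, 0.7, 0.9]
groups = assign_risk_groups(risks, [0.2, 0.5])
print(groups, group_mean_risk(risks, groups, 3))
```

Fixing the cutpoints after fitting, as here, is exactly the two-stage procedure whose sub-optimality the abstract analyzes.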

artificial intelligence, machine learning, risk group

arXiv.org Machine Learning

1012.5487

Country: Europe > Austria (0.28)

Genre:

Research Report > Experimental Study (0.51)
Research Report > New Finding (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
