AITopics | Statistical Learning

Statistical Learning

Transforming Graph Data for Statistical Relational Learning

Rossi, R. A., McDowell, L. K., Aha, D. W., Neville, J.

Journal of Artificial Intelligence Research, Oct-30-2012

Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of Statistical Relational Learning (SRL) algorithms to these domains. In this article, we examine and categorize techniques for transforming graph-based relational data to improve SRL algorithms. In particular, appropriate transformations of the nodes, links, and/or features of the data can dramatically affect the capabilities and results of SRL algorithms. We introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. More specifically, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.

knowledge discovery, representation, transforming graph data, (13 more...)

doi: 10.1613/jair.3659

AI Access Foundation

10786

Country:

Asia > Middle East > Jordan (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(11 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.46)

Industry:

Information Technology > Services (1.00)
Government > Regional Government > North America Government > United States Government (0.92)
Health & Medicine (0.92)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)

Empirical Normalization for Quadratic Discriminant Analysis and Classifying Cancer Subtypes

Kon, Mark A., Nikolaev, Nikolay

arXiv.org Machine Learning, Oct-29-2012

We introduce a new discriminant analysis method (Empirical Discriminant Analysis or EDA) for binary classification in machine learning. Given a dataset of feature vectors, this method defines an empirical feature map transforming the training and test data into new data with components having Gaussian empirical distributions. This map is an empirical version of the Gaussian copula used in probability and mathematical finance. The purpose is to form a feature mapped dataset as close as possible to Gaussian, after which standard quadratic discriminants can be used for classification. We discuss this method in general, and apply it to some datasets in computational biology.
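The empirical feature map described in this abstract can be illustrated for a single feature. The sketch below is a minimal reading of the idea, not the authors' implementation: it assumes the map sends each value through the training data's empirical CDF and then through the inverse standard normal CDF. The function names `fit_feature` and `gaussianize`, the `+ 0.5` offset, and the toy values are all assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def fit_feature(train_values):
    """Remember one feature's training values in sorted order."""
    return sorted(train_values)


def gaussianize(sorted_train, x):
    """Map x to a normal score through the training empirical CDF."""
    n = len(sorted_train)
    # Count of training values <= x, offset so u stays strictly inside (0, 1).
    u = (bisect_right(sorted_train, x) + 0.5) / (n + 1)
    return _STD_NORMAL.inv_cdf(u)


# Heavily skewed toy feature (hypothetical values).
train = [0.1, 0.2, 0.3, 0.5, 0.9, 2.0, 7.0, 40.0, 300.0]
model = fit_feature(train)
scores = [gaussianize(model, v) for v in train]
```

The same fitted `model` is applied to test values, so training and test data pass through one map. A quadratic discriminant would then be fitted on the transformed features.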

artificial intelligence, dataset, machine learning, (15 more...)

doi: 10.1109/ICMLA.2011.160

1203.6345

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Recognizing Static Signs from the Brazilian Sign Language: Comparing Large-Margin Decision Directed Acyclic Graphs, Voting Support Vector Machines and Artificial Neural Networks

de Souza, César Roberto, Pizzolato, Ednaldo Brigante, Anjo, Mauro dos Santos

arXiv.org Machine Learning, Oct-28-2012

In this paper, we explore and detail our experiments in a high-dimensionality, multi-class image classification problem often found in the automatic recognition of Sign Languages. Here, our efforts are directed towards comparing the characteristics, advantages and drawbacks of creating and training Support Vector Machines arranged in a Directed Acyclic Graph and Artificial Neural Networks to classify signs from the Brazilian Sign Language (LIBRAS). We explore how the different heuristics, hyperparameters and multi-class decision schemes affect the performance, efficiency and ease of use for each classifier. We provide hyperparameter surface maps capturing accuracy and efficiency, comparisons between DDAGs and 1-vs-1 SVMs, and effects of heuristics when training ANNs with Resilient Backpropagation. We report statistically significant results using Cohen's Kappa statistic for contingency tables.
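For readers unfamiliar with the statistic named at the end of this abstract, Cohen's kappa compares observed agreement with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it from a square contingency table. The function name and the toy table are illustrative assumptions, and the code covers only the statistic itself, not the significance testing the authors report.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table (rows: true class, columns: predicted)."""
    k = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of matching row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class confusion table.
toy = [[20, 5], [10, 15]]
kappa = cohen_kappa(toy)
```

For the toy table, observed agreement is 0.7 and chance agreement is 0.5, so kappa is about 0.4.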

artificial intelligence, classifier, machine learning, (9 more...)

1210.7461

Country:

North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning mixtures of spherical Gaussians: moment methods and spectral decompositions

Hsu, Daniel, Kakade, Sham M.

arXiv.org Machine Learning, Oct-28-2012

This work provides a computationally efficient and statistically consistent moment-based estimator for mixtures of spherical Gaussians. Under the condition that component means are in general position, a simple spectral decomposition technique yields consistent parameter estimates from low-order observable moments, without additional minimum separation assumptions needed by previous computationally efficient estimation procedures. Thus computational and information-theoretic barriers to efficient estimation in mixture models are precluded when the mixture components have means in general position and spherical covariances. Some connections are made to estimation problems related to independent component analysis.

artificial intelligence, estimator, machine learning, (18 more...)

1206.5766

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.31)

The Bayesian Bridge

Polson, Nicholas G., Scott, James G., Windle, Jesse

arXiv.org Machine Learning, Oct-27-2012

We propose the Bayesian bridge estimator for regularized regression and classification. Two key mixture representations for the Bayesian bridge model are developed: (1) a scale mixture of normals with respect to an alpha-stable random variable; and (2) a mixture of Bartlett-Fejer kernels (or triangle densities) with respect to a two-component mixture of gamma random variables. Both lead to MCMC methods for posterior simulation, and these methods turn out to have complementary domains of maximum efficiency. The first representation is a well-known result due to West (1987), and is the better choice for collinear design matrices. The second representation is new, and is more efficient for orthogonal problems, largely because it avoids the need to deal with exponentially tilted stable random variables. It also provides insight into the multimodality of the joint posterior distribution, a feature of the bridge model that is notably absent under ridge or lasso-type priors. We prove a theorem that extends this representation to a wider class of densities representable as scale mixtures of betas, and provide an explicit inversion formula for the mixing distribution. The connections with slice sampling and scale mixtures of normals are explored. On the practical side, we find that the Bayesian bridge model outperforms its classical cousin in estimation and prediction across a variety of data sets, both simulated and real. We also show that the MCMC for fitting the bridge model exhibits excellent mixing properties, particularly for the global scale parameter. This makes for a favorable contrast with analogous MCMC algorithms for other sparse Bayesian models. All methods described in this paper are implemented in the R package BayesBridge. An extensive set of simulation results is provided in two supplemental files.

artificial intelligence, machine learning, posterior, (19 more...)

1109.2279

Country: North America > United States > Texas (0.15)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Large-Scale Sparse Principal Component Analysis with Application to Text Data

Zhang, Youwei, Ghaoui, Laurent El

arXiv.org Machine Learning, Oct-26-2012

Sparse PCA provides a linear combination of a small number of features that maximizes variance across data. Although Sparse PCA has apparent advantages compared to PCA, such as better interpretability, it is generally thought to be computationally much more expensive. In this paper, we demonstrate the surprising fact that sparse PCA can be easier than PCA in practice, and that it can be reliably applied to very large data sets. This comes from a rigorous feature elimination pre-processing result, coupled with the favorable fact that features in real-life data typically have exponentially decreasing variances, which allows for many features to be eliminated. We introduce a fast block coordinate ascent algorithm with much better computational complexity than the existing first-order ones. We provide experimental results obtained on text corpora involving millions of documents and hundreds of thousands of features. These results illustrate how Sparse PCA can help organize a large corpus of text data in a user-interpretable way, providing an attractive alternative approach to topic models.
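The pre-processing step mentioned in this abstract can be pictured as a variance screen applied before any sparse PCA solver runs. The abstract does not state the exact elimination rule, so the sketch below assumes a plain cutoff: features whose variance falls below a caller-supplied `threshold` are dropped. The function names, the threshold values, and the toy data are hypothetical.

```python
def feature_variances(rows):
    """Population variance of every column in a list of equal-length rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(d)]


def surviving_features(rows, threshold):
    """Indices of columns whose variance is at least the threshold (assumed rule)."""
    return [j for j, v in enumerate(feature_variances(rows)) if v >= threshold]


# Toy data: a constant column, a low-variance column, a high-variance column.
data = [
    [0.0, 1.0, 10.0],
    [0.0, 1.1, -10.0],
    [0.0, 0.9, 10.0],
    [0.0, 1.0, -10.0],
]
kept = surviving_features(data, threshold=1.0)
```

When variances decay quickly, as the abstract says is typical for real-life data, a screen of this kind leaves only a small fraction of the columns for the more expensive optimization step.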

algorithm, artificial intelligence, machine learning, (11 more...)

1210.7054

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.42)

An Exponential Lower Bound on the Complexity of Regularization Paths

Gärtner, Bernd, Jaggi, Martin, Maria, Clément

arXiv.org Machine Learning, Oct-25-2012

For a variety of regularized optimization problems in machine learning, algorithms computing the entire solution path have been developed recently. Most of these methods are quadratic programs that are parameterized by a single parameter, as for example the Support Vector Machine (SVM). Solution path algorithms do not only compute the solution for one particular value of the regularization parameter but the entire path of solutions, making the selection of an optimal parameter much easier. It has been assumed that these piecewise linear solution paths have only linear complexity, i.e. linearly many bends. We prove that for the support vector machine this complexity can be exponential in the number of training points in the worst case. More strongly, we construct a single instance of n input points in d dimensions for an SVM such that at least \Theta(2^{n/2}) = \Theta(2^d) many distinct subsets of support vectors occur as the regularization parameter changes.

artificial intelligence, machine learning, solution path, (15 more...)

0903.4817

Country:

North America (0.46)
Europe > Switzerland (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Ancestor Sampling for Particle Gibbs

Lindsten, Fredrik, Jordan, Michael I., Schön, Thomas B.

arXiv.org Machine Learning, Oct-25-2012

We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models.

artificial intelligence, machine learning, sampler, (17 more...)

1210.6911

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.28)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

A Biomimetic Approach Based on Immune Systems for Classification of Unstructured Data

Hamou, Mohamed, Amine, Abdelmalek, Lokbani, Ahmed Chaouki

arXiv.org Artificial Intelligence, Oct-25-2012

In this paper we present the results of clustering unstructured data, in this case textual data from the Reuters 21578 corpus, with a new biomimetic approach based on immune systems. Before running experiments with our immune system, we digitized the textual data using the n-grams approach. The novelty lies in the hybridization of n-grams and immune systems for clustering. The experimental results show that the proposed ideas are promising and prove that this method can solve the text clustering problem.
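The n-grams step in this abstract turns raw text into counts that a clustering method can consume. The abstract does not say whether the n-grams are taken over characters or words, so the sketch below assumes character n-grams. The function name `char_ngrams` and the sample string are illustrative only.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Bigram profile of a short sample string.
profile = char_ngrams("banana", 2)
```

Each document then becomes a sparse count vector indexed by n-gram, with no tokenization or stemming required.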

classification, evolutionary algorithm, machine learning, (18 more...)

1210.7002

Country:

Europe (0.46)
Africa > Middle East > Algeria (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(2 more...)

Supervised Learning with Similarity Functions

Kar, Purushottam, Jain, Prateek

arXiv.org Machine Learning, Oct-22-2012

We address the problem of general supervised learning when data can only be accessed through an (indefinite) similarity function between data points. Existing work on learning with indefinite kernels has concentrated solely on binary/multi-class classification problems. We propose a model that is generic enough to handle any supervised learning task and also subsumes the model previously proposed for classification. We give a "goodness" criterion for similarity functions w.r.t. a given supervised learning task and then adapt a well-known landmarking technique to provide efficient algorithms for supervised learning using "good" similarity functions. We demonstrate the effectiveness of our model on three important supervised learning problems: a) real-valued regression, b) ordinal regression and c) ranking, where we show that our method guarantees bounded generalization error. Furthermore, for the case of real-valued regression, we give a natural goodness definition that, when used in conjunction with a recent result in sparse vector recovery, guarantees a sparse predictor with bounded generalization error. Finally, we report results of our learning algorithms on regression and ordinal regression tasks using non-PSD similarity functions and demonstrate the effectiveness of our algorithms, especially that of the sparse landmark selection algorithm that achieves significantly higher accuracies than the baseline methods while offering reduced computational costs.
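The landmarking technique mentioned in this abstract can be sketched as a feature map: each point is represented by its similarities to a fixed set of landmark points, and an ordinary linear learner is then trained on those vectors. The code below shows only the map. The name `landmark_features`, the toy similarity, and the landmark values are assumptions, and the paper's landmark selection and any scaling it applies are not reproduced.

```python
def landmark_features(x, landmarks, similarity):
    """Represent x by its similarity to each landmark point."""
    return [similarity(x, landmark) for landmark in landmarks]


def toy_similarity(a, b):
    """Hypothetical indefinite similarity on real numbers: negative distance."""
    return -abs(a - b)


# Hypothetical landmarks, for example a random subset of the training points.
landmarks = [0.0, 3.0]
embedded = landmark_features(2.0, landmarks, toy_similarity)
```

The similarity function is never required to be positive semidefinite here, which is why this route suits the indefinite case the abstract targets.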

artificial intelligence, machine learning, similarity function, (19 more...)

1210.584

Country:

Asia > India (0.28)
North America > United States (0.28)

Genre:

Workflow (0.67)
Research Report (0.50)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)