arXiv.org Machine Learning
An Augmented Lagrangian Approach for Sparse Principal Component Analysis
Principal component analysis (PCA) is a widely used technique for data analysis and dimension reduction with numerous applications in science and engineering. However, the standard PCA suffers from the fact that the principal components (PCs) are usually linear combinations of all the original variables, and it is thus often difficult to interpret the PCs. To alleviate this drawback, various sparse PCA approaches have been proposed in the literature [15, 6, 17, 28, 8, 25, 18, 7, 16]. Despite success in achieving sparsity, these methods lose some important properties enjoyed by the standard PCA, such as the uncorrelatedness of PCs and the orthogonality of loading vectors. Also, the total explained variance that they attempt to maximize can be too optimistic. In this paper we propose a new formulation for sparse PCA, aiming at finding sparse and nearly uncorrelated PCs with orthogonal loading vectors while explaining as much of the total variance as possible. We also develop a novel augmented Lagrangian method for solving a class of nonsmooth constrained optimization problems, which is well suited to our formulation of sparse PCA. We show that it converges to a feasible point, and moreover, under some regularity assumptions, that it converges to a stationary point. Additionally, we propose two nonmonotone gradient methods for solving the augmented Lagrangian subproblems, and establish their global and local convergence. Finally, we compare our sparse PCA approach with several existing methods on synthetic, random, and real data. The computational results demonstrate that the sparse PCs produced by our approach substantially outperform those produced by other methods in terms of total explained variance, correlation of PCs, and orthogonality of loading vectors.
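As a rough illustration of the kind of problem involved, a generic sparse-PCA formulation with orthogonal loadings and an l1 penalty is shown below; the paper's own objective, which additionally enforces near-uncorrelatedness of the PCs, is not reproduced here.

```latex
% Generic sparse-PCA-type problem (illustrative, not the paper's exact formulation):
% Sigma is the sample covariance matrix, rho controls the sparsity of the loadings V.
\max_{V \in \mathbb{R}^{p \times r}} \;
  \operatorname{tr}\!\bigl(V^{\top} \Sigma V\bigr)
  \;-\; \rho \sum_{i,j} \lvert V_{ij} \rvert
\qquad \text{s.t.} \qquad V^{\top} V = I_r .
```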
A new protein binding pocket similarity measure based on comparison of 3D atom clouds: application to ligand prediction
Hoffmann, Brice, Zaslavskiy, Mikhail, Vert, Jean-Philippe, Stoven, Véronique
Motivation: Prediction of ligands for proteins of known 3D structure is important to understand structure-function relationships, predict molecular function, or design new drugs. Results: We explore a new approach for ligand prediction in which binding pockets are represented by atom clouds. Each target pocket is compared to an ensemble of pockets of known ligands. Pockets are aligned in 3D space, and convolution kernels between clouds of points are then used to measure their similarity. The performance of the new method for ligand prediction is compared to that of other available measures and to docking programs. We discuss two criteria for comparing the quality of similarity measures: the area under the ROC curve (AUC) and classification-based scores. We show that the latter is better suited to evaluating the methods with respect to ligand prediction. Our results on existing and new benchmarks indicate that the new method outperforms other approaches, including docking. Availability: The new method is available at http://cbio.ensmp.fr/paris/ Contact: mikhail.zaslavskiy@mines-paristech.fr
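A minimal sketch of the kind of convolution kernel between atom clouds described above, assuming a Gaussian overlap kernel, omitting the 3D alignment step, and using an illustrative bandwidth sigma (this is not the paper's implementation):

```python
# Hedged sketch: Gaussian convolution kernel between two 3D point clouds.
# Pocket alignment (3D superposition) is omitted; sigma is a hypothetical bandwidth.
import numpy as np

def cloud_kernel(A, B, sigma=1.0):
    """Sum of Gaussian overlaps between all atom pairs of clouds A and B."""
    # A: (n, 3) atom coordinates, B: (m, 3) atom coordinates
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2)).sum()

def normalized_similarity(A, B, sigma=1.0):
    """Cosine-normalized kernel, so identical clouds score 1."""
    return cloud_kernel(A, B, sigma) / np.sqrt(
        cloud_kernel(A, A, sigma) * cloud_kernel(B, B, sigma))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pocket1 = rng.normal(size=(30, 3))
    pocket2 = pocket1 + rng.normal(scale=0.1, size=(30, 3))  # slightly perturbed copy
    print(normalized_similarity(pocket1, pocket2))            # close to 1
```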
Visualizing Topics with Multi-Word Expressions
Blei, David M., Lafferty, John D.
We describe a new method for visualizing topics, the distributions over terms that are automatically extracted from large text corpora using latent variable models. Our method finds significant $n$-grams related to a topic, which are then used to help understand and interpret the underlying distribution. Compared with the usual visualization, which simply lists the most probable topical terms, the multi-word expressions provide a better intuitive impression of what a topic is "about." Our approach is based on a language model of arbitrary-length expressions, for which we develop a new methodology based on nested permutation tests to find significant phrases. We show that this method outperforms the more standard use of $\chi^2$ and likelihood ratio tests. We illustrate the topic presentations on corpora of scientific abstracts and news articles.
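For intuition, here is a minimal sketch of a permutation test for bigram significance, much simpler than the nested permutation procedure for arbitrary-length expressions described above; all names and data are illustrative:

```python
# Hedged sketch: does a bigram occur more often than chance within a bag of topic words?
# Shuffling breaks word order while preserving unigram counts (the null hypothesis).
import random

def bigram_count(tokens, bigram):
    return sum(1 for a, b in zip(tokens, tokens[1:]) if (a, b) == bigram)

def bigram_pvalue(tokens, bigram, n_perm=1000, seed=0):
    rng = random.Random(seed)
    observed = bigram_count(tokens, bigram)
    hits = 0
    for _ in range(n_perm):
        shuffled = tokens[:]
        rng.shuffle(shuffled)                    # permute tokens under the null
        if bigram_count(shuffled, bigram) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)             # add-one corrected p-value

tokens = ("new york stock exchange fell as new york traders sold " * 5).split()
print(bigram_pvalue(tokens, ("new", "york")))    # small p-value: "new york" is a real phrase
```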
Bayesian Agglomerative Clustering with Coalescents
Teh, Yee Whye, Daumé, Hal III, Roy, Daniel
We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman's coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over others, and demonstrate our approach in document clustering and phylolinguistics.
Multiple Hypothesis Testing in Pattern Discovery
Hanhijärvi, Sami, Puolamäki, Kai, Garriga, Gemma C.
The problem of multiple hypothesis testing arises when more than one hypothesis must be tested simultaneously for statistical significance. This is a very common situation in many data mining applications. For instance, assessing simultaneously the significance of all frequent itemsets of a single dataset entails a host of hypotheses, one for each itemset. A multiple hypothesis testing method is needed to control the number of false positives (Type I errors). Our contribution in this paper is to extend the multiple hypothesis testing framework so that it can be used with a generic data mining algorithm. We provide a method that provably controls the family-wise error rate (FWER, the probability of at least one false positive) in the strong sense. We evaluate the performance of our solution on both real and generated data. The results show that our method controls the FWER while maintaining the power of the test.
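For context, here is a minimal sketch of one standard permutation-based way to control the FWER in the strong sense (single-step max-T adjustment); the paper's procedure, which wraps a generic data mining algorithm, is not reproduced here:

```python
# Hedged sketch: single-step max-T adjusted p-values (Westfall-Young style).
# Rejecting hypotheses with adjusted p-value <= alpha controls the FWER.
import numpy as np

def maxT_adjusted_pvalues(observed_stats, permuted_stats):
    """observed_stats: (m,) test statistics (larger = more significant).
    permuted_stats: (n_perm, m) statistics recomputed on permuted data."""
    max_per_perm = permuted_stats.max(axis=1)   # strongest statistic in each permutation
    # adjusted p-value: fraction of permutations whose maximum beats the observed value
    return (max_per_perm[:, None] >= observed_stats[None, :]).mean(axis=0)

rng = np.random.default_rng(1)
obs = np.array([4.0, 1.2, 0.3])                 # e.g. significance scores of three patterns
perm = rng.normal(size=(2000, 3))               # the same scores under 2000 permutations
print(maxT_adjusted_pvalues(obs, perm))         # only the first pattern survives correction
```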
High Dimensional Nonlinear Learning using Local Coordinate Coding
This paper introduces a new method for semi-supervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning. The learned bases provide a set of anchor points that form a local coordinate system, such that each data point $x$ on the manifold can be locally approximated by a linear combination of its nearby anchor points, with the linear weights offering a local-coordinate coding of $x$. We show that a high dimensional nonlinear function can be approximated by a global linear function with respect to this coding scheme, and that the approximation quality is ensured by the locality of such coding. The method turns a difficult nonlinear learning problem into a simple global linear learning problem, which overcomes some drawbacks of traditional local learning methods. The work also gives a theoretical justification for the empirical success of some biologically inspired models using sparse coding of sensory data, since a local coding scheme must be sufficiently sparse. However, sparsity does not always satisfy locality conditions, and can thus lead to suboptimal results. The properties and performance of the method are empirically verified on synthetic data, handwritten digit classification, and object recognition tasks.
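A minimal sketch of computing a local-coordinate code for one point, assuming the anchors have already been learned (e.g. by clustering) and using affine least-squares reconstruction weights over the k nearest anchors; this simplifies the paper's actual coding objective:

```python
# Hedged sketch: local-coordinate coding of a point x over its k nearest anchors,
# with sum-to-one reconstruction weights (an LLE-style local least-squares fit).
import numpy as np

def local_coding(x, anchors, k=5, reg=1e-6):
    d = ((anchors - x) ** 2).sum(axis=1)
    idx = np.argsort(d)[:k]                   # k nearest anchor points
    C = anchors[idx] - x                      # neighbourhood centered at x
    G = C @ C.T + reg * np.eye(k)             # regularized local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                              # enforce the affine constraint sum(w) = 1
    code = np.zeros(len(anchors))
    code[idx] = w                             # sparse code: zero outside the neighbourhood
    return code

rng = np.random.default_rng(0)
anchors = rng.normal(size=(50, 10))           # e.g. k-means centers learned beforehand
x = rng.normal(size=10)
code = local_coding(x, anchors)
print(np.linalg.norm(x - code @ anchors))     # reconstruction error of the local code
```

A supervised learner can then be trained as a plain linear model on these sparse codes, which is the "global linear learning problem" the abstract refers to.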
On landmark selection and sampling in high-dimensional data analysis
Belabbas, Mohamed-Ali, Wolfe, Patrick J.
In recent years, the spectral analysis of appropriately defined kernel matrices has emerged as a principled way to extract the low-dimensional structure often prevalent in high-dimensional data. Here we provide an introduction to spectral methods for linear and nonlinear dimension reduction, emphasizing ways to overcome the computational limitations currently faced by practitioners with massive datasets. In particular, a data subsampling or landmark selection process is often employed to construct a kernel based on partial information, followed by an approximate spectral analysis termed the Nystrom extension. We provide a quantitative framework to analyse this procedure, and use it to demonstrate algorithmic performance bounds on a range of practical approaches designed to optimize the landmark selection process. We compare the practical implications of these bounds by way of real-world examples drawn from the field of computer vision, whereby low-dimensional manifold structure is shown to emerge from high-dimensional video data streams.
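A minimal sketch of the Nystrom extension with plain uniform landmark sampling (the bounds discussed above concern more careful landmark selection schemes); the kernel and parameters are illustrative:

```python
# Hedged sketch: Nystrom approximation of the eigenvectors of a large kernel matrix
# from a uniformly sampled subset of landmark columns, K ~ C W^{-1} C^T.
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, n_landmarks=50, gamma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    W = rbf_kernel(X[idx], X[idx], gamma)      # landmark-landmark block
    C = rbf_kernel(X, X[idx], gamma)           # all-points-to-landmarks block
    evals, evecs = np.linalg.eigh(W)
    evals = np.maximum(evals, 1e-12)           # guard against tiny negative eigenvalues
    U = C @ evecs / evals                      # extend landmark eigenvectors to all points
    return U, evals * len(X) / n_landmarks     # rescaled eigenvalue estimates

X = np.random.default_rng(0).normal(size=(500, 5))
U, lam = nystrom(X)
print(U.shape, lam[-3:])                       # approximate eigenvectors, top eigenvalues
```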
The Feature Importance Ranking Measure
Zien, Alexander, Kraemer, Nicole, Sonnenburg, Soeren, Raetsch, Gunnar
Most accurate predictions are typically obtained by learning machines with complex feature spaces (e.g., those induced by kernels). Unfortunately, such decision rules are hardly accessible to humans and cannot easily be used to gain insights about the application domain. Therefore, one often resorts to linear models in combination with variable selection, thereby sacrificing some predictive power for presumptive interpretability. Here, we introduce the Feature Importance Ranking Measure (FIRM), which, by retrospective analysis of arbitrary learning machines, achieves both excellent predictive performance and superior interpretation. In contrast to standard raw feature weighting, FIRM takes the underlying correlation structure of the features into account. It is thereby able to discover the most relevant features, even if their appearance in the training data is entirely prevented by noise. The desirable properties of FIRM are investigated analytically and illustrated in simulations.
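A rough, assumption-laden sketch of a FIRM-style score: a feature's importance is taken to be the variability of the conditional expected prediction given that feature's value, estimated here by simple binning rather than the paper's analytical treatment (function and variable names are illustrative):

```python
# Hedged sketch: empirical FIRM-like importance via binned conditional means.
import numpy as np

def firm_like_importance(predict, X, n_bins=10):
    scores = predict(X)
    importances = []
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1))
        which = np.digitize(X[:, j], edges[1:-1])             # bin index per sample
        cond_mean = np.array([scores[which == b].mean()
                              for b in range(n_bins) if (which == b).any()])
        importances.append(cond_mean.std())                   # variability of the conditional mean
    return np.array(importances)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
predict = lambda Z: 2.0 * Z[:, 0] + 0.1 * Z[:, 1]             # toy learned scoring function
print(firm_like_importance(predict, X))                       # feature 0 dominates, 2 and 3 near zero
```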
KNIFE: Kernel Iterative Feature Extraction
Selecting important features in non-linear or kernel spaces is a difficult challenge in both classification and regression problems. When many of the features are irrelevant, kernel methods such as the support vector machine and kernel ridge regression can sometimes perform poorly. We propose weighting the features within a kernel with a sparse set of weights that are estimated in conjunction with the original classification or regression problem. The iterative algorithm, KNIFE, alternates between finding the coefficients of the original problem and finding the feature weights through kernel linearization. In addition, a slight modification of KNIFE yields an efficient algorithm for finding feature regularization paths, or the paths of each feature's weight. Simulation results demonstrate the utility of KNIFE for both kernel regression and support vector machines with a variety of kernels. Feature path realizations also reveal important non-linear correlations among features that prove useful in determining a subset of significant variables. Results on vowel recognition data, Parkinson's disease data, and microarray data are also given.
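A minimal sketch of the alternating structure of a KNIFE-style procedure for kernel ridge regression with a feature-weighted RBF kernel; the weight-update step here uses a generic numerical optimizer instead of the paper's kernel linearization, and all parameters are illustrative:

```python
# Hedged sketch: alternate between (1) fitting dual coefficients with feature weights
# fixed and (2) updating nonnegative, l1-penalized feature weights with the coefficients fixed.
import numpy as np
from scipy.optimize import minimize

def weighted_rbf(X, Y, w):
    d2 = (((X[:, None, :] - Y[None, :, :]) * w) ** 2).sum(-1)
    return np.exp(-d2)

def knife_like(X, y, lam=1e-2, mu=0.1, n_iters=5):
    n, p = X.shape
    w = np.ones(p)
    for _ in range(n_iters):
        K = weighted_rbf(X, X, w)
        alpha = np.linalg.solve(K + lam * np.eye(n), y)       # kernel ridge step
        def obj(v):                                           # training loss as a function of weights
            pred = weighted_rbf(X, X, v) @ alpha
            return ((y - pred) ** 2).sum() + mu * np.abs(v).sum()
        w = minimize(obj, w, bounds=[(0, None)] * p).x        # sparse nonnegative feature weights
    return w, alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)               # only feature 0 is relevant
w, _ = knife_like(X, y)
print(np.round(w, 3))                                         # weight on feature 0 should dominate
```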
Forest Garrote
Variable selection for high-dimensional linear models has received a lot of attention lately, mostly in the context of l1-regularization. Part of the attraction is the variable selection effect: parsimonious models are obtained, which are very suitable for interpretation. In terms of predictive power, however, these regularized linear models are often slightly inferior to machine learning procedures like tree ensembles. Tree ensembles, on the other hand, usually lack a formal way of doing variable selection and are difficult to visualize. A Garrote-style convex penalty for tree ensembles, in particular Random Forests, is proposed. The penalty selects functional groups of nodes in the trees. These could be as simple as monotone functions of individual predictor variables. This yields a parsimonious function fit, which lends itself easily to visualization and interpretation. The predictive power is maintained at least at the level of the original tree ensemble. A key feature of the method is that, once a tree ensemble is fitted, no further tuning parameter needs to be selected. The empirical performance is demonstrated on a wide array of datasets.
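A minimal sketch of a Forest-Garrote-flavoured pipeline, assuming scikit-learn: fit a random forest, extract node-indicator features, then let a nonnegative Lasso select a sparse set of nodes. How nodes are grouped and how the penalty is scaled in the paper are glossed over here.

```python
# Hedged sketch: random forest -> node-indicator design matrix -> nonnegative Lasso.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = X[:, 0] - 2 * (X[:, 1] > 0) + 0.1 * rng.normal(size=300)   # toy regression problem

forest = RandomForestRegressor(n_estimators=20, max_depth=3, random_state=0).fit(X, y)

# node-indicator design matrix: one column per tree node, 1 if the sample visits that node
Z = np.hstack([tree.decision_path(X).toarray() for tree in forest.estimators_])

garrote = Lasso(alpha=0.01, positive=True, max_iter=10000).fit(Z, y)
selected = np.flatnonzero(garrote.coef_)
print(f"{len(selected)} of {Z.shape[1]} tree nodes kept")       # sparse, interpretable node set
```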