AITopics

Aiming towards the development of a general clustering theory, we discuss abstract axiomatization for clustering. In this respect, we follow up on the work of Kelinberg, (Kleinberg) that showed an impossibility result for such axiomatization. We argue that an impossibility result is not an inherent feature of clustering, but rather, to a large extent, it is an artifact of the specific formalism used in Kleinberg. As opposed to previous work focusing on clustering functions, we propose to address clustering quality measures as the primitive object to be axiomatized. We show that principles like those formulated in Kleinberg's axioms can be readily expressed in the latter framework without leading to inconsistency. A clustering-quality measure is a function that, given a data set and its partition into clusters, returns a non-negative real number representing how `strong' or `conclusive' the clustering is. We analyze what clustering-quality measures should look like and introduce a set of requirements (`axioms') that express these requirement and extend the translation of Kleinberg's axioms to our framework. We propose several natural clustering quality measures, all satisfying the proposed axioms. In addition, we show that the proposed clustering quality can be computed in polynomial time.

axiom, clustering-quality measure, kleinberg, (14 more...)

Country: North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation

Zhou, Shuheng

Given $n$ noisy samples with $p$ dimensions, where $n \ll p$, we show that the multi-stage thresholding procedures can accurately estimate a sparse vector $\beta \in \R^p$ in a linear model, under the restricted eigenvalue conditions (Bickel-Ritov-Tsybakov 09). Thus our conditions for model selection consistency are considerably weaker than what has been achieved in previous works. More importantly, this method allows very significant values of $s$, which is the number of non-zero elements in the true parameter $\beta$. For example, it works for cases where the ordinary Lasso would have failed. Finally, we show that if $X$ obeys a uniform uncertainty principle and if the true parameter is sufficiently sparse, the Gauss-Dantzig selector (Cand\{e}s-Tao 07) achieves the $\ell_2$ loss within a logarithmic factor of the ideal mean square error one would achieve with an oracle which would supply perfect information about which coordinates are non-zero and which are above the noise level, while selecting a sufficiently sparse model.

dantzig selector, probability, selector, (15 more...)

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Adaptive Regularization for Transductive Support Vector Machine

Xu, Zenglin, Jin, Rong, Zhu, Jianke, King, Irwin, Lyu, Michael, Yang, Zhirong

We discuss the framework of Transductive Support Vector Machine (TSVM) from the perspective of the regularization strength induced by the unlabeled data. In this framework, SVM and TSVM can be regarded as a learning machine without regularization and one with full regularization from the unlabeled data, respectively. Therefore, to supplement this framework of the regularization strength, it is necessary to introduce data-dependant partial regularization. To this end, we reformulate TSVM into a form with controllable regularization strength, which includes SVM and TSVM as special cases. Furthermore, we introduce a method of adaptive regularization that is data dependant and is based on the smoothness assumption. Experiments on a set of benchmark data sets indicate the promising results of the proposed work compared with state-of-the-art TSVM algorithms.

assumption, regularization, unlabeled data, (13 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
(8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Dual Averaging Method for Regularized Stochastic Learning and Online Optimization

Xiao, Lin

We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as L1-norm for sparsity. We develop a new online algorithm, the regularized dual averaging method, that can explicitly exploit the regularization structure in an online setting. In particular, at each iteration, the learning variables are adjusted by solving a simple optimization problem that involves the running average of all past subgradients of the loss functions and the whole regularization term, not just its subgradient. This method achieves the optimal convergence rate and often enjoys a low complexity per iteration similar as the standard stochastic gradient method. Computational experiments are presented for the special case of sparse online learning using L1-regularization.

online algorithm, optimization, rda method, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > District of Columbia > Washington (0.04)
(2 more...)

Industry: Education > Educational Setting > Online (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization

Wright, John, Ganesh, Arvind, Rao, Shankar, Peng, Yigang, Ma, Yi

Principal component analysis is a fundamental operation in computational data analysis, with myriad applications ranging from web search to bioinformatics to computer vision and image analysis. However, its performance and applicability in real scenarios are limited by a lack of robustness to outlying or corrupted observations. Thispaper considers the idealized "robust principal component analysis" problem of recovering a low rank matrix A from corrupted observations D A E. Here, the corrupted entries E are unknown and the errors can be arbitrarily large (modeling grossly corrupted observations common in visual and bioinformatic data), but are assumed to be sparse. We prove that most matrices A can be efficiently and exactly recovered from most error sign-and-support patterns bysolving a simple convex program, for which we give a fast and provably convergent algorithm. Our result holds even when the rank of A grows nearly proportionally (up to a logarithmic factor) to the dimensionality of the observation spaceand the number of errors E grows in proportion to the total number of entries in the matrix. A byproduct of our analysis is the first proportional growth results for the related problem of completing a low-rank matrix from a small fraction ofits entries. Simulations and real-data examples corroborate the theoretical results, and suggest potential applications in computer vision.

algorithm, low-rank matrix, matrix, (15 more...)

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Illinois (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.81)

Subramanya, Amarnag, Bilmes, Jeff A.

Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification

We prove certain theoretical properties of a graph-regularized transductive learning objective that is based on minimizing a Kullback-Leibler divergence based loss. These include showing that the iterative alternating minimization procedure used to minimize the objective converges to the correct solution and deriving a test for convergence. We also propose a graph node ordering algorithm that is cache cognizant and leads to a linear speedup in parallel computations. This ensures that the algorithm scales to large data sets. By making use of empirical evaluation on the TIMIT and Switchboard I corpora, we show this approach is able to out-perform other state-of-the-art SSL approaches. In one instance, we solve a problem on a 120 million node graph.

algorithm, neighbor, ssl algorithm, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Shervashidze, Nino, Borgwardt, Karsten M.

Fast subtree kernels on graphs

In this article, we propose fast subtree kernels on graphs.

graph, kernel, subtree kernel, (15 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > District of Columbia > Washington (0.04)
Asia > China > Hong Kong (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Learning in Markov Random Fields using Tempered Transitions

Salakhutdinov, Ruslan R.

Markov random fields (MRFs), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRFs is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of Robbins-Monro type that uses Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large densely-connected MRFs. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data and perform well on digit and object recognition tasks.

algorithm, gibbs update, transition, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)

Ram, Parikshit, Lee, Dongryeol, March, William, Gray, Alexander G.

Linear-time Algorithms for Pairwise Statistical Problems

Several key computational bottlenecks in machine learning involve pairwise distance computations, including all-nearest-neighbors (finding the nearest neighbor(s) for each point, e.g. in manifold learning) and kernel summations (e.g. in kernel density estimation or kernel machines). We consider the general, bichromatic case for these problems, in addition to the scientific problem of N-body potential calculation. In this paper we show for the first time O(N) worst case runtimes for practical algorithms for these problems based on the cover tree data structure (Beygelzimer, Kakade, Langford, 2006).

algorithm, algorithm 1, subroutine, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Multi-Label Prediction via Sparse Infinite CCA

Rai, Piyush, Daume, Hal

Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction.

cca, dimensionality reduction, matrix, (13 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Utah (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)