 Dasgupta, Sanjoy


Agglomerative Bregman Clustering

arXiv.org Machine Learning

This manuscript develops the theory of agglomerative clustering with Bregman divergences. Geometric smoothing techniques are developed to deal with degenerate clusters. To allow for cluster models based on exponential families with overcomplete representations, Bregman divergences are developed for nondifferentiable convex functions.
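
As a rough illustration of the agglomerative step only (not the paper's formulation, which also handles nondifferentiable convex functions and smooths degenerate clusters), the sketch below greedily merges the pair of clusters whose union least increases the within-cluster Bregman information, using squared Euclidean distance as the divergence; the divergence function is a swappable, illustrative assumption.

    import numpy as np

    def bregman_sq_euclidean(x, y):
        """Squared Euclidean distance: the Bregman divergence of phi(x) = ||x||^2."""
        d = x - y
        return float(np.dot(d, d))

    def agglomerative_bregman(points, n_clusters, divergence=bregman_sq_euclidean):
        """Greedy agglomerative clustering with a pluggable Bregman divergence.

        Each cluster is summarised by (size, mean); the cost of merging clusters
        a and b is the increase in within-cluster Bregman information,
        size_a * d(mean_a, merged_mean) + size_b * d(mean_b, merged_mean).
        """
        clusters = [(1, np.asarray(p, dtype=float)) for p in points]
        while len(clusters) > n_clusters:
            best = None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    (na, ma), (nb, mb) = clusters[i], clusters[j]
                    merged_mean = (na * ma + nb * mb) / (na + nb)
                    cost = na * divergence(ma, merged_mean) + nb * divergence(mb, merged_mean)
                    if best is None or cost < best[0]:
                        best = (cost, i, j)
            _, i, j = best
            (na, ma), (nb, mb) = clusters[i], clusters[j]
            merged = (na + nb, (na * ma + nb * mb) / (na + nb))
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        return clusters

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        data = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
        for size, mean in agglomerative_bregman(data, n_clusters=2):
            print(size, np.round(mean, 2))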


Which Spatial Partition Trees are Adaptive to Intrinsic Dimension?

arXiv.org Machine Learning

Recent theory work has found that a special type of spatial partition tree - called a random projection tree - is adaptive to the intrinsic dimension of the data from which it is built. Here we examine this same question, with a combination of theory and experiments, for a broader class of trees that includes k-d trees, dyadic trees, and PCA trees. Our motivation is to get a feel for (i) the kind of intrinsic low dimensional structure that can be empirically verified, (ii) the extent to which a spatial partition can exploit such structure, and (iii) the implications for standard statistical tasks such as regression, vector quantization, and nearest neighbor search.
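
To make the comparison concrete, here is a minimal sketch (not taken from the paper) of a generic partition tree together with three of the split rules the abstract mentions: a k-d split, a simplified random-projection split, and a PCA split. The RP rule here just splits at the median of a random projection, omitting the jitter and split-by-distance refinements used in the analyzed RP tree.

    import numpy as np

    def build_tree(points, split_rule, min_leaf=10, rng=None):
        """Recursively partition `points` (an (n, d) array) with a pluggable split rule."""
        rng = np.random.default_rng(0) if rng is None else rng
        if len(points) <= min_leaf:
            return {"leaf": points}
        direction, threshold = split_rule(points, rng)
        proj = points @ direction
        left, right = points[proj <= threshold], points[proj > threshold]
        if len(left) == 0 or len(right) == 0:  # degenerate split: stop here
            return {"leaf": points}
        return {"dir": direction, "thr": threshold,
                "left": build_tree(left, split_rule, min_leaf, rng),
                "right": build_tree(right, split_rule, min_leaf, rng)}

    def kd_split(points, rng):
        """k-d tree rule: median split along the coordinate of largest spread."""
        axis = int(np.argmax(points.max(axis=0) - points.min(axis=0)))
        direction = np.zeros(points.shape[1])
        direction[axis] = 1.0
        return direction, float(np.median(points[:, axis]))

    def rp_split(points, rng):
        """Simplified RP tree rule: median split along a random direction."""
        direction = rng.normal(size=points.shape[1])
        direction /= np.linalg.norm(direction)
        return direction, float(np.median(points @ direction))

    def pca_split(points, rng):
        """PCA tree rule: median split along the top principal direction."""
        centered = points - points.mean(axis=0)
        direction = np.linalg.svd(centered, full_matrices=False)[2][0]
        return direction, float(np.median(points @ direction))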


Rates of convergence for the cluster tree

Neural Information Processing Systems

For a density f on R^d, a high-density cluster is any connected component of {x: f(x) >= c}, for some c > 0. The set of all high-density clusters forms a hierarchy called the cluster tree of f. We present a procedure for estimating the cluster tree given samples from f. We give finite-sample convergence rates for our algorithm, as well as lower bounds on the sample complexity of this estimation problem.
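
A crude sketch of the level-set idea, under illustrative assumptions (a k-NN density estimate f_hat and a fixed connection radius; the paper's estimator and its rates are not reproduced here): keep the sample points where f_hat >= c and take connected components of a neighborhood graph over them. Sweeping c from high to low traces out an empirical cluster tree.

    import numpy as np
    from scipy.sparse.csgraph import connected_components
    from scipy.spatial.distance import cdist

    def level_set_clusters(sample, c, k=10, radius=0.5):
        """Connected components of an empirical level set {x : f_hat(x) >= c}.

        f_hat is a crude k-NN density estimate; points above the level c are
        connected whenever they lie within `radius` of one another.
        """
        n, d = sample.shape
        dists = cdist(sample, sample)
        knn_radius = np.sort(dists, axis=1)[:, k]  # distance to the k-th neighbour
        # k-NN density estimate, up to the unit-ball volume (which only rescales c)
        f_hat = k / (n * np.maximum(knn_radius, 1e-12) ** d)
        keep = np.where(f_hat >= c)[0]
        if len(keep) == 0:
            return []
        adjacency = (dists[np.ix_(keep, keep)] <= radius).astype(float)
        n_comp, labels = connected_components(adjacency, directed=False)
        return [sample[keep[labels == i]] for i in range(n_comp)]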


A general agnostic active learning algorithm

Neural Information Processing Systems

We present an agnostic active learning algorithm for any hypothesis class of bounded VC dimension under arbitrary data distributions. Most previous work on active learning either makes strong distributional assumptions, or else is computationally prohibitive. Our algorithm extends the simple scheme of Cohn, Atlas, and Ladner [1] to the agnostic setting, using reductions to supervised learning that harness generalization bounds in a simple but subtle manner. We provide a fallback guarantee that bounds the algorithm's label complexity by the agnostic PAC sample complexity.
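
For context, here is a minimal sketch of the Cohn-Atlas-Ladner scheme that the algorithm extends, in the realizable setting with a finite hypothesis class (the paper's agnostic reductions and generalization bounds are not reproduced): a label is requested only when the surviving hypotheses disagree on the point.

    import random

    def cal_selective_sampling(stream, hypotheses, query_label):
        """Cohn-Atlas-Ladner style selective sampling over a finite hypothesis class.

        A label is requested only when the surviving hypotheses disagree on x;
        when they all agree, the prediction is taken for free (realizable case).
        """
        version_space = list(hypotheses)
        labels_queried = 0
        for x in stream:
            if len({h(x) for h in version_space}) > 1:  # x is in the disagreement region
                y = query_label(x)
                labels_queried += 1
                version_space = [h for h in version_space if h(x) == y]
        return version_space, labels_queried

    if __name__ == "__main__":
        random.seed(0)
        # Toy class: thresholds on [0, 1]; the target threshold is 0.62.
        hypotheses = [lambda x, t=t: int(x >= t) for t in (i / 100 for i in range(101))]
        stream = [random.random() for _ in range(200)]
        survivors, n = cal_selective_sampling(stream, hypotheses, lambda x: int(x >= 0.62))
        print("labels queried:", n, "surviving hypotheses:", len(survivors))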


A learning framework for nearest neighbor search

Neural Information Processing Systems

Can we leverage learning techniques to build a fast nearest-neighbor (NN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures.
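
A toy sketch of the framework's premise, under illustrative assumptions (a random-hyperplane partition with a single depth parameter, tuned by brute force on sample queries): measure the miss rate and the average cell size on the sample queries and pick the parameter that best trades them off. The paper's actual structures, learning algorithms, and generalization theory are not reproduced here.

    import numpy as np

    def build_partition(data, depth, rng):
        """Assign each point to a cell via `depth` random hyperplane splits."""
        directions = rng.normal(size=(depth, data.shape[1]))
        thresholds = np.median(data @ directions.T, axis=0)
        cells = ((data @ directions.T > thresholds).astype(int)) @ (2 ** np.arange(depth))
        return directions, thresholds, cells

    def evaluate(data, queries, depth, rng):
        """Empirical miss rate (true NN outside the query's cell) and mean cell size."""
        directions, thresholds, cells = build_partition(data, depth, rng)
        qcells = ((queries @ directions.T > thresholds).astype(int)) @ (2 ** np.arange(depth))
        true_nn = np.argmin(((queries[:, None, :] - data[None, :, :]) ** 2).sum(-1), axis=1)
        miss_rate = float(np.mean(cells[true_nn] != qcells))
        mean_cell_size = float(np.mean([np.sum(cells == c) for c in qcells]))
        return miss_rate, mean_cell_size

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        data, queries = rng.normal(size=(2000, 10)), rng.normal(size=(100, 10))
        # Pick the depth that best trades retrieval cost against miss rate on sample queries.
        for depth in (2, 4, 6, 8):
            miss, cost = evaluate(data, queries, depth, rng)
            print(f"depth={depth}  miss_rate={miss:.2f}  mean_cell_size={cost:.0f}")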


Random projection trees for vector quantization

arXiv.org Machine Learning

A simple and computationally efficient scheme for tree-structured vector quantization is presented. Unlike previous methods, its quantization error depends only on the intrinsic dimension of the data distribution, rather than the apparent dimension of the space in which the data happen to lie.
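
A minimal sketch of tree-structured vector quantization with random-projection splits, simplified relative to the paper's construction (median splits only, without the jitter or distance-to-mean refinements): leaf means serve as codewords, and quantizing a point means routing it down the tree.

    import numpy as np

    def rp_tree_codebook(data, depth, rng):
        """Build an RP tree of the given depth; each leaf stores its mean as a codeword."""
        if depth == 0 or len(data) <= 1:
            return {"codeword": data.mean(axis=0)}
        direction = rng.normal(size=data.shape[1])
        direction /= np.linalg.norm(direction)
        proj = data @ direction
        threshold = float(np.median(proj))
        left, right = data[proj <= threshold], data[proj > threshold]
        if len(left) == 0 or len(right) == 0:  # degenerate split: make a leaf
            return {"codeword": data.mean(axis=0)}
        return {"dir": direction, "thr": threshold,
                "left": rp_tree_codebook(left, depth - 1, rng),
                "right": rp_tree_codebook(right, depth - 1, rng)}

    def quantize(x, node):
        """Route x down the tree and return the codeword of the leaf it reaches."""
        if "codeword" in node:
            return node["codeword"]
        side = "left" if x @ node["dir"] <= node["thr"] else "right"
        return quantize(x, node[side])

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        data = rng.normal(size=(1000, 8))
        tree = rp_tree_codebook(data, depth=4, rng=rng)
        errors = [np.sum((x - quantize(x, tree)) ** 2) for x in data]
        print("mean quantization error:", float(np.mean(errors)))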


Coarse sample complexity bounds for active learning

Neural Information Processing Systems

We characterize the sample complexity of active learning problems in terms of a parameter which takes into account the distribution over the input space, the specific target hypothesis, and the desired accuracy.
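
As a reminder of the kind of target- and accuracy-dependence such a parameter must capture, here is the standard threshold example (an illustration, not taken verbatim from the paper): a threshold on [0, 1] can be learned actively to accuracy epsilon by binary search with about log2(1/epsilon) label queries, versus the order-1/epsilon labelled examples that passive learning needs.

    import math

    def learn_threshold_actively(query_label, epsilon):
        """Binary search for a threshold on [0, 1] to accuracy epsilon.

        Uses about log2(1/epsilon) label queries, versus the order-1/epsilon
        labelled examples that passive learning needs for the same accuracy.
        """
        lo, hi, queries = 0.0, 1.0, 0
        while hi - lo > epsilon:
            mid = (lo + hi) / 2
            queries += 1
            if query_label(mid) == 1:  # mid lies at or above the target threshold
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2, queries

    if __name__ == "__main__":
        estimate, n = learn_threshold_actively(lambda x: int(x >= 0.62), epsilon=1e-3)
        print(f"estimate={estimate:.4f} after {n} queries (log2(1/eps) ~ {math.log2(1e3):.1f})")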


Analysis of a greedy active learning strategy

Neural Information Processing Systems

We abstract out the core search problem of active learning schemes, to better understand the extent to which adaptive labeling can improve sample complexity. We give various upper and lower bounds on the number of labels which need to be queried, and we prove that a popular greedy active learning rule is approximately as good as any other strategy for minimizing this number of labels.
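
A minimal sketch of the greedy rule in question, here with a uniform weighting over a finite hypothesis class (the exact rule and its guarantees are stated in the paper): always query the pool point whose label would split the current version space most evenly.

    import itertools
    import random

    def greedy_active_learning(pool, hypotheses, query_label):
        """Repeatedly query the pool point whose label splits the version space
        most evenly (uniform weighting), until at most one hypothesis survives
        or no remaining query is informative."""
        version_space = list(hypotheses)
        queries = 0
        while len(version_space) > 1:
            def balance(x):
                positives = sum(h(x) for h in version_space)
                return min(positives, len(version_space) - positives)
            x_best = max(pool, key=balance)
            if balance(x_best) == 0:  # the survivors agree on every pool point
                break
            y = query_label(x_best)
            queries += 1
            version_space = [h for h in version_space if h(x_best) == y]
        return version_space, queries

    if __name__ == "__main__":
        random.seed(1)
        # Toy class: intervals [a, b] on a grid; the target interval is [0.3, 0.7].
        grid = [i / 20 for i in range(21)]
        hypotheses = [lambda x, a=a, b=b: int(a <= x <= b)
                      for a, b in itertools.combinations(grid, 2)]
        pool = [random.random() for _ in range(300)]
        survivors, n = greedy_active_learning(pool, hypotheses, lambda x: int(0.3 <= x <= 0.7))
        print("labels queried:", n, "surviving hypotheses:", len(survivors))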