Agglomerative Bregman Clustering
Telgarsky, Matus, Dasgupta, Sanjoy
This manuscript develops the theory of agglomerative clustering with Bregman divergences. Geometric smoothing techniques are developed to deal with degenerate clusters. To allow for cluster models based on exponential families with overcomplete representations, Bregman divergences are developed for nondifferentiable convex functions.
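As a concrete point of reference, the sketch below runs plain agglomerative clustering under the generalized KL divergence, at each step merging the pair of clusters that least increases within-cluster Bregman information. It is a minimal sketch of the classical smooth case only; it does not implement the paper's smoothing for degenerate clusters, and all names are illustrative.

    import numpy as np

    def kl_divergence(p, q):
        # Bregman divergence generated by phi(x) = sum_i x_i log x_i
        # (defined on strictly positive vectors)
        return float(np.sum(p * np.log(p / q) - p + q))

    def merge_cost(n_a, mu_a, n_b, mu_b):
        # Increase in within-cluster Bregman information when merging two
        # clusters; for any Bregman divergence the optimal cluster
        # representative is the arithmetic mean.
        mu = (n_a * mu_a + n_b * mu_b) / (n_a + n_b)
        return n_a * kl_divergence(mu_a, mu) + n_b * kl_divergence(mu_b, mu)

    def agglomerate(points, k):
        # Clusters kept as (size, mean) pairs, starting from singletons.
        clusters = [(1, np.asarray(p, dtype=float)) for p in points]
        while len(clusters) > k:
            pairs = [(i, j) for i in range(len(clusters))
                     for j in range(i + 1, len(clusters))]
            i, j = min(pairs, key=lambda ij: merge_cost(*clusters[ij[0]],
                                                        *clusters[ij[1]]))
            (n_a, mu_a), (n_b, mu_b) = clusters[i], clusters[j]
            merged = (n_a + n_b, (n_a * mu_a + n_b * mu_b) / (n_a + n_b))
            clusters = [c for t, c in enumerate(clusters)
                        if t not in (i, j)] + [merged]
        return clusters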
Which Spatial Partition Trees are Adaptive to Intrinsic Dimension?
Verma, Nakul, Kpotufe, Samory, Dasgupta, Sanjoy
Recent theory work has found that a special type of spatial partition tree - called a random projection tree - is adaptive to the intrinsic dimension of the data from which it is built. Here we examine this same question, with a combination of theory and experiments, for a broader class of trees that includes k-d trees, dyadic trees, and PCA trees. Our motivation is to get a feel for (i) the kind of intrinsic low dimensional structure that can be empirically verified, (ii) the extent to which a spatial partition can exploit such structure, and (iii) the implications for standard statistical tasks such as regression, vector quantization, and nearest neighbor search.
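For intuition, the basic random projection tree split is simple to state: project the data onto a random direction and split at the median. A toy version follows (it omits refinements such as perturbed split points, and the node representation is illustrative):

    import numpy as np

    def build_rp_tree(data, min_leaf=10, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        if len(data) <= min_leaf:
            return {"leaf": True, "points": data}
        u = rng.standard_normal(data.shape[1])   # random unit direction
        u /= np.linalg.norm(u)
        proj = data @ u
        split = np.median(proj)
        left, right = data[proj <= split], data[proj > split]
        if len(left) == 0 or len(right) == 0:    # degenerate projection
            return {"leaf": True, "points": data}
        return {"leaf": False, "dir": u, "split": split,
                "left": build_rp_tree(left, min_leaf, rng),
                "right": build_rp_tree(right, min_leaf, rng)}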
Rates of convergence for the cluster tree
Chaudhuri, Kamalika, Dasgupta, Sanjoy
For a density f on R^d, a high-density cluster is any connected component of {x: f(x) >= c}, for some c > 0. The set of all high-density clusters forms a hierarchy called the cluster tree of f. We present a procedure for estimating the cluster tree given samples from f. We give finite-sample convergence rates for our algorithm, as well as lower bounds on the sample complexity of this estimation problem.
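A rough sketch of one density level of such an estimator, in the spirit of robust single linkage: keep the points whose k-NN radius is at most r (a density proxy), link nearby survivors, and read off connected components. The parameters k and alpha are illustrative, and this is not a faithful transcription of the paper's procedure.

    import numpy as np
    from scipy.sparse.csgraph import connected_components
    from scipy.spatial.distance import pdist, squareform

    def cluster_tree_slice(X, r, k=10, alpha=np.sqrt(2)):
        D = squareform(pdist(X))
        r_k = np.sort(D, axis=1)[:, k]          # k-th NN distance per point
        keep = np.flatnonzero(r_k <= r)         # high-density points at level r
        G = D[np.ix_(keep, keep)] <= alpha * r  # link survivors within alpha*r
        _, labels = connected_components(G, directed=False)
        return keep, labels                     # one slice of the cluster tree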
A general agnostic active learning algorithm
Dasgupta, Sanjoy, Hsu, Daniel J., Monteleoni, Claire
We present an agnostic active learning algorithm for any hypothesis class of bounded VC dimension under arbitrary data distributions. Most previous work on active learning either makes strong distributional assumptions, or else is computationally prohibitive. Our algorithm extends the simple scheme of Cohn, Atlas, and Ladner [1] to the agnostic setting, using reductions to supervised learning that harness generalization bounds in a simple but subtle manner. We provide a fallback guarantee that bounds the algorithm's label complexity by the agnostic PAC sample complexity.
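In skeleton form, the reduction can be pictured as follows: for each unlabeled point, run the supervised learner with the point forced to each label, and only query when the resulting empirical errors are too close to call. This is a simplified caricature rather than the algorithm's exact bookkeeping; learn, oracle, and error_bound are assumed black boxes supplied by the caller.

    def dhm_style(unlabeled, learn, oracle, error_bound):
        # S: labels inferred for free, T: labels bought from the oracle.
        S, T = [], []
        for t, x in enumerate(unlabeled, start=1):
            errs = {}
            for y in (-1, +1):
                h = learn(S + [(x, y)], T)   # ERM with x forced to label y
                errs[y] = empirical_error(h, S + T)
            slack = error_bound(t)           # generalization-bound threshold
            if errs[-1] - errs[+1] > slack:
                S.append((x, +1))            # label +1 is forced: no query
            elif errs[+1] - errs[-1] > slack:
                S.append((x, -1))            # label -1 is forced: no query
            else:
                T.append((x, oracle(x)))     # genuinely uncertain: pay a label
        return learn(S, T)

    def empirical_error(h, data):
        return sum(h(x) != y for x, y in data) / max(len(data), 1)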
A learning framework for nearest neighbor search
Cayton, Lawrence, Dasgupta, Sanjoy
Can we leverage learning techniques to build a fast nearest-neighbor (NN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures.
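One way to picture the framework: parameterize a tree, route each sample query with defeatist (single-leaf) search, and choose the parameter that minimizes a weighted sum of a retrieval-time proxy and the empirical miss rate. The toy code below does this for the depth of a coordinate-cycling kd-tree; it is illustrative, not the paper's construction, and the cost weighting is an assumption.

    import numpy as np

    def route(points, idx, q, depth, dim=0):
        # Defeatist search: descend to the single leaf containing q.
        while depth > 0 and len(idx) > 1:
            split = np.median(points[idx, dim])
            side = idx[points[idx, dim] <= split] if q[dim] <= split \
                else idx[points[idx, dim] > split]
            if len(side) == 0:
                break
            idx, depth, dim = side, depth - 1, (dim + 1) % points.shape[1]
        return idx

    def choose_depth(points, queries, depths=(2, 4, 6, 8), miss_weight=0.5):
        n = len(points)
        exact = [int(np.argmin(np.linalg.norm(points - q, axis=1)))
                 for q in queries]
        best, best_cost = None, np.inf
        for depth in depths:
            leaves = [route(points, np.arange(n), q, depth) for q in queries]
            miss = np.mean([e not in leaf for e, leaf in zip(exact, leaves)])
            time_proxy = np.mean([len(leaf) for leaf in leaves]) / n
            cost = time_proxy + miss_weight * miss  # learned tradeoff
            if cost < best_cost:
                best, best_cost = depth, cost
        return best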
Random projection trees for vector quantization
Dasgupta, Sanjoy, Freund, Yoav
Coarse sample complexity bounds for active learning
Dasgupta, Sanjoy
Analysis of a greedy active learning strategy
Dasgupta, Sanjoy
We abstract out the core search problem of active learning schemes, to better understand the extent to which adaptive labeling can improve sample complexity. We give various upper and lower bounds on the number of labels that need to be queried, and we prove that a popular greedy active learning rule is approximately as good as any other strategy for minimizing this number of labels.
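The greedy rule in question is generalized binary search: with a finite hypothesis class, always query the point whose label most nearly bisects the current version space. A minimal sketch, where hypotheses are assumed to be callables returning labels in {0, 1} and oracle supplies the true label:

    def greedy_active_learn(hypotheses, pool, oracle):
        version_space = list(hypotheses)     # hypotheses still consistent
        pool = list(pool)
        while len(version_space) > 1 and pool:
            def balance(x):
                pos = sum(h(x) == 1 for h in version_space)
                return abs(2 * pos - len(version_space))
            x = min(pool, key=balance)       # most nearly bisecting query
            pool.remove(x)
            y = oracle(x)                    # one label queried
            version_space = [h for h in version_space if h(x) == y]
        return version_space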