Collaborating Authors

 Bhaskara, Aditya


Linear Relaxations for Finding Diverse Elements in Metric Spaces

Neural Information Processing Systems

Choosing a diverse subset of a large collection of points in a metric space is a fundamental problem, with applications in feature selection, recommender systems, web search, data summarization, etc. Various notions of diversity have been proposed, tailored to different applications. The general algorithmic goal is to find a subset of points that maximizes diversity, while obeying a cardinality (or, more generally, matroid) constraint. The goal of this paper is to develop a novel linear programming (LP) framework that allows us to design approximation algorithms for such problems. We study an objective known as "sum-min" diversity, which is known to be effective in many applications, and give the first constant-factor approximation algorithm. Our LP framework allows us to easily incorporate additional constraints, as well as secondary objectives. We also prove a hardness result for two natural diversity objectives, under the so-called "planted clique" assumption. Finally, we study the empirical performance of our algorithm on several standard datasets. We first study the approximation quality of the algorithm by comparing with the LP objective, and then compare the quality of the solutions produced by our method with those of other popular diversity maximization algorithms.
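
For concreteness, the sum-min objective rewards each selected point by its distance to the nearest other selected point, and the total over the subset is what the LP relaxation approximately maximizes. Below is a minimal sketch in Python of the objective together with an exhaustive baseline; the function names and the Euclidean metric are illustrative assumptions, not taken from the paper:

```python
import itertools
import math

def sum_min_diversity(points, dist):
    """Sum-min diversity: each selected point contributes its distance
    to the nearest *other* selected point (assumes len(points) >= 2)."""
    return sum(min(dist(u, v) for v in points if v != u) for u in points)

def best_subset_brute_force(points, k, dist):
    """Exhaustive baseline, exponential in k -- usable only on tiny
    inputs; the paper's LP rounding gives a polynomial-time
    constant-factor approximation instead."""
    return max(itertools.combinations(points, k),
               key=lambda s: sum_min_diversity(s, dist))

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
    print(best_subset_brute_force(pts, 3, math.dist))
```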


Distributed Balanced Clustering via Mapping Coresets

Neural Information Processing Systems

Large-scale clustering of data points in metric spaces is an important problem in mining big data sets. For many applications, we face explicit or implicit size constraints on each cluster, which leads to the problem of clustering under capacity constraints, or the "balanced clustering" problem. Although the balanced clustering problem has been widely studied, developing a theoretically sound distributed algorithm remains an open problem. In the present paper we develop a general framework based on "mapping coresets" to tackle this issue. For a wide range of clustering objective functions, such as k-center, k-median, and k-means, our techniques give distributed algorithms for balanced clustering that match the best known single-machine approximation ratios.
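
The coreset idea lends itself to a simple two-round pattern: summarize each machine's shard by a small weighted point set, merge the summaries, and solve the capacitated problem centrally. The Python sketch below illustrates only that generic pattern; the greedy farthest-point summary and all names are placeholders, not the paper's actual mapping-coreset construction, which additionally preserves a point-to-center mapping so capacities can be accounted for:

```python
import random
from collections import defaultdict

def local_coreset(points, m, dist):
    """Summarize one shard: pick m centers with the farthest-point
    heuristic, then weight each center by how many shard points map
    to it.  (A mapping coreset also records the point-to-center
    mapping itself; that detail is omitted here.)"""
    centers = [points[0]]
    while len(centers) < min(m, len(points)):
        centers.append(max(points,
                           key=lambda p: min(dist(p, c) for c in centers)))
    weights = defaultdict(int)
    for p in points:
        weights[min(centers, key=lambda c: dist(c, p))] += 1
    return list(weights.items())  # [(center, weight), ...]

def distributed_coreset(points, machines, m, dist, seed=0):
    """Round 1 of a two-round scheme: shard the data, summarize each
    shard independently, and ship the small weighted summaries to one
    machine.  Round 2 (not shown) runs any single-machine balanced
    k-center/k-median/k-means algorithm on the merged summary."""
    points = list(points)
    random.Random(seed).shuffle(points)
    shards = [points[i::machines] for i in range(machines)]
    return [wc for shard in shards for wc in local_coreset(shard, m, dist)]
```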


Provable Bounds for Learning Some Deep Representations

arXiv.org Artificial Intelligence

We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others. Our generative model is an $n$ node multilayer neural net that has degree at most $n^{\gamma}$ for some $\gamma <1$ and each edge has a random edge weight in $[-1,1]$. Our algorithm learns {\em almost all} networks in this class with polynomial running time. The sample complexity is quadratic or cubic depending upon the details of the model. The algorithm uses layerwise learning. It is based upon a novel idea of observing correlations among features and using these to infer the underlying edge structure via a global graph recovery procedure. The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.