Goto

Collaborating Authors

 sublinear algorithm


Sublinear Algorithms for Hierarchical Clustering

Neural Information Processing Systems

Hierarchical clustering over graphs is a fundamental task in data mining and machine learning with applications in many domains including phylogenetics, social network analysis, and information retrieval. Specifically, we consider the recently popularized objective function for hierarchical clustering due to Dasgupta~\cite{Dasgupta16}, namely, minimum cost hierarchical partitioning. Previous algorithms for (approximately) minimizing this objective function require linear time/space complexity. In many applications the underlying graph can be massive in size making it computationally challenging to process the graph even using a linear time/space algorithm. As a result, there is a strong interest in designing algorithms that can perform global computation using only sublinear resources (space, time, and communication). The focus of this work is to study hierarchical clustering for massive graphs under three well-studied models of sublinear computation which focus on space, time, and communication, respectively, as the primary resources to optimize: (1) (dynamic) streaming model where edges are presented as a stream, (2) query model where the graph is queried using neighbor and degree queries, (3) massively parallel computation (MPC) model where the edges of the graph are partitioned over several machines connected via a communication channel.We design sublinear algorithms for hierarchical clustering in all three models above. At the heart of our algorithmic results is a view of the objective in terms of cuts in the graph, which allows us to use a relaxed notion of cut sparsifiers to do hierarchical clustering while introducing only a small distortion in the objective function. Our main algorithmic contributions are then to show how cut sparsifiers of the desired form can be efficiently constructed in the query model and the MPC model. We complement our algorithmic results by establishing nearly matching lower bounds that rule out the possibility of designing algorithms with better performance guarantees in each of these models.


Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces Vincent Cohen-Addad Google Research, Zurich. David Saulpic Sorbonne Universit e, Paris Chris Schwiegelshohn

Neural Information Processing Systems

Special cases of problem include the well-known Fermat-Weber problem - or geometric median problem - where z = 1, the mean or centroid where z = 2, and the Minimum Enclosing Ball problem, where z = . We consider these problem in the big data regime. Here, we are interested in sampling as few points as possible such that we can accurately estimate m. More specifically, we consider sublinear algorithms as well as coresets for these problems. Sublinear algorithms have a random query access to the set A and the goal is to minimize the number of queries.




Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces Vincent Cohen-Addad Google Research, Zurich. David Saulpic Sorbonne Universit e, Paris Chris Schwiegelshohn

Neural Information Processing Systems

Special cases of problem include the well-known Fermat-Weber problem - or geometric median problem - where z = 1, the mean or centroid where z = 2, and the Minimum Enclosing Ball problem, where z = . We consider these problem in the big data regime. Here, we are interested in sampling as few points as possible such that we can accurately estimate m. More specifically, we consider sublinear algorithms as well as coresets for these problems. Sublinear algorithms have a random query access to the set A and the goal is to minimize the number of queries.


A Sublinear Algorithm for Path Feasibility Among Rectangular Obstacles

Fan, Alex, Li, Alicia, Kolla, Arul, Gonzalez, Jason

arXiv.org Artificial Intelligence

The problem of finding a path between two points while avoiding obstacles is critical in robotic path planning. We focus on the feasibility problem: determining whether such a path exists. We model the robot as a query-specific rectangular object capable of moving parallel to its sides. The obstacles are axis-aligned, rectangular, and may overlap. Most previous works only consider nondisjoint rectangular objects and point-sized or statically sized robots. Our approach introduces a novel technique leveraging generalized Gabriel graphs and constructs a data structure to facilitate online queries regarding path feasibility with varying robot sizes in sublinear time. To efficiently handle feasibility queries, we propose an online algorithm utilizing sweep line to construct a generalized Gabriel graph under the $L_\infty$ norm, capturing key gap constraints between obstacles. We utilize a persistent disjoint-set union data structure to efficiently determine feasibility queries in $\mathcal{O}(\log n)$ time and $\mathcal{O}(n)$ total space.


Sublinear Algorithms for Hierarchical Clustering

Neural Information Processing Systems

Hierarchical clustering over graphs is a fundamental task in data mining and machine learning with applications in many domains including phylogenetics, social network analysis, and information retrieval. Specifically, we consider the recently popularized objective function for hierarchical clustering due to Dasgupta \cite{Dasgupta16}, namely, minimum cost hierarchical partitioning. Previous algorithms for (approximately) minimizing this objective function require linear time/space complexity. In many applications the underlying graph can be massive in size making it computationally challenging to process the graph even using a linear time/space algorithm. As a result, there is a strong interest in designing algorithms that can perform global computation using only sublinear resources (space, time, and communication).


Efficient Primal-Dual Algorithms for Large-Scale Multiclass Classification

Babichev, Dmitry, Ostrovskii, Dmitrii, Bach, Francis

arXiv.org Machine Learning

We develop efficient algorithms to train $\ell_1$-regularized linear classifiers with large dimensionality $d$ of the feature space, number of classes $k$, and sample size $n$. Our focus is on a special class of losses that includes, in particular, the multiclass hinge and logistic losses. Our approach combines several ideas: (i) passing to the equivalent saddle-point problem with a quasi-bilinear objective; (ii) applying stochastic mirror descent with a proper choice of geometry which guarantees a favorable accuracy bound; (iii) devising non-uniform sampling schemes to approximate the matrix products. In particular, for the multiclass hinge loss we propose a \textit{sublinear} algorithm with iterations performed in $O(d+n+k)$ arithmetic operations.