Alexander Munteanu
On Coresets for Logistic Regression
Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David Woodruff
Coresets are one of the central methods to facilitate the analysis of large data. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show the negative result that no strongly sublinear-sized coresets exist for logistic regression. To deal with intractable worst-case instances, we introduce a complexity measure µ(X), which quantifies the hardness of compressing a data set for logistic regression.
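To make the object being compressed concrete, here is a minimal sketch of the logistic regression loss evaluated on a weighted point set, together with a plain uniform subsample reweighted by n/m. Uniform sampling is only a stand-in baseline for illustration and is not the construction studied in the paper; the function names `logistic_loss` and `uniform_sample` and the label convention y in {-1, +1} are assumptions made for this example. A weighted subset is a (1±ε)-coreset if its weighted loss stays within a factor (1±ε) of the full loss for every parameter vector β.

```python
import math
import random


def logistic_loss(points, labels, beta, weights=None):
    """Weighted logistic loss: sum_i w_i * ln(1 + exp(-y_i * <x_i, beta>)).

    Assumes labels y_i in {-1, +1} (an assumption of this sketch).
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for x, y, w in zip(points, labels, weights):
        margin = y * sum(xi * bi for xi, bi in zip(x, beta))
        # Numerically stable evaluation of ln(1 + exp(-margin)).
        total += w * (max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
    return total


def uniform_sample(points, labels, m, rng):
    """Hypothetical baseline: m points drawn uniformly without replacement,
    each reweighted by n/m so that the weights sum to n."""
    n = len(points)
    idx = rng.sample(range(n), m)
    w = n / m
    return [points[i] for i in idx], [labels[i] for i in idx], [w] * m


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
    y = [1 if x[0] + 0.5 * x[1] > 0 else -1 for x in X]
    beta = [1.0, 0.5]
    Xs, ys, ws = uniform_sample(X, y, 100, rng)
    print("full loss:     ", logistic_loss(X, y, beta))
    print("subsample loss:", logistic_loss(Xs, ys, beta, ws))
```

Comparing the two printed values for several choices of β gives a feel for how well a given subsample preserves the loss; the negative result in the abstract says that, in the worst case, no strongly sublinear-sized weighted subset can do this for all β.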
Random Projections and Sampling Algorithms for Clustering of High-Dimensional Polygonal Curves
Stefan Meintrup, Alexander Munteanu, Dennis Rohde
We study the k-median clustering problem for high-dimensional polygonal curves with a finite but unbounded number of vertices. We tackle the computational issue that arises from the high number of dimensions by defining a Johnson-Lindenstrauss projection for polygonal curves. We analyze the resulting error in terms of the Fréchet distance, which is a tractable and natural dissimilarity measure for curves. Our clustering algorithms achieve sublinear dependency on the number of input curves via subsampling. Also, we show that the Fréchet distance cannot be approximated within any factor of less than 2 by probabilistically reducing the dependency on the number of vertices of the curves. As a consequence, we provide a fast, CUDA-parallelized version of the Alt and Godau algorithm for computing the Fréchet distance and use it to evaluate our results empirically.
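The following is a small sketch of the two ingredients named in the abstract, under stated assumptions. It projects the vertices of a polygonal curve with a random Gaussian matrix (one common way to realize a Johnson-Lindenstrauss map; applying it vertex-wise is an assumption here) and compares curves with the discrete Fréchet distance, computed by the standard dynamic program over vertex pairs. The discrete variant is swapped in for brevity: it is not the continuous Fréchet distance computed by the Alt and Godau algorithm, and nothing here is CUDA-parallelized. All function names are hypothetical.

```python
import math
import random


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two vertex sequences p and q."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Cheapest way to reach (i, j), then pay the current distance.
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


def gaussian_projection(dim_in, dim_out, rng):
    """Random dim_out x dim_in matrix with N(0, 1/dim_out) entries."""
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def project_curve(curve, matrix):
    """Apply the linear map to every vertex; edges map to edges by linearity."""
    return [[sum(r * v for r, v in zip(row, vertex)) for row in matrix]
            for vertex in curve]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, target_dim, vertices = 500, 50, 20
    a = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vertices)]
    b = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vertices)]
    proj = gaussian_projection(dim, target_dim, rng)
    print("original :", discrete_frechet(a, b))
    print("projected:", discrete_frechet(project_curve(a, proj),
                                         project_curve(b, proj)))
```

Running the script prints the distance before and after projection, which gives an informal impression of the distortion; the target dimension of 50 is an arbitrary choice for the demo, not a value taken from the paper.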