David Woodruff
Robust Subspace Approximation in a Stream
Roie Levin, Anish Prasad Sevekari, David Woodruff
Towards a Zero-One Law for Column Subset Selection
Zhao Song, David Woodruff, Peilin Zhong
Sublinear Time Low-Rank Approximation of Distance Matrices
Ainesh Bakshi, David Woodruff
Such distance matrices are commonly computed in software packages and have applications to learning image manifolds, handwriting recognition, and multi-dimensional unfolding, among other things. In an attempt to reduce their description size, we study low rank approximation of such matrices. Our main result is to show that for any underlying distance metric d, it is possible to achieve an additive error low rank approximation in sublinear time. We note that it is provably impossible to achieve such a guarantee in sublinear time for arbitrary matrices A, and our proof exploits special properties of distance matrices. We develop a recursive algorithm based on additive projection-cost preserving sampling. We then show that in general, relative error approximation in sublinear time is impossible for distance matrices, even if one allows for bicriteria solutions. Additionally, we show that if P = Q and d is the squared Euclidean distance, which is not a metric but rather the square of a metric, then a relative error bicriteria solution can be found in sublinear time. Finally, we empirically compare our algorithm with the singular value decomposition (SVD) and input sparsity time algorithms.
Efficient and Thrifty Voting by Any Means Necessary
Debmalya Mandal, Ariel D. Procaccia, Nisarg Shah, David Woodruff
We take an unorthodox view of voting by expanding the design space to include both the elicitation rule, whereby voters map their (cardinal) preferences to votes, and the aggregation rule, which transforms the reported votes into collective decisions. Intuitively, there is a tradeoff between the communication requirements of the elicitation rule (i.e., the number of bits of information that voters need to provide about their preferences) and the efficiency of the outcome of the aggregation rule, which we measure through distortion (i.e., how well the utilitarian social welfare of the outcome approximates the maximum social welfare in the worst case). Our results chart the Pareto frontier of the communication-distortion tradeoff.
Regularized Weighted Low Rank Approximation
Frank Ban, David Woodruff, Richard Zhang
On Coresets for Logistic Regression
Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David Woodruff
Coresets are one of the central methods to facilitate the analysis of large data. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show the negative result that no strongly sublinear sized coresets exist for logistic regression. To deal with intractable worst-case instances we introduce a complexity measure ยต(X), which quantifies the hardness of compressing a data set for logistic regression.
Total Least Squares Regression in Input Sparsity Time
Huaian Diao, Zhao Song, David Woodruff, Xin Yang
Efficient and Thrifty Voting by Any Means Necessary
Debmalya Mandal, Ariel D. Procaccia, Nisarg Shah, David Woodruff
We take an unorthodox view of voting by expanding the design space to include both the elicitation rule, whereby voters map their (cardinal) preferences to votes, and the aggregation rule, which transforms the reported votes into collective decisions. Intuitively, there is a tradeoff between the communication requirements of the elicitation rule (i.e., the number of bits of information that voters need to provide about their preferences) and the efficiency of the outcome of the aggregation rule, which we measure through distortion (i.e., how well the utilitarian social welfare of the outcome approximates the maximum social welfare in the worst case). Our results chart the Pareto frontier of the communication-distortion tradeoff.