Woodruff, David
Approximation Algorithms for $\ell_0$-Low Rank Approximation
Bringmann, Karl, Kolev, Pavel, Woodruff, David
We study the $\ell_0$-Low Rank Approximation Problem, where the goal is, given an $m \times n$ matrix $A$, to output a rank-$k$ matrix $A'$ for which $\|A'-A\|_0$ is minimized. Here, for a matrix $B$, $\|B\|_0$ denotes the number of its non-zero entries. This NP-hard variant of low rank approximation is natural for problems with no underlying metric, and its goal is to minimize the number of disagreeing data positions. We provide approximation algorithms which significantly improve the running time and approximation factor of previous work. For $k > 1$, we show how to find, in poly$(mn)$ time for every $k$, a rank $O(k \log(n/k))$ matrix $A'$ for which $\|A'-A\|_0 \leq O(k^2 \log(n/k)) \cdot \mathrm{OPT}$. To the best of our knowledge, this is the first algorithm with provable guarantees for the $\ell_0$-Low Rank Approximation Problem for $k > 1$, even for bicriteria algorithms. For the well-studied case when $k = 1$, we give a $(2+\epsilon)$-approximation in {\it sublinear time}, which is impossible for other variants of low rank approximation such as for the Frobenius norm. We strengthen this for the well-studied case of binary matrices to obtain a $(1+O(\psi))$-approximation in sublinear time, where $\psi = \mathrm{OPT}/\mathrm{nnz}(A)$. For small $\psi$, our approximation factor is $1+o(1)$.
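As a concrete illustration of the $\ell_0$ objective (a minimal NumPy sketch, not the paper's algorithm; the factors U and V below are arbitrary stand-ins for a candidate rank-$k$ factorization):

import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 80, 5
A = (rng.random((m, n)) < 0.3).astype(float)   # sparse-ish data matrix
U = rng.standard_normal((m, k))                # hypothetical rank-k factors
V = rng.standard_normal((k, n))
A_prime = U @ V                                # candidate rank-k approximation
l0_error = np.count_nonzero(A_prime - A)       # number of disagreeing entries
print("||A' - A||_0 =", l0_error)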
Sublinear Time Orthogonal Tensor Decomposition
Song, Zhao, Woodruff, David, Zhang, Huan
A recent work (Wang et al., NIPS 2015) gives the fastest known algorithms for orthogonal tensor decomposition with provable guarantees. Their algorithm is based on computing sketches of the input tensor, which requires reading the entire input. We show in a number of cases one can achieve the same theoretical guarantees in sublinear time, i.e., even without reading most of the input tensor. Instead of using sketches to estimate inner products in tensor decomposition algorithms, we use importance sampling. To achieve sublinear time, we need to know the norms of tensor slices, and we show how to do this in a number of important cases. For symmetric tensors $T = \sum_{i=1}^k \lambda_i u_i^{\otimes p}$ with $\lambda_i > 0$ for all $i$, we estimate such norms in sublinear time whenever $p$ is even. For the important case of $p = 3$ and small values of $k$, we can also estimate such norms. For asymmetric tensors sublinear time is not possible in general, but we show that if the tensor slice norms are just slightly below $\| T \|_F$ then sublinear time is again possible. One of the main strengths of our work is empirical: in a number of cases our algorithm is orders of magnitude faster than existing methods with the same accuracy.
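As a toy illustration of the importance-sampling idea (this is not the paper's estimator; the vectors, sample size, and sampling distribution below are illustrative), one can estimate an inner product $\langle x, y \rangle$ by sampling coordinates with probability proportional to $x_i^2$ instead of reading all of $y$:

import numpy as np

rng = np.random.default_rng(1)
d, s = 10**6, 2000                             # dimension and number of samples
x = rng.standard_normal(d)
y = rng.standard_normal(d)
p = x**2
p /= p.sum()                                   # sampling distribution built from x
idx = rng.choice(d, size=s, p=p)               # sampled coordinates
estimate = np.mean(x[idx] * y[idx] / p[idx])   # unbiased estimate of <x, y>
print("exact  :", np.dot(x, y))
print("sampled:", estimate)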
Communication-Optimal Distributed Clustering
Chen, Jiecao, Sun, He, Woodruff, David, Zhang, Qin
Clustering large datasets is a fundamental problem with a number of applications in machine learning. Data is often collected on different sites, and clustering needs to be performed in a distributed manner with low communication. We would like the quality of the clustering in the distributed setting to match that in the centralized setting, in which all the data resides on a single site. In this work, we study both graph and geometric clustering problems in two distributed models: (1) a point-to-point model, and (2) a model with a broadcast channel. We give protocols in both models which we show are nearly optimal by proving almost matching communication lower bounds. Our work highlights the surprising power of a broadcast channel for clustering problems; roughly speaking, to cluster $n$ points or $n$ vertices in a graph distributed across $s$ servers, for a worst-case partitioning the communication complexity in the point-to-point model is $n \cdot s$, while in the broadcast model it is $n + s$. We implement our algorithms and demonstrate this phenomenon on real-life datasets, showing that our algorithms are also very efficient in practice.
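For intuition about the point-to-point model (a generic two-round summary scheme, not the protocol from the paper; the site data, the choice of k, and the weighted k-means summaries are illustrative), each of the $s$ sites can send a small weighted summary to a coordinator, which then clusters the union of the summaries:

import numpy as np

rng = np.random.default_rng(2)

def lloyd(X, k, w=None, iters=20):
    # Plain Lloyd's iterations with optional point weights.
    w = np.ones(len(X)) if w is None else w
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        a = d.argmin(1)
        for j in range(k):
            if np.any(a == j):
                C[j] = np.average(X[a == j], axis=0, weights=w[a == j])
    return C, a

s, k = 4, 3
sites = [rng.standard_normal((500, 2)) + rng.uniform(-5, 5, 2) for _ in range(s)]

# Round 1: each site sends k weighted centers (a few words per site).
summaries = []
for X in sites:
    C, a = lloyd(X, k)
    summaries.append((C, np.bincount(a, minlength=k).astype(float)))

# Round 2: the coordinator clusters the union of the weighted summaries.
C_all = np.vstack([C for C, _ in summaries])
w_all = np.concatenate([w for _, w in summaries])
global_centers, _ = lloyd(C_all, k, w=w_all)
print(global_centers)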
Improved Distributed Principal Component Analysis
Liang, Yingyu, Balcan, Maria-Florina F., Kanchanapally, Vandana, Woodruff, David
We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve problems such as k-means clustering and low rank approximation. The essential properties of an approximate distributed PCA algorithm are its communication cost and computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for k-means clustering and related problems. Our empirical study on real world data shows a speedup of orders of magnitude, preserving communication with only a negligible degradation in solution quality. Some of the techniques we develop, such as a general transformation from a constant success probability subspace embedding to a high success probability subspace embedding with a dimension and sparsity independent of the success probability, may be of independent interest.
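One common pattern for approximate distributed PCA (a minimal sketch, assuming each server summarizes its points with a truncated SVD; the parameters and data below are illustrative, and this is not presented as the paper's exact algorithm):

import numpy as np

rng = np.random.default_rng(3)
s, d, k, t = 5, 50, 3, 10                  # servers, dimension, target rank, summary rank
M = rng.standard_normal((d, d))            # shared structure across servers
servers = [rng.standard_normal((200, d)) @ M for _ in range(s)]

# Each server sends Sigma_t V_t^T, an O(t * d)-word summary of its points.
summaries = []
for A_i in servers:
    _, S, Vt = np.linalg.svd(A_i, full_matrices=False)
    summaries.append(np.diag(S[:t]) @ Vt[:t])

# The coordinator stacks the summaries and takes the top-k right singular vectors.
stacked = np.vstack(summaries)
_, _, Vt_global = np.linalg.svd(stacked, full_matrices=False)
V_k = Vt_global[:k].T                      # approximate top-k principal subspace
print(V_k.shape)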
Low Rank Approximation Lower Bounds in Row-Update Streams
Woodruff, David
We study low-rank approximation in the streaming model in which the rows of an $n \times d$ matrix $A$ are presented one at a time in an arbitrary order. At the end of the stream, the streaming algorithm should output a $k \times d$ matrix $R$ so that $\|A-AR^{\dagger}R\|_F^2 \leq (1+\epsilon)\|A-A_k\|_F^2$, where $A_k$ is the best rank-$k$ approximation to $A$. A deterministic streaming algorithm of Liberty (KDD, 2013), with an improved analysis of Ghashami and Phillips (SODA, 2014), provides such a streaming algorithm using $O(dk/\epsilon)$ words of space. A natural question is whether smaller space is possible. We give an almost matching lower bound of $\Omega(dk/\epsilon)$ bits of space, even for randomized algorithms which succeed only with constant probability. Our lower bound matches the upper bound of Ghashami and Phillips up to the word size, improving on a simple $\Omega(dk)$ space lower bound.
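The upper bound discussed above comes from the Frequent Directions sketch of Liberty; a compact, unoptimized rendition of that idea (the sketch size ell and the test data below are illustrative) is:

import numpy as np

def frequent_directions(rows, d, ell):
    B = np.zeros((ell, d))
    for row in rows:
        zero = np.where(~B.any(axis=1))[0]
        if len(zero) == 0:
            # Sketch is full: shrink all singular values, which zeroes out at least one row.
            _, S, Vt = np.linalg.svd(B, full_matrices=False)
            S = np.sqrt(np.maximum(S**2 - S[-1]**2, 0.0))
            B = np.diag(S) @ Vt
            zero = np.where(~B.any(axis=1))[0]
        B[zero[0]] = row
    return B

rng = np.random.default_rng(4)
A = rng.standard_normal((1000, 40))
B = frequent_directions(A, d=40, ell=20)
# Ghashami-Phillips guarantee: ||A^T A - B^T B||_2 <= ||A - A_k||_F^2 / (ell - k)
print(np.linalg.norm(A.T @ A - B.T @ B, 2))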
Subspace Embeddings for the Polynomial Kernel
Avron, Haim, Nguyen, Huy, Woodruff, David
Sketching is a powerful dimensionality reduction tool for accelerating statistical learning algorithms. However, its applicability has been limited to a certain extent, since the crucial ingredient, the so-called oblivious subspace embedding, can only be applied to data spaces with an explicit representation as the column span or row span of a matrix, while in many settings learning is done in a high-dimensional space implicitly defined by the data matrix via a kernel transformation. We propose the first {\em fast} oblivious subspace embeddings that are able to embed a space induced by a non-linear kernel {\em without} explicitly mapping the data to the high-dimensional space. In particular, we propose an embedding for mappings induced by the polynomial kernel. Using these subspace embeddings, we obtain the fastest known algorithms for computing an implicit low rank approximation of the high-dimensional mapping of the data matrix, for computing an approximate kernel PCA of the data, and for approximate kernel principal component regression.
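A small illustration of the flavor of such an embedding for the degree-2 polynomial kernel $k(x, y) = \langle x, y \rangle^2$ (a TensorSketch-style construction built from two CountSketches combined via the FFT; the hash functions and sizes below are illustrative, not the paper's parameters):

import numpy as np

rng = np.random.default_rng(5)
d, m = 500, 4096                                    # input dimension, sketch dimension
h = [rng.integers(0, m, d) for _ in range(2)]       # bucket hashes
s = [rng.choice([-1.0, 1.0], d) for _ in range(2)]  # sign hashes

def count_sketch(x, hj, sj):
    out = np.zeros(m)
    np.add.at(out, hj, sj * x)                      # hash coordinates into m buckets
    return out

def tensor_sketch(x):
    # Convolving the two CountSketches gives a CountSketch of the outer product of x with itself.
    f = np.fft.fft(count_sketch(x, h[0], s[0])) * np.fft.fft(count_sketch(x, h[1], s[1]))
    return np.fft.ifft(f).real

x = rng.standard_normal(d)
y = x + 0.5 * rng.standard_normal(d)
print("exact  :", np.dot(x, y) ** 2)
print("sketch :", np.dot(tensor_sketch(x), tensor_sketch(y)))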
Sketching Structured Matrices for Faster Nonlinear Regression
Avron, Haim, Sindhwani, Vikas, Woodruff, David
Motivated by the desire to extend fast randomized techniques to nonlinear $l_p$ regression, we consider a class of structured regression problems. These problems involve Vandermonde matrices, which arise naturally in various statistical modeling settings, including classical polynomial fitting problems and recently developed randomized techniques for scalable kernel methods. We show that this structure can be exploited to further accelerate the solution of the regression problem, achieving running times that are faster than ``input sparsity''. We present empirical results confirming both the practical value of our modeling framework and the speedup benefits of randomized regression.
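As a reminder of the structure involved (a tiny illustrative example of classical polynomial fitting via a Vandermonde matrix; the plain least-squares call is only for illustration, not the accelerated method from the paper):

import numpy as np

rng = np.random.default_rng(6)
n, q = 1000, 4
t = rng.uniform(-1, 1, n)
y = 2 * t**3 - t + 0.1 * rng.standard_normal(n)   # noisy cubic signal

V = np.vander(t, q + 1, increasing=True)          # columns 1, t, t^2, ..., t^q
coef, *_ = np.linalg.lstsq(V, y, rcond=None)
print(coef)                                       # approximately [0, -1, 0, 2, 0]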