AITopics | Phillips, Jeff M.

Collaborating Authors

Phillips, Jeff M.

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Near-Optimal Coresets of Kernel Density Estimates

Phillips, Jeff M., Tai, Wai Ming

arXiv.org Machine LearningMar-14-2018

We construct near-optimal coresets for kernel density estimate for points in $\mathbb{R^d}$ when the kernel is positive definite. Specifically we show a polynomial time construction for a coreset of size $O(\sqrt{d\log (1/\epsilon)}/\epsilon)$, and we show a near-matching lower bound of size $\Omega(\sqrt{d}/\epsilon)$. The upper bound is a polynomial in $1/\epsilon$ improvement when $d \in [3,1/\epsilon^2)$ (for all kernels except the Gaussian kernel which had a previous upper bound of $O((1/\epsilon) \log^d (1/\epsilon))$) and the lower bound is the first known lower bound to depend on $d$ for this problem. Moreover, the upper bound restriction that the kernel is positive definite is significant in that it applies to a wide-variety of kernels, specifically those most important for machine learning. This includes kernels for information distances and the sinc kernel which can be negative.

artificial intelligence, kernel, machine learning, (16 more...)

arXiv.org Machine Learning

1802.01751

Country: North America > United States > Utah (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Improved Coresets for Kernel Density Estimates

Phillips, Jeff M., Tai, Wai Ming

arXiv.org Machine LearningOct-11-2017

We study the construction of coresets for kernel density estimates. That is we show how to approximate the kernel density estimate described by a large point set with another kernel density estimate with a much smaller point set. For characteristic kernels (including Gaussian and Laplace kernels), our approximation preserves the $L_\infty$ error between kernel density estimates within error $\epsilon$, with coreset size $2/\epsilon^2$, but no other aspects of the data, including the dimension, the diameter of the point set, or the bandwidth of the kernel common to other approximations. When the dimension is unrestricted, we show this bound is tight for these kernels as well as a much broader set. This work provides a careful analysis of the iterative Frank-Wolfe algorithm adapted to this context, an algorithm called \emph{kernel herding}. This analysis unites a broad line of work that spans statistics, machine learning, and geometry. When the dimension $d$ is constant, we demonstrate much tighter bounds on the size of the coreset specifically for Gaussian kernels, showing that it is bounded by the size of the coreset for axis-aligned rectangles. Currently the best known constructive bound is $O(\frac{1}{\epsilon} \log^d \frac{1}{\epsilon})$, and non-constructively, this can be improved by $\sqrt{\log \frac{1}{\epsilon}}$. This improves the best constant dimension bounds polynomially for $d \geq 3$.

artificial intelligence, kernel, machine learning, (16 more...)

arXiv.org Machine Learning

1710.04325

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

The Robustness of Estimator Composition

Tang, Pingfan, Phillips, Jeff M.

Neural Information Processing SystemsDec-31-2016

A composite estimator successively applies two (or more) estimators: on data decomposed into disjoint parts, it applies the first estimator on each part, then the second estimator on the outputs of the first estimator. And so on, if the composition is of more than two estimators. Informally, the breakdown point is the minimum fraction of data points which if significantly modified will also significantly modify the output of the estimator, so it is typically desirable to have a large breakdown point. Our main result shows that, under mild conditions on the individual estimators, the breakdown point of the composite estimator is the product of the breakdown points of the individual estimators. We also demonstrate several scenarios, ranging from regression to statistical testing, where this analysis is easy to apply, useful in understanding worst case robustness, and sheds powerful insights onto the associated data analysis.

artificial intelligence, estimator, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Utah (0.14)

Genre: Research Report (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

Add feedback

The Robustness of Estimator Composition

Tang, Pingfan, Phillips, Jeff M.

arXiv.org Machine LearningSep-5-2016

We formalize notions of robustness for composite estimators via the notion of a breakdown point. A composite estimator successively applies two (or more) estimators: on data decomposed into disjoint parts, it applies the first estimator on each part, then the second estimator on the outputs of the first estimator. And so on, if the composition is of more than two estimators. Informally, the breakdown point is the minimum fraction of data points which if significantly modified will also significantly modify the output of the estimator, so it is typically desirable to have a large breakdown point. Our main result shows that, under mild conditions on the individual estimators, the breakdown point of the composite estimator is the product of the breakdown points of the individual estimators. We also demonstrate several scenarios, ranging from regression to statistical testing, where this analysis is easy to apply, useful in understanding worst case robustness, and sheds powerful insights onto the associated data analysis.

artificial intelligence, estimator, machine learning, (16 more...)

arXiv.org Machine Learning

1609.01226

Country: North America > United States > Utah (0.14)

Genre: Research Report (0.84)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

Add feedback

Streaming Kernel Principal Component Analysis

Ghashami, Mina, Perry, Daniel, Phillips, Jeff M.

arXiv.org Machine LearningDec-16-2015

Kernel principal component analysis (KPCA) provides a concise set of basis vectors which capture non-linear structures within large data sets, and is a central tool in data analysis and learning. To allow for non-linear relations, typically a full $n \times n$ kernel matrix is constructed over $n$ data points, but this requires too much space and time for large values of $n$. Techniques such as the Nystr\"om method and random feature maps can help towards this goal, but they do not explicitly maintain the basis vectors in a stream and take more space than desired. We propose a new approach for streaming KPCA which maintains a small set of basis elements in a stream, requiring space only logarithmic in $n$, and also improves the dependence on the error parameter. Our technique combines together random feature maps with recent advances in matrix sketching, it has guaranteed spectral norm error bounds with respect to the original kernel matrix, and it compares favorably in practice to state-of-the-art approaches.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1512.05059

Country: North America > United States (0.15)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.61)

Add feedback

Efficient Protocols for Distributed Classification and Optimization

Daume, Hal III, Phillips, Jeff M., Saha, Avishek, Venkatasubramanian, Suresh

arXiv.org Machine LearningApr-16-2012

In distributed learning, the goal is to perform a learning task over data distributed across multiple nodes with minimal (expensive) communication. Prior work (Daume III et al., 2012) proposes a general model that bounds the communication required for learning classifiers while allowing for $\eps$ training error on linearly separable data adversarially distributed across nodes. In this work, we develop key improvements and extensions to this basic model. Our first result is a two-party multiplicative-weight-update based protocol that uses $O(d^2 \log{1/\eps})$ words of communication to classify distributed data in arbitrary dimension $d$, $\eps$-optimally. This readily extends to classification over $k$ nodes with $O(kd^2 \log{1/\eps})$ words of communication. Our proposed protocol is simple to implement and is considerably more efficient than baselines compared, as demonstrated by our empirical results. In addition, we illustrate general algorithm design paradigms for doing efficient learning over distributed data. We show how to solve fixed-dimensional and high dimensional linear programming efficiently in a distributed setting where constraints may be distributed across nodes. Since many learning problems can be viewed as convex optimization problems where constraints are generated by individual points, this models many typical distributed learning scenarios. Our techniques make use of a novel connection from multipass streaming, as well as adapting the multiplicative-weight-update framework more generally to a distributed setting. As a consequence, our methods extend to the wide range of problems solvable using these techniques.

artificial intelligence, communication, machine learning, (19 more...)

arXiv.org Machine Learning

1204.3523

Country: North America > United States > Maryland (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Protocols for Learning Classifiers on Distributed Data

Daume, Hal III, Phillips, Jeff M., Saha, Avishek, Venkatasubramanian, Suresh

arXiv.org Machine LearningFeb-27-2012

We consider the problem of learning classifiers for labeled data that has been distributed across several nodes. Our goal is to find a single classifier, with small approximation error, across all datasets while minimizing the communication between nodes. This setting models real-world communication bottlenecks in the processing of massive distributed datasets. We present several very general sampling-based solutions as well as some two-way protocols which have a provable exponential speed-up over any one-way protocol. We focus on core problems for noiseless data distributed across two or more nodes. The techniques we introduce are reminiscent of active learning, but rather than actively probing labels, nodes actively communicate with each other, each node simultaneously learning the important data from another node.

artificial intelligence, classifier, machine learning, (16 more...)

arXiv.org Machine Learning

1202.6078

Country: North America > United States > California (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback