AITopics | Wasserman, Larry

Collaborating Authors

Wasserman, Larry

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Adaptivity and Computation-Statistics Tradeoffs for Kernel and Distance based High Dimensional Two Sample Testing

Ramdas, Aaditya, Reddi, Sashank J., Poczos, Barnabas, Singh, Aarti, Wasserman, Larry

arXiv.org Machine LearningAug-4-2015

Nonparametric two sample testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. We refer to the most common settings as mean difference alternatives (MDA), for testing differences only in first moments, and general difference alternatives (GDA), which is about testing for any difference in distributions. A large number of test statistics have been proposed for both these settings. This paper connects three classes of statistics - high dimensional variants of Hotelling's t-test, statistics based on Reproducing Kernel Hilbert Spaces, and energy statistics based on pairwise distances. We ask the question: how much statistical power do popular kernel and distance based tests for GDA have when the unknown distributions differ in their means, compared to specialized tests for MDA? We formally characterize the power of popular tests for GDA like the Maximum Mean Discrepancy with the Gaussian kernel (gMMD) and bandwidth-dependent variants of the Energy Distance with the Euclidean norm (eED) in the high-dimensional MDA regime. Some practically important properties include (a) eED and gMMD have asymptotically equal power; furthermore they enjoy a free lunch because, while they are additionally consistent for GDA, they also have the same power as specialized high-dimensional t-test variants for MDA. All these tests are asymptotically optimal (including matching constants) under MDA for spherical covariances, according to simple lower bounds, (b) The power of gMMD is independent of the kernel bandwidth, as long as it is larger than the choice made by the median heuristic, (c) There is a clear and smooth computation-statistics tradeoff for linear-time, subquadratic-time and quadratic-time versions of these tests, with more computation resulting in higher power.

artificial intelligence, machine learning, tr 2, (16 more...)

arXiv.org Machine Learning

1508.00655

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Influence Functions for Machine Learning: Nonparametric Estimators for Entropies, Divergences and Mutual Informations

Kandasamy, Kirthevasan, Krishnamurthy, Akshay, Poczos, Barnabas, Wasserman, Larry, Robins, James M.

arXiv.org Artificial IntelligenceJun-19-2015

We propose and analyze estimators for statistical functionals of one or more distributions under nonparametric assumptions. Our estimators are based on the theory of influence functions, which appear in the semiparametric statistics literature. We show that estimators based either on data-splitting or a leave-one-out technique enjoy fast rates of convergence and other favorable theoretical properties. We apply this framework to derive estimators for several popular information theoretic quantities, and via empirical evaluation, show the advantage of this approach over existing estimators.

artificial intelligence, estimator, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1411.4342

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Optimal Ridge Detection using Coverage Risk

Chen, Yen-Chi, Genovese, Christopher R., Ho, Shirley, Wasserman, Larry

arXiv.org Machine LearningJun-7-2015

We introduce the concept of coverage risk as an error measure for density ridge estimation. The coverage risk generalizes the mean integrated square error to set estimation. We propose two risk estimators for the coverage risk and we show that we can select tuning parameters by minimizing the estimated risk. We study the rate of convergence for coverage risk and prove consistency of the risk estimators. We apply our method to three simulated datasets and to cosmology data. In all the examples, the proposed method successfully recover the underlying density structure.

artificial intelligence, coverage risk, machine learning, (16 more...)

arXiv.org Machine Learning

1506.02278

Country: North America > United States (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

An Analysis of Active Learning With Uniform Feature Noise

Ramdas, Aaditya, Poczos, Barnabas, Singh, Aarti, Wasserman, Larry

arXiv.org Machine LearningMay-15-2015

In active learning, the user sequentially chooses values for feature $X$ and an oracle returns the corresponding label $Y$. In this paper, we consider the effect of feature noise in active learning, which could arise either because $X$ itself is being measured, or it is corrupted in transmission to the oracle, or the oracle returns the label of a noisy version of the query point. In statistics, feature noise is known as "errors in variables" and has been studied extensively in non-active settings. However, the effect of feature noise in active learning has not been studied before. We consider the well-known Berkson errors-in-variables model with additive uniform noise of width $\sigma$. Our simple but revealing setting is that of one-dimensional binary classification setting where the goal is to learn a threshold (point where the probability of a $+$ label crosses half). We deal with regression functions that are antisymmetric in a region of size $\sigma$ around the threshold and also satisfy Tsybakov's margin condition around the threshold. We prove minimax lower and upper bounds which demonstrate that when $\sigma$ is smaller than the minimiax active/passive noiseless error derived in \cite{CN07}, then noise has no effect on the rates and one achieves the same noiseless rates. For larger $\sigma$, the \textit{unflattening} of the regression function on convolution with uniform noise, along with its local antisymmetry around the threshold, together yield a behaviour where noise \textit{appears} to be beneficial. Our key result is that active learning can buy significant improvement over a passive strategy even in the presence of feature noise.

artificial intelligence, machine learning, noise, (17 more...)

arXiv.org Machine Learning

1505.04215

Country: North America > United States (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.55)

Add feedback

Risk Bounds For Mode Clustering

Azizyan, Martin, Chen, Yen-Chi, Singh, Aarti, Wasserman, Larry

arXiv.org Machine LearningMay-3-2015

Density mode clustering is a nonparametric clustering method. The clusters are the basins of attraction of the modes of a density estimator. We study the risk of mode-based clustering. We show that the clustering risk over the cluster cores --- the regions where the density is high --- is very small even in high dimensions. And under a low noise condition, the overall cluster risk is small even beyond the cores, in high dimensions.

artificial intelligence, critical point, machine learning, (16 more...)

arXiv.org Machine Learning

1505.00482

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.15)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Add feedback

On the Decreasing Power of Kernel and Distance Based Nonparametric Hypothesis Tests in High Dimensions

Ramdas, Aaditya (Carnegie Mellon University) | Reddi, Sashank Jakkam (Carnegie Mellon University) | Poczos, Barnabas (Carnegie Mellon University) | Singh, Aarti (Carnegie Mellon University) | Wasserman, Larry (Carnegie Mellon University)

AAAI ConferencesMar-6-2015

This paper is about two related decision theoretic problems, nonparametric two-sample testing and independence testing. There is a belief that two recently proposed solutions, based on kernels and distances between pairs of points, behave well in high-dimensional settings. We identify different sources of misconception that give rise to the above belief. Specifically, we differentiate the hardness of estimation of test statistics from the hardness of testing whether these statistics are zero or not, and explicitly discuss a notion of "fair" alternative hypotheses for these problems as dimension increases. We then demonstrate that the power of these tests actually drops polynomially with increasing dimension against fair alternatives. We end with some theoretical insights and shed light on the median heuristic for kernel bandwidth selection. Our work advances the current understanding of the power of modern nonparametric hypothesis tests in high dimensions.

artificial intelligence, dimension, machine learning, (17 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: North America > United States (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives

Ramdas, Aaditya, Reddi, Sashank J., Poczos, Barnabas, Singh, Aarti, Wasserman, Larry

arXiv.org Machine LearningNov-23-2014

Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests - those which are consistent without any assumptions about how the distributions may differ (\textit{general} alternatives), and those which are designed to specifically test easier alternatives, like a difference in means (\textit{mean-shift} alternatives). The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. Specifically, we explicitly derive the power of the linear-time Maximum Mean Discrepancy statistic using the Gaussian kernel, where the dimension and sample size can both tend to infinity at any rate, and the two distributions differ in their means. As a corollary, we find that if the signal-to-noise ratio is held constant, then the test's power goes to one if the number of samples increases faster than the dimension increases. This is the first explicit power derivation for a general nonparametric test in the high-dimensional setting, and also the first analysis of how tests designed for general alternatives perform when faced with easier ones.

artificial intelligence, machine learning, mmd 2, (16 more...)

arXiv.org Machine Learning

1411.6314

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

On the Decreasing Power of Kernel and Distance based Nonparametric Hypothesis Tests in High Dimensions

Reddi, Sashank J., Ramdas, Aaditya, Póczos, Barnabás, Singh, Aarti, Wasserman, Larry

arXiv.org Machine LearningNov-23-2014

This paper is about two related decision theoretic problems, nonparametric two-sample testing and independence testing. There is a belief that two recently proposed solutions, based on kernels and distances between pairs of points, behave well in high-dimensional settings. We identify different sources of misconception that give rise to the above belief. Specifically, we differentiate the hardness of estimation of test statistics from the hardness of testing whether these statistics are zero or not, and explicitly discuss a notion of "fair" alternative hypotheses for these problems as dimension increases. We then demonstrate that the power of these tests actually drops polynomially with increasing dimension against fair alternatives. We end with some theoretical insights and shed light on the \textit{median heuristic} for kernel bandwidth selection. Our work advances the current understanding of the power of modern nonparametric hypothesis tests in high dimensions.

artificial intelligence, dimension, machine learning, (16 more...)

arXiv.org Machine Learning

1406.2083

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

On Estimating $L_2^2$ Divergence

Krishnamurthy, Akshay, Kandasamy, Kirthevasan, Poczos, Barnabas, Wasserman, Larry

arXiv.org Machine LearningOct-30-2014

We give a comprehensive theoretical characterization of a nonparametric estimator for the $L_2^2$ divergence between two continuous distributions. We first bound the rate of convergence of our estimator, showing that it is $\sqrt{n}$-consistent provided the densities are sufficiently smooth. In this smooth regime, we then show that our estimator is asymptotically normal, construct asymptotic confidence intervals, and establish a Berry-Ess\'{e}en style inequality characterizing the rate of convergence to normality. We also show that this estimator is minimax optimal.

artificial intelligence, estimator, machine learning, (18 more...)

arXiv.org Machine Learning

1410.8372

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

Nonparametric ridge estimation

Genovese, Christopher R., Perone-Pacifico, Marco, Verdinelli, Isabella, Wasserman, Larry

arXiv.org Machine LearningAug-28-2014

We study the problem of estimating the ridges of a density function. Ridge estimation is an extension of mode finding and is useful for understanding the structure of a density. It can also be used to find hidden structure in point cloud data. We show that, under mild regularity conditions, the ridges of the kernel density estimator consistently estimate the ridges of the true density. When the data are noisy measurements of a manifold, we show that the ridges are close and topologically similar to the hidden manifold. To find the estimated ridges in practice, we adapt the modified mean-shift algorithm proposed by Ozertem and Erdogmus [J. Mach. Learn. Res. 12 (2011) 1249-1286]. Some numerical experiments verify that the algorithm is accurate.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1214/14-AOS1218

1212.5156

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

Add feedback