Pleiss, Geoff
Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning
Gordon-Rodriguez, Elliott, Loaiza-Ganem, Gabriel, Pleiss, Geoff, Cunningham, John P.
Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example; namely the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others. Drawing on the recently discovered continuous-categorical distribution, we propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing. Through careful experimentation, including an ablation study, we identify the potential for outperformance in these models, thereby highlighting the importance of a proper probabilistic treatment, as well as illustrating some of the failure modes thereof.
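To make the critiqued practice concrete, here is a minimal PyTorch sketch (names and hyperparameters are illustrative, not from the paper) of the standard approach: applying the categorical cross-entropy loss to label-smoothed targets, which lie in the interior of the simplex rather than at its vertices. The paper's proposed alternative replaces this loss with the likelihood of the continuous-categorical distribution, which is not reproduced here.

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, hard_labels, num_classes, eps=0.1):
    """Standard practice: categorical cross-entropy applied to label-smoothed
    targets, i.e. simplex-valued rather than strictly categorical data."""
    # Soft targets: mass (1 - eps) on the true class, eps spread uniformly.
    soft_targets = torch.full((hard_labels.size(0), num_classes), eps / num_classes)
    soft_targets.scatter_(1, hard_labels.unsqueeze(1), 1 - eps + eps / num_classes)
    # Cross-entropy between the simplex-valued targets and the model's softmax output.
    return -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Example usage with random logits and labels.
logits = torch.randn(8, 5)
labels = torch.randint(0, 5, (8,))
loss = smoothed_cross_entropy(logits, labels, num_classes=5)
```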
Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization
Pleiss, Geoff, Jankowiak, Martin, Eriksson, David, Damle, Anil, Gardner, Jacob R.
Matrix square roots and their inverses arise frequently in machine learning, e.g., when sampling from high-dimensional Gaussians $\mathcal{N}(\mathbf 0, \mathbf K)$ or whitening a vector $\mathbf b$ against covariance matrix $\mathbf K$. While existing methods typically require $O(N^3)$ computation, we introduce a highly efficient quadratic-time algorithm for computing $\mathbf K^{1/2} \mathbf b$, $\mathbf K^{-1/2} \mathbf b$, and their derivatives through matrix-vector multiplications (MVMs). Our method combines Krylov subspace methods with a rational approximation and typically achieves $4$ decimal places of accuracy with fewer than $100$ MVMs. Moreover, the backward pass requires little additional computation. We demonstrate our method's applicability on matrices as large as $50,\!000 \times 50,\!000$ (well beyond traditional methods) with little approximation error. Applying this increased scalability to variational Gaussian processes, Bayesian optimization, and Gibbs sampling results in more powerful models with higher accuracy.
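As a rough illustration of what computing $\mathbf K^{1/2} \mathbf b$ "through MVMs" means, here is a minimal Lanczos-based sketch in NumPy/SciPy. It is not the paper's algorithm (which combines Krylov iterations with a rational approximation and a cheap backward pass); it only shows how a matrix square root can act on a vector using nothing but matrix-vector products. All function and variable names are ours.

```python
import numpy as np
from scipy.linalg import sqrtm

def lanczos_sqrt_matvec(matvec, b, num_iters=100):
    """Approximate K^{1/2} b using only matrix-vector products with K.
    Builds a Krylov basis Q and tridiagonal T via Lanczos, then uses
    K^{1/2} b ~= ||b|| * Q @ sqrtm(T) @ e_1."""
    n = b.shape[0]
    Q = np.zeros((n, num_iters))
    alphas, betas = [], []
    q, q_prev, beta = b / np.linalg.norm(b), np.zeros(n), 0.0
    for j in range(num_iters):
        Q[:, j] = q
        v = matvec(q) - beta * q_prev
        alpha = q @ v
        v -= alpha * q
        v -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ v)  # full reorthogonalization
        beta = np.linalg.norm(v)
        alphas.append(alpha)
        betas.append(beta)
        if beta < 1e-10:  # Krylov space exhausted early
            Q = Q[:, : j + 1]
            break
        q_prev, q = q, v / beta
    k = len(alphas)
    T = np.diag(alphas) + np.diag(betas[: k - 1], 1) + np.diag(betas[: k - 1], -1)
    e1 = np.zeros(k)
    e1[0] = 1.0
    return np.linalg.norm(b) * (Q[:, :k] @ (sqrtm(T).real @ e1))

# Example: a random PSD matrix accessed only through matvecs.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 50))
K = A @ A.T + 1e-2 * np.eye(500)
b = rng.standard_normal(500)
approx = lanczos_sqrt_matvec(lambda v: K @ v, b)
# Sanity check (dense, O(n^3)): np.linalg.norm(approx - sqrtm(K).real @ b)
```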
Convolutional Networks with Dense Connectivity
Huang, Gao, Liu, Zhuang, Pleiss, Geoff, van der Maaten, Laurens, Weinberger, Kilian Q.
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections, one between each layer and its subsequent layer, our network has L(L+1)/2 direct connections. For each layer, the feature maps of all preceding layers are used as inputs, and its own feature maps are used as inputs to all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, encourage feature reuse, and substantially improve parameter efficiency. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state of the art on most of them, while requiring fewer parameters and less computation to achieve high performance.
Introduction. Convolutional neural networks (CNNs) have become the dominant machine learning approach for visual object recognition. Although they were originally introduced over 20 years ago [1], improvements in computer hardware and network structure have enabled the training of truly deep CNNs only recently. The original LeNet5 [2] consisted of 5 layers, VGG featured 19 [3], and thanks to skip/shortcut connections, Highway Networks [4] and Residual Networks (ResNets) [5] have surpassed the 100-layer barrier. As CNNs become increasingly deep, a new research problem emerges: information about the input or gradient can vanish and "wash out" by the time it reaches the end (or beginning) of the network after passing through many layers. Many recent publications address this problem. For example, Rectified Linear Units (ReLUs) [6] avoid gradient saturation, and batch normalization [7] reduces covariate shift across layers by re-scaling the outputs of the previous layer. ResNets [5] and Highway Networks [4] bypass signal from one layer to the next via identity connections. Stochastic depth [8] shortens ResNets by randomly dropping layers during training to allow better information and gradient flow.
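To make the connectivity pattern concrete, here is a minimal PyTorch sketch of a dense block, assuming 3x3 convolutions and a fixed growth rate; the actual DenseNet architecture additionally uses bottleneck layers, transition layers, and compression, all omitted here.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Toy dense block: each layer receives the concatenation of all preceding
    feature maps and contributes `growth_rate` new channels."""
    def __init__(self, in_channels, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1, bias=False),
            ))
            channels += growth_rate

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # all earlier maps as input
            features.append(out)
        return torch.cat(features, dim=1)            # all maps passed onward

block = DenseBlock(in_channels=16, growth_rate=12, num_layers=4)
x = torch.randn(2, 16, 32, 32)
out = block(x)   # 16 + 4 * 12 = 64 output channels
```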
Sparse Gaussian Process Regression Beyond Variational Inference
Jankowiak, Martin, Pleiss, Geoff, Gardner, Jacob R.
The combination of inducing point methods with stochastic variational inference has enabled approximate Gaussian Process (GP) inference on large datasets. Unfortunately, the resulting predictive distributions often exhibit substantially underestimated uncertainties. Worse still, in the regression case the predictive variance is typically dominated by observation noise, yielding uncertainty estimates that make little use of the input-dependent function uncertainty that makes GP priors attractive. In this work we propose a simple inference procedure that bypasses posterior approximations and instead directly targets the posterior predictive distribution. In an extensive empirical comparison with a number of alternative inference strategies on univariate and multivariate regression tasks, we find that the resulting predictive distributions exhibit significantly better calibrated uncertainties and higher log likelihoods--often by as much as half a nat or more per datapoint.
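To give a flavor of the distinction (under our reading of the abstract, assuming a Gaussian likelihood and an approximate marginal posterior q(f_i) = N(mu_i, s_i^2) at each training input), the sketch below contrasts the per-datapoint expected log-likelihood term used by the standard variational objective with a term that scores the posterior predictive density directly. The closed forms are standard Gaussian identities, not code from the paper.

```python
import numpy as np

def gaussian_logpdf(y, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

def elbo_likelihood_term(y, f_mean, f_var, noise_var):
    """E_{q(f)}[log N(y | f, noise_var)] for q(f) = N(f_mean, f_var):
    the data term of the standard variational ELBO."""
    return gaussian_logpdf(y, f_mean, noise_var) - f_var / (2 * noise_var)

def predictive_log_likelihood_term(y, f_mean, f_var, noise_var):
    """log p(y | x) = log N(y | f_mean, f_var + noise_var): scores the posterior
    predictive directly, so the function uncertainty f_var enters the predictive variance."""
    return gaussian_logpdf(y, f_mean, f_var + noise_var)

# The two objectives treat function uncertainty very differently:
y, f_mean, f_var, noise_var = 0.0, 0.5, 0.3, 0.1
print(elbo_likelihood_term(y, f_mean, f_var, noise_var))
print(predictive_log_likelihood_term(y, f_mean, f_var, noise_var))
```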
Exact Gaussian Processes on a Million Data Points
Wang, Ke Alexander, Pleiss, Geoff, Gardner, Jacob R., Tyree, Stephen, Weinberger, Kilian Q., Wilson, Andrew Gordon
Gaussian processes (GPs) are flexible models with state-of-the-art performance on many impactful applications. However, computational constraints with standard inference procedures have limited exact GPs to problems with fewer than about ten thousand training points, necessitating approximations for larger datasets. In this paper, we develop a scalable approach for exact GPs that leverages multi-GPU parallelization and methods like linear conjugate gradients, accessing the kernel matrix only through matrix multiplication. By partitioning and distributing kernel matrix multiplies, we demonstrate that an exact GP can be trained on over a million points in 3 days using 8 GPUs and can compute predictive means and variances in under a second using 1 GPU at test time. Moreover, we perform the first-ever comparison of exact GPs against state-of-the-art scalable approximations on large-scale regression datasets with $10^4-10^6$ data points, showing dramatic performance improvements.
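The key computational primitive described here is a kernel matrix-vector product computed in partitions, so the $N \times N$ matrix is never stored. A single-machine NumPy sketch of that idea is below; the paper distributes the row blocks across GPUs and wraps the matvec in a conjugate gradients solver, and the kernel choice and function names here are ours.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix between two sets of points."""
    sq1 = (X1 ** 2).sum(-1)[:, None]
    sq2 = (X2 ** 2).sum(-1)[None, :]
    sq_dists = np.maximum(sq1 + sq2 - 2.0 * X1 @ X2.T, 0.0)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

def partitioned_kernel_matvec(X, v, block_size=1000, lengthscale=1.0):
    """Compute K(X, X) @ v without materializing the full N x N kernel:
    each row block is formed on the fly (in the paper, on its own GPU)."""
    n = X.shape[0]
    out = np.empty(n)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        K_block = rbf_kernel(X[start:stop], X, lengthscale)  # (block, n)
        out[start:stop] = K_block @ v
    return out

# Iterative solvers such as conjugate gradients only ever need this matvec.
X = np.random.default_rng(0).standard_normal((8000, 4))
v = np.ones(8000)
Kv = partitioned_kernel_matvec(X, v)
```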
GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration
Gardner, Jacob R., Pleiss, Geoff, Bindel, David, Weinberger, Kilian Q., Wilson, Andrew Gordon
Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on developments in computing hardware. We present an efficient and general approach to GP inference based on Blackbox Matrix-Matrix multiplication (BBMM). BBMM inference uses a modified batched version of the conjugate gradients algorithm to derive all terms for training and inference in a single call. BBMM reduces the asymptotic complexity of exact GP inference from $O(n^3)$ to $O(n^2)$. Adapting this algorithm to scalable approximations and complex GP models simply requires a routine for efficient matrix-matrix multiplication with the kernel and its derivative. In addition, BBMM uses a specialized preconditioner to substantially speed up convergence. In experiments we show that BBMM effectively uses GPU hardware to dramatically accelerate both exact GP inference and scalable approximations. Additionally, we provide GPyTorch, a software platform for scalable GP inference via BBMM, built on PyTorch.
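For reference, a minimal GPyTorch regression model follows the pattern below, a condensed version of the library's standard exact-GP example (version details and hyperparameters are incidental). Under the hood, the marginal log-likelihood and the predictive distribution are computed with BBMM rather than a Cholesky factorization.

```python
import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

train_x = torch.linspace(0, 1, 1000)
train_y = torch.sin(6.28 * train_x) + 0.1 * torch.randn(1000)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

model.train()
likelihood.train()
for _ in range(50):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)  # computed via BBMM / conjugate gradients
    loss.backward()
    optimizer.step()

model.eval()
likelihood.eval()
with torch.no_grad():
    preds = likelihood(model(torch.linspace(0, 1, 200)))
```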
Constant-Time Predictive Distributions for Gaussian Processes
Pleiss, Geoff, Gardner, Jacob R., Weinberger, Kilian Q., Wilson, Andrew Gordon
One of the most compelling features of Gaussian process (GP) regression is its ability to provide well calibrated posterior distributions. Recent advances in inducing point methods have drastically sped up marginal likelihood and posterior mean computations, leaving posterior covariance estimation and sampling as the remaining computational bottlenecks. In this paper we address this shortcoming by using the Lanczos decomposition algorithm to rapidly approximate the predictive covariance matrix. Our approach, which we refer to as LOVE (LanczOs Variance Estimates), substantially reduces the time and space complexity over any previous method. In practice, it can compute predictive covariances up to 2,000 times faster and draw samples 18,000 times faster than existing methods, all without sacrificing accuracy.
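A rough sketch of the caching idea follows: run Lanczos once against the training covariance (plus noise) to obtain a small factorization, then reuse it for every test point so that each additional predictive variance costs only a low-rank projection. This mirrors the spirit of LOVE rather than the exact algorithm; the constant-time-per-test-point result in the paper chooses the Lanczos start vector more carefully and additionally relies on structured kernel interpolation, both omitted here, and all names are illustrative.

```python
import numpy as np

def lanczos(matvec, b, k):
    """k-step Lanczos: returns Q (n x k) and tridiagonal T (k x k) with Q^T A Q ~= T.
    Assumes no early breakdown (beta never hits zero), fine for this random example."""
    n = b.shape[0]
    Q = np.zeros((n, k))
    alphas, betas = [], []
    q, q_prev, beta = b / np.linalg.norm(b), np.zeros(n), 0.0
    for j in range(k):
        Q[:, j] = q
        v = matvec(q) - beta * q_prev
        alpha = q @ v
        v -= alpha * q
        v -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ v)  # reorthogonalize
        beta = np.linalg.norm(v)
        alphas.append(alpha)
        betas.append(beta)
        q_prev, q = q, v / beta
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    return Q, T

def rbf(X1, X2, ls=1.0):
    d = (X1 ** 2).sum(-1)[:, None] + (X2 ** 2).sum(-1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-0.5 * np.maximum(d, 0.0) / ls ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 3))
noise = 0.1
K_hat = rbf(X, X) + noise * np.eye(2000)

# One-time precomputation: low-rank surrogate for K_hat^{-1} via Lanczos.
Q, T = lanczos(lambda v: K_hat @ v, rng.standard_normal(2000), k=100)
T_inv = np.linalg.inv(T)

def predictive_variance(x_star):
    """Approximate posterior variance k** - k*^T K_hat^{-1} k* using the cached Q, T."""
    k_star = rbf(x_star[None, :], X)[0]   # (n,)
    proj = Q.T @ k_star                   # (k,)  cheap per-query work
    prior_var = 1.0                       # rbf(x*, x*) for this kernel
    return prior_var - proj @ (T_inv @ proj)

var = predictive_variance(rng.standard_normal(3))
```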
Product Kernel Interpolation for Scalable Gaussian Processes
Gardner, Jacob R., Pleiss, Geoff, Wu, Ruihan, Weinberger, Kilian Q., Wilson, Andrew Gordon
Recent work shows that inference for Gaussian processes can be performed efficiently using iterative methods that rely only on matrix-vector multiplications (MVMs). Structured Kernel Interpolation (SKI) exploits these techniques by deriving approximate kernels with very fast MVMs. Unfortunately, such strategies suffer badly from the curse of dimensionality. We develop a new technique for MVM based learning that exploits product kernel structure. We demonstrate that this technique is broadly applicable, resulting in linear rather than exponential runtime with dimension for SKI, as well as state-of-the-art asymptotic complexity for multi-task GPs.
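One ingredient that makes MVM-based inference compatible with product kernels: the full kernel matrix is an elementwise (Hadamard) product of per-dimension kernel matrices, and a Hadamard product of low-rank matrices admits fast matrix-vector products. The NumPy sketch below shows that identity in isolation; it is a building block consistent with the approach described, not the full algorithm, and the function name is ours.

```python
import numpy as np

def hadamard_lowrank_matvec(UA, VA, UB, VB, x):
    """Compute (A * B) @ x (elementwise product) where A = UA @ VA.T has rank r
    and B = UB @ VB.T has rank s, without forming A, B, or A * B.
    Since [(A * B) x]_i = sum_{a,b} UA[i,a] UB[i,b] sum_j VA[j,a] VB[j,b] x[j],
    the cost is O(n r s) instead of O(n^2)."""
    M = np.einsum('ja,jb,j->ab', VA, VB, x)       # (r, s)
    return np.einsum('ia,ib,ab->i', UA, UB, M)    # (n,)

# Check against the dense O(n^2) computation.
rng = np.random.default_rng(0)
n, r, s = 1000, 5, 6
UA, VA = rng.standard_normal((n, r)), rng.standard_normal((n, r))
UB, VB = rng.standard_normal((n, s)), rng.standard_normal((n, s))
x = rng.standard_normal(n)
fast = hadamard_lowrank_matvec(UA, VA, UB, VB, x)
dense = ((UA @ VA.T) * (UB @ VB.T)) @ x
assert np.allclose(fast, dense)
```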