AITopics | Schwab, David J.

Collaborating Authors

Schwab, David J.

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Generalization vs. Specialization under Concept Shift

Nguyen, Alex, Schwab, David J., Ngampruetikorn, Vudtiwat

arXiv.org Machine LearningSep-23-2024

Machine learning models are often brittle under distribution shift, i.e., when data distributions at test time differ from those during training. Understanding this failure mode is central to identifying and mitigating safety risks of mass adoption of machine learning. Here we analyze ridge regression under concept shift -- a form of distribution shift in which the input-label relationship changes at test time. We derive an exact expression for prediction risk in the high-dimensional limit. Our results reveal nontrivial effects of concept shift on generalization performance, depending on the properties of robust and nonrobust features of the input. We show that test performance can exhibit a nonmonotonic data dependence, even when double descent is absent. Finally, our experiments on MNIST and FashionMNIST suggest that this intriguing behavior is present also in classification problems.

artificial intelligence, concept shift, machine learning, (14 more...)

arXiv.org Machine Learning

2409.15582

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.35)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Generalized Information Bottleneck for Gaussian Variables

Ngampruetikorn, Vudtiwat, Schwab, David J.

arXiv.org Artificial IntelligenceMar-30-2023

The information bottleneck (IB) method offers an attractive framework for understanding representation learning, however its applications are often limited by its computational intractability. Analytical characterization of the IB method is not only of practical interest, but it can also lead to new insights into learning phenomena. Here we consider a generalized IB problem, in which the mutual information in the original IB method is replaced by correlation measures based on Renyi and Jeffreys divergences. We derive an exact analytical IB solution for the case of Gaussian correlated variables. Our analysis reveals a series of structural transitions, similar to those previously observed in the original IB case. We find further that although solving the original, Renyi and Jeffreys IB problems yields different representations in general, the structural transitions occur at the same critical tradeoff parameters, and the Renyi and Jeffreys IB solutions perform well under the original IB objective. Our results suggest that formulating the IB method with alternative correlation measures could offer a strategy for obtaining an approximate solution to the original IB problem.

artificial intelligence, information, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2303.17762

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.34)

Add feedback

Leveraging background augmentations to encourage semantic focus in self-supervised contrastive learning

Ryali, Chaitanya K., Schwab, David J., Morcos, Ari S.

arXiv.org Artificial IntelligenceMar-23-2021

Unsupervised representation learning is an important challenge in computer vision, with self-supervised learning methods recently closing the gap to supervised representation learning. An important ingredient in high-performing self-supervised methods is the use of data augmentation by training models to place different augmented views of the same image nearby in embedding space. However, commonly used augmentation pipelines treat images holistically, disregarding the semantic relevance of parts of an image--e.g. a subject vs. a background--which can lead to the learning of spurious correlations. Our work addresses this problem by investigating a class of simple, yet highly effective "background augmentations", which encourage models to focus on semantically-relevant content by discouraging them from focusing on image backgrounds. Background augmentations lead to substantial improvements ( 1-2% on ImageNet-1k) in performance across a spectrum of state-of-the art self-supervised methods (MoCov2, BYOL, SwAV) on a variety of tasks, allowing us to reach within 0.3% of supervised performance. We also demonstrate that background augmentations improve robustness to a number of out of distribution settings, including natural adversarial examples, the backgrounds challenge, adversarial attacks, and ReaL ImageNet.

augmentation, deep learning, neural network, (17 more...)

arXiv.org Artificial Intelligence

2103.12719

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Learning Optimal Representations with the Decodable Information Bottleneck

Dubois, Yann, Kiela, Douwe, Schwab, David J., Vedantam, Ramakrishna

arXiv.org Machine LearningSep-27-2020

We address the question of characterizing and finding optimal representations for supervised learning. Traditionally, this question has been tackled using the Information Bottleneck, which compresses the inputs while retaining information about the targets, in a decoder-agnostic fashion. In machine learning, however, our goal is not compression but rather generalization, which is intimately linked to the predictive family or decoder of interest (e.g. linear classifier). We propose the Decodable Information Bottleneck (DIB) that considers information retention and compression from the perspective of the desired predictive family. As a result, DIB gives rise to representations that are optimal in terms of expected test performance and can be estimated with guarantees. Empirically, we show that the framework can be used to enforce a small generalization gap on downstream classifiers and to predict the generalization ability of neural networks.

deep learning, neural network, representation, (20 more...)

arXiv.org Machine Learning

2009.12789

Country:

Europe (1.00)
North America > United States > California (0.92)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.28)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Add feedback

Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs

Frankle, Jonathan, Schwab, David J., Morcos, Ari S.

arXiv.org Artificial IntelligenceFeb-28-2020

Batch normalization (BatchNorm) has become an indispensable tool for training deep neural networks, yet it is still poorly understood. Although previous work has typically focused on its normalization component, BatchNorm also adds two per-feature trainable parameters: a coefficient and a bias. However, the role and expressive power of these parameters remains unclear. To study this question, we investigate the performance achieved when training only these parameters and freezing all others at their random initializations. We find that doing so leads to surprisingly high performance. For example, a sufficiently deep ResNet reaches 83% accuracy on CIFAR-10 in this configuration. Interestingly, BatchNorm achieves this performance in part by naturally learning to disable around a third of the random features without any changes to the training objective. Not only do these results highlight the under-appreciated role of the affine parameters in BatchNorm, but - in a broader sense - they characterize the expressive power of neural networks constructed simply by shifting and rescaling random features.

batchnorm, deep learning, neural network, (20 more...)

arXiv.org Artificial Intelligence

2003.00152

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Supervised Learning with Tensor Networks

Stoudenmire, Edwin, Schwab, David J.

Neural Information Processing SystemsFeb-14-2020, 16:44:19 GMT

Tensor networks are approximations of high-order tensors which are efficient to work with and have been very successful for physics and mathematics applications. We demonstrate how algorithms for optimizing tensor networks can be adapted to supervised learning tasks by using matrix product states (tensor trains) to parameterize non-linear kernel learning models. For the MNIST data set we obtain less than 1% test set classification error. We discuss an interpretation of the additional structure imparted by the tensor network to the learned model. Papers published at the Neural Information Processing Systems Conference.

artificial intelligence, inductive learning, tensor network, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.70)

Add feedback

Learning to Share and Hide Intentions using Information Regularization

Strouse, DJ, Kleiman-Weiner, Max, Tenenbaum, Josh, Botvinick, Matt, Schwab, David J.

Neural Information Processing SystemsDec-31-2018

Learning to cooperate with friends and compete with foes is a key component of multi-agent reinforcement learning. Typically to do so, one requires access to either a model of or interaction with the other agent(s). Here we show how to learn effective strategies for cooperation and competition in an asymmetric information game with no such model or interaction. Our approach is to encourage an agent to reveal or hide their intentions using an information-theoretic regularizer. We consider both the mutual information between goal and action given state, as well as the mutual information between goal and state. We show how to stochastically optimize these regularizers in a way that is easy to integrate with policy gradient reinforcement learning. Finally, we demonstrate that cooperative (competitive) policies learned with our approach lead to more (less) reward for a second agent in two simple asymmetric information games.

information, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Learning to Share and Hide Intentions using Information Regularization

Strouse, DJ, Kleiman-Weiner, Max, Tenenbaum, Josh, Botvinick, Matt, Schwab, David J.

Neural Information Processing SystemsDec-31-2018

artificial intelligence, information, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A high-bias, low-variance introduction to Machine Learning for physicists

Mehta, Pankaj, Bukov, Marin, Wang, Ching-Hao, Day, Alexandre G. R., Richardson, Clint, Fisher, Charles K., Schwab, David J.

arXiv.org Machine LearningMar-23-2018

Machine Learning (ML) is one of the most exciting and dynamic areas of modern research and application. The purpose of this review is to provide an introduction to the core concepts and tools of machine learning in a manner easily understood and intuitive to physicists. The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, regularization, and generalization before moving on to more advanced topics in both supervised and unsupervised learning. Topics covered in the review include ensemble models, deep learning and neural networks, clustering and data visualization, energy-based models (including MaxEnt models and Restricted Boltzmann Machines), and variational methods. Throughout, we emphasize the many natural connections between ML and statistical physics. A notable aspect of the review is the use of Python notebooks to introduce modern ML/statistical packages to readers using physics-inspired datasets (the Ising Model and Monte-Carlo simulations of supersymmetric decays of proton-proton collisions). We conclude with an extended outlook discussing possible uses of machine learning for furthering our understanding of the physical world as well as open problems in ML where physicists maybe able to contribute. (Notebooks are available at https://physics.bu.edu/~pankajm/MLnotebooks.html )

deep learning, least squares linear regression, upstream oil & gas, (29 more...)

arXiv.org Machine Learning

1803.08823

Country:

North America > United States > California > San Francisco County > San Francisco (0.13)
North America > United States > New York > New York County > New York City (0.13)
North America > United States > California > Alameda County > Berkeley (0.13)

Genre:

Workflow (1.00)
Summary/Review (1.00)
Research Report (1.00)
(2 more...)

Industry:

Education > Educational Setting (0.92)
Education > Curriculum (0.67)
Information Technology (0.67)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(5 more...)

Add feedback

Supervised Learning with Quantum-Inspired Tensor Networks

Stoudenmire, E. Miles, Schwab, David J.

arXiv.org Machine LearningMay-18-2017

Tensor networks are efficient representations of high-dimensional tensors which have been very successful for physics and mathematics applications. We demonstrate how algorithms for optimizing such networks can be adapted to supervised learning tasks by using matrix product states (tensor trains) to parameterize models for classifying images. For the MNIST data set we obtain less than 1% test set classification error. We discuss how the tensor network form imparts additional structure to the learned model and suggest a possible generative interpretation.

inductive learning, neural network, tensor, (18 more...)

arXiv.org Machine Learning

1605.05775

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback