AITopics | Wilson, Andrew Gordon

Collaborating Authors

Wilson, Andrew Gordon

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

Goldblum, Micah, Souri, Hossein, Ni, Renkun, Shu, Manli, Prabhu, Viraj, Somepalli, Gowthami, Chattopadhyay, Prithvijit, Ibrahim, Mark, Bardes, Adrien, Hoffman, Judy, Chellappa, Rama, Wilson, Andrew Gordon, Goldstein, Tom

arXiv.org Artificial IntelligenceNov-19-2023

Neural network based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. While this abundance of choice has led to performance increases for a range of systems, it is difficult for practitioners to make informed decisions about which backbone to choose. Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more. Furthermore, BoB sheds light on promising directions for the research community to advance computer vision by illuminating strengths and weakness of existing approaches through a comprehensive analysis conducted on more than 1500 training runs. While vision transformers (ViTs) and self-supervised learning (SSL) are increasingly popular, we find that convolutional neural networks pretrained in a supervised fashion on large training sets still perform best on most tasks among the models we consider. Moreover, in apples-to-apples comparisons on the same architectures and similarly sized pretraining datasets, we find that SSL backbones are highly competitive, indicating that future works should perform SSL pretraining with advanced architectures and larger pretraining datasets.

artificial intelligence, backbone, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2310.19909

Country:

North America > United States (0.67)
Asia > Middle East > Israel (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Government (0.67)
Information Technology (0.67)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning

Cherepanova, Valeriia, Levin, Roman, Somepalli, Gowthami, Geiping, Jonas, Bruss, C. Bayan, Wilson, Andrew Gordon, Goldstein, Tom, Goldblum, Micah

arXiv.org Artificial IntelligenceNov-10-2023

Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent overfitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. Motivated by the increasing popularity of tabular deep learning, we construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems such as selecting from corrupted or second-order features.

artificial intelligence, machine learning, mlp 0, (14 more...)

arXiv.org Artificial Intelligence

2311.05877

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Automated Few-shot Classification with Instruction-Finetuned Language Models

Aly, Rami, Shi, Xingjian, Lin, Kaixiang, Zhang, Aston, Wilson, Andrew Gordon

arXiv.org Artificial IntelligenceOct-21-2023

A particularly successful class of approaches for few-shot learning combines language models with prompts -- hand-crafted task descriptions that complement data samples. However, designing prompts by hand for each task commonly requires domain knowledge and substantial guesswork. We observe, in the context of classification tasks, that instruction finetuned language models exhibit remarkable prompt robustness, and we subsequently propose a simple method to eliminate the need for handcrafted prompts, named AuT-Few. This approach consists of (i) a prompt retrieval module that selects suitable task instructions from the instruction-tuning knowledge base, and (ii) the generation of two distinct, semantically meaningful, class descriptions and a selection mechanism via cross-validation. Over $12$ datasets, spanning $8$ classification tasks, we show that AuT-Few outperforms current state-of-the-art few-shot learning methods. Moreover, AuT-Few is the best ranking method across datasets on the RAFT few-shot benchmark. Notably, these results are achieved without task-specific handcrafted prompts on unseen tasks.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.12576

Country:

Europe (1.00)
Asia (0.92)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
(2 more...)

Add feedback

Large Language Models Are Zero-Shot Time Series Forecasters

Gruver, Nate, Finzi, Marc, Qiu, Shikai, Wilson, Andrew Gordon

arXiv.org Artificial IntelligenceOct-11-2023

By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks. To facilitate this performance, we propose procedures for effectively tokenizing time series data and converting discrete distributions over tokens into highly flexible densities over continuous values. We argue the success of LLMs for time series stems from their ability to naturally represent multimodal distributions, in conjunction with biases for simplicity, and repetition, which align with the salient features in many time series, such as repeated seasonal trends. We also show how LLMs can naturally handle missing data without imputation through non-numerical text, accommodate textual side information, and answer questions to help explain predictions. While we find that increasing model size generally improves performance on time series, we show GPT-4 can perform worse than GPT-3 because of how it tokenizes numbers, and poor uncertainty calibration, which is likely the result of alignment interventions such as RLHF.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2310.0782

Country:

Asia > Middle East > Republic of Türkiye (0.14)
Europe > Netherlands (0.14)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks

Finzi, Marc, Potapczynski, Andres, Choptuik, Matthew, Wilson, Andrew Gordon

arXiv.org Machine LearningAug-30-2023

Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic forgetting impairs the applicability of this approach to initial value problems (IVPs). In an alternative local-in-time approach, the optimization problem can be converted into an ordinary differential equation (ODE) on the network parameters and the solution propagated forward in time; however, we demonstrate that current methods based on this approach suffer from two key issues. First, following the ODE produces an uncontrolled growth in the conditioning of the problem, ultimately leading to unacceptably large numerical errors. Second, as the ODE methods scale cubically with the number of model parameters, they are restricted to small neural networks, significantly limiting their ability to represent intricate PDE initial conditions and solutions. Building on these insights, we develop Neural IVP, an ODE based IVP solver which prevents the network from getting ill-conditioned and runs in time linear in the number of parameters, enabling us to evolve the dynamics of challenging PDEs with neural networks. Partial differential equations (PDEs) are needed to describe many phenomena in the natural sciences.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

2304.14994

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Transfer Learning with Deep Tabular Models

Levin, Roman, Cherepanova, Valeriia, Schwarzschild, Avi, Bansal, Arpit, Bruss, C. Bayan, Goldstein, Tom, Wilson, Andrew Gordon, Goldblum, Micah

arXiv.org Artificial IntelligenceAug-7-2023

Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models is that they learn reusable features and are easily fine-tuned in new domains. This property is often exploited in computer vision and natural language applications, where transfer learning is indispensable when task-specific training data is scarce. In this work, we demonstrate that upstream data gives tabular neural networks a decisive advantage over widely used GBDT models. We propose a realistic medical diagnosis benchmark for tabular transfer learning, and we present a how-to guide for using upstream data to boost performance with a variety of tabular neural network architectures. Finally, we propose a pseudo-feature method for cases where the upstream and downstream feature sets differ, a tabular-specific problem widespread in real-world applications. Our code is available at https://github.com/LevinRoman/tabular-transfer-learning .

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2206.15306

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.92)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.92)
Health & Medicine > Diagnostic Medicine > Imaging (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations

Kirichenko, Polina, Izmailov, Pavel, Wilson, Andrew Gordon

arXiv.org Artificial IntelligenceJun-30-2023

Neural network classifiers can largely rely on simple spurious features, such as backgrounds, to make predictions. However, even in these cases, we show that they still often learn core features associated with the desired attributes of the data, contrary to recent findings. Inspired by this insight, we demonstrate that simple last layer retraining can match or outperform state-of-the-art approaches on spurious correlation benchmarks, but with profoundly lower complexity and computational expenses. Moreover, we show that last layer retraining on large ImageNet-trained models can also significantly reduce reliance on background and texture information, improving robustness to covariate shift, after only minutes of training on a single GPU.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2204.02937

Country: North America > United States (0.14)

Genre:

Research Report > Promising Solution (0.47)
Research Report > New Finding (0.46)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

A Cookbook of Self-Supervised Learning

Balestriero, Randall, Ibrahim, Mark, Sobal, Vlad, Morcos, Ari, Shekhar, Shashank, Goldstein, Tom, Bordes, Florian, Bardes, Adrien, Mialon, Gregoire, Tian, Yuandong, Schwarzschild, Avi, Wilson, Andrew Gordon, Geiping, Jonas, Garrido, Quentin, Fernandez, Pierre, Bar, Amir, Pirsiavash, Hamed, LeCun, Yann, Goldblum, Micah

arXiv.org Artificial IntelligenceJun-28-2023

Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training a SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.

artificial intelligence, machine learning, representation, (14 more...)

arXiv.org Artificial Intelligence

2304.1221

Country:

Asia > Middle East > Israel (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment (0.45)
Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.68)

Add feedback

Simple and Fast Group Robustness by Automatic Feature Reweighting

Qiu, Shikai, Potapczynski, Andres, Izmailov, Pavel, Wilson, Andrew Gordon

arXiv.org Artificial IntelligenceJun-19-2023

A major challenge to out-of-distribution generalization is reliance on spurious features -- patterns that are predictive of the class label in the training data distribution, but not causally related to the target. Standard methods for reducing the reliance on spurious features typically assume that we know what the spurious feature is, which is rarely true in the real world. Methods that attempt to alleviate this limitation are complex, hard to tune, and lead to a significant computational overhead compared to standard training. In this paper, we propose Automatic Feature Reweighting (AFR), an extremely simple and fast method for updating the model to reduce the reliance on spurious features. AFR retrains the last layer of a standard ERM-trained base model with a weighted loss that emphasizes the examples where the ERM model predicts poorly, automatically upweighting the minority group without group labels. With this simple procedure, we improve upon the best reported results among competing methods trained without spurious attributes on several vision and natural language classification benchmarks, using only a fraction of their compute.

afr, artificial intelligence, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2306.11074

Country:

North America > United States > Hawaii (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Vision (0.93)

Add feedback

User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems

Finzi, Marc, Boral, Anudhyan, Wilson, Andrew Gordon, Sha, Fei, Zepeda-Núñez, Leonardo

arXiv.org Artificial IntelligenceJun-12-2023

Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems. In these applications, diffusion models can implicitly represent knowledge about outliers and extreme events; however, querying that knowledge through conditional sampling or measuring probabilities is surprisingly difficult. Existing methods for conditional sampling at inference time seek mainly to enforce the constraints, which is insufficient to match the statistics of the distribution or compute the probability of the chosen events. To achieve these ends, optimally one would use the conditional score function, but its computation is typically intractable. In this work, we develop a probabilistic approximation scheme for the conditional score function which provably converges to the true distribution as the noise level decreases. With this scheme we are able to sample conditionally on nonlinear userdefined events at inference time, and matches data statistics even when sampling from the tails of the distribution.

artificial intelligence, machine learning, trajectory, (15 more...)

arXiv.org Artificial Intelligence

2306.07526

Country: North America > United States > Hawaii (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback