AITopics

2410.22695

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial IntelligenceOct-28-2024

ImageNet-RIB Benchmark: Large Pre-Training Datasets Don't Guarantee Robustness after Fine-Tuning

Hwang, Jaedong, Cheung, Brian, Hong, Zhang-Wei, Boopathy, Akhilan, Agrawal, Pulkit, Fiete, Ila

Highly performant large-scale pre-trained models promise to also provide a valuable foundation for learning specialized tasks, by fine-tuning the model to the desired task. By starting from a good general-purpose model, the goal is to achieve both specialization in the target task and maintain robustness. To assess the robustness of models to out-of-distribution samples after fine-tuning on downstream datasets, we introduce a new robust fine-tuning benchmark, ImageNet-RIB (Robustness Inheritance Benchmark). The benchmark consists of a set of related but distinct specialized (downstream) tasks; pre-trained models are fine-tuned on one task in the set and their robustness is assessed on the rest, iterating across all tasks for fine-tuning and assessment. We find that the continual learning methods, EWC and LwF maintain robustness after fine-tuning though fine-tuning generally does reduce performance on generalization to related downstream tasks across models. Not surprisingly, models pre-trained on large and rich datasets exhibit higher initial robustness across datasets and suffer more pronounced degradation during fine-tuning. The distance between the pre-training and downstream datasets, measured by optimal transport, predicts this performance degradation on the pre-training dataset. However, counterintuitively, model robustness after fine-tuning on related downstream tasks is the worst when the pre-training dataset is the richest and the most diverse. This suggests that starting with the strongest foundation model is not necessarily the best approach for performance on specialist tasks. The benchmark thus offers key insights for developing more resilient fine-tuning strategies and building robust machine learning models. https://jd730.github.io/projects/ImageNet-RIB

artificial intelligence, dataset, machine learning, (17 more...)

2410.21582

Country: North America > United States > Massachusetts (0.13)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine LearningJun-22-2024

Towards Exact Computation of Inductive Bias

Boopathy, Akhilan, Yue, William, Hwang, Jaedong, Iyer, Abhiram, Fiete, Ila

Much research in machine learning involves finding appropriate inductive biases (e.g. convolutional neural networks, momentum-based optimizers, transformers) to promote generalization on tasks. However, quantification of the amount of inductive bias associated with these architectures and hyperparameters has been limited. We propose a novel method for efficiently computing the inductive bias required for generalization on a task with a fixed training data budget; formally, this corresponds to the amount of information required to specify well-generalizing models within a specific hypothesis space of models. Our approach involves modeling the loss distribution of random hypotheses drawn from a hypothesis space to estimate the required inductive bias for a task relative to these hypotheses. Unlike prior work, our method provides a direct estimate of inductive bias without using bounds and is applicable to diverse hypothesis spaces. Moreover, we derive approximation error bounds for our estimation approach in terms of the number of sampled hypotheses. Consistent with prior results, our empirical results demonstrate that higher dimensional tasks require greater inductive bias. We show that relative to other expressive model classes, neural networks as a model class encode large amounts of inductive bias. Furthermore, our measure quantifies the relative difference in inductive bias between different neural network architectures. Our proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures for certain tasks and provides a quantitative guide to developing tasks requiring greater inductive bias, thereby encouraging the development of more powerful inductive biases.

artificial intelligence, inductive bias, machine learning, (17 more...)

2406.15941

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

arXiv.org Machine LearningApr-21-2024

Resampling-free Particle Filters in High-dimensions

Boopathy, Akhilan, Muppidi, Aneesh, Yang, Peggy, Iyer, Abhiram, Yue, William, Fiete, Ila

State estimation is crucial for the performance and safety of numerous robotic applications. Among the suite of estimation techniques, particle filters have been identified as a powerful solution due to their non-parametric nature. Yet, in high-dimensional state spaces, these filters face challenges such as 'particle deprivation' which hinders accurate representation of the true posterior distribution. This paper introduces a novel resampling-free particle filter designed to mitigate particle deprivation by forgoing the traditional resampling step. This ensures a broader and more diverse particle set, especially vital in high-dimensional scenarios. Theoretically, our proposed filter is shown to offer a near-accurate representation of the desired posterior distribution in high-dimensional contexts. Empirically, the effectiveness of our approach is underscored through a high-dimensional synthetic state estimation task and a 6D pose estimation derived from videos. We posit that as robotic systems evolve with greater degrees of freedom, particle filters tailored for high-dimensional state spaces will be indispensable.

artificial intelligence, machine learning, particle filter, (15 more...)

2404.13698

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial IntelligenceOct-26-2023

Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic Forgetting in Curiosity

Hwang, Jaedong, Hong, Zhang-Wei, Chen, Eric, Boopathy, Akhilan, Agrawal, Pulkit, Fiete, Ila

Deep reinforcement learning methods exhibit impressive performance on a range of tasks but still struggle on hard exploration tasks in large environments with sparse rewards. To address this, intrinsic rewards can be generated using forward model prediction errors that decrease as the environment becomes known, and incentivize an agent to explore novel states. While prediction-based intrinsic rewards can help agents solve hard exploration tasks, they can suffer from catastrophic forgetting and actually increase at visited states. We first examine the conditions and causes of catastrophic forgetting in grid world environments. We then propose a new method FARCuriosity, inspired by how humans and animals learn. The method depends on fragmentation and recall: an agent fragments an environment based on surprisal, and uses different local curiosity modules (prediction-based intrinsic reward functions) for each fragment so that modules are not trained on the entire environment. At each fragmentation event, the agent stores the current module in long-term memory (LTM) and either initializes a new module or recalls a previously stored module based on its match with the current state. With fragmentation and recall, FARCuriosity achieves less forgetting and better overall performance in games with varied and heterogeneous environments in the Atari benchmark suite of tasks. Thus, this work highlights the problem of catastrophic forgetting in prediction-based curiosity methods and proposes a solution.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

2310.17537

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

arXiv.org Artificial IntelligenceOct-16-2023

Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building

Hwang, Jaedong, Hong, Zhang-Wei, Chen, Eric, Boopathy, Akhilan, Agrawal, Pulkit, Fiete, Ila

Animals and robots navigate through environments by building and refining maps of space. These maps enable functions including navigation back to home, planning, search and foraging. Here, we use observations from neuroscience, specifically the observed fragmentation of grid cell map in compartmentalized spaces, to propose and apply the concept of Fragmentation-and-Recall (FARMap) in the mapping of large spaces. Agents solve the mapping problem by building local maps via a surprisal-based clustering of space, which they use to set subgoals for spatial exploration. Agents build and use a local map to predict their observations; high surprisal leads to a "fragmentation event" that truncates the local map. At these events, the recent local map is placed into long-term memory (LTM) and a different local map is initialized. If observations at a fracture point match observations in one of the stored local maps, that map is recalled (and thus reused) from LTM. The fragmentation points induce a natural online clustering of the larger space, forming a set of intrinsic potential subgoals that are stored in LTM as a topological graph. Agents choose their next subgoal from the set of near and far potential subgoals from within the current local map or LTM, respectively. Thus, local maps guide exploration locally, while LTM promotes global exploration. We evaluate FARMap on complex procedurally-generated spatial environments and realistic simulations to demonstrate that this mapping strategy much more rapidly covers the environment (number of agent steps and wall clock time) and is more efficient in active memory usage, without loss of performance.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

2307.05793

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
(3 more...)

arXiv.org Artificial IntelligenceJun-2-2023

Model-agnostic Measure of Generalization Difficulty

Boopathy, Akhilan, Liu, Kevin, Hwang, Jaedong, Ge, Shu, Mohammedsaleh, Asaad, Fiete, Ila

The measure of a machine learning algorithm is the difficulty of the tasks it can perform, and sufficiently difficult tasks are critical drivers of strong machine learning models. However, quantifying the generalization difficulty of machine learning benchmarks has remained challenging. We propose what is to our knowledge the first model-agnostic measure of the inherent generalization difficulty of tasks. Our inductive bias complexity measure quantifies the total information required to generalize well on a task minus the information provided by the data. It does so by measuring the fractional volume occupied by hypotheses that generalize on a task given that they fit the training data. It scales exponentially with the intrinsic dimensionality of the space over which the model must generalize but only polynomially in resolution per dimension, showing that tasks which require generalizing over many dimensions are drastically more difficult than tasks involving more detail in fewer dimensions. Our measure can be applied to compute and compare supervised learning, reinforcement learning and meta-learning generalization difficulties against each other. We show that applied empirically, it formally quantifies intuitively expected trends, e.g. that in terms of required inductive bias, MNIST < CIFAR10 < Imagenet and fully observable Markov decision processes (MDPs) < partially observable MDPs. Further, we show that classification of complex images < few-shot meta-learning with simple images. Our measure provides a quantitative metric to guide the construction of more complex tasks requiring greater inductive bias, and thereby encourages the development of more sophisticated architectures and learning algorithms with more powerful generalization capabilities.

artificial intelligence, inductive learning, machine learning, (17 more...)

2305.01034

Country:

North America > United States > Massachusetts (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.66)

arXiv.org Artificial IntelligenceMar-24-2023

Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Schaeffer, Rylan, Khona, Mikail, Robertson, Zachary, Boopathy, Akhilan, Pistunova, Kateryna, Rocks, Jason W., Fiete, Ila Rani, Koyejo, Oluwasanmi

Double descent is a surprising phenomenon in machine learning, in which as the number of model parameters grows relative to the number of data, test error drops as models grow ever larger into the highly overparameterized (data undersampled) regime. This drop in test error flies against classical learning theory on overfitting and has arguably underpinned the success of large models in machine learning. This non-monotonic behavior of test loss depends on the number of data, the dimensionality of the data and the number of model parameters. Here, we briefly describe double descent, then provide an explanation of why double descent occurs in an informal and approachable manner, requiring only familiarity with linear algebra and introductory probability. We provide visual intuition using polynomial regression, then mathematically analyze double descent with ordinary linear regression and identify three interpretable factors that, when simultaneously all present, together create double descent. We demonstrate that double descent occurs on real data when using ordinary linear regression, then demonstrate that double descent does not occur when any of the three factors are ablated. We use this understanding to shed light on recent observations in nonlinear models concerning superposition and double descent. Code is publicly available.

artificial intelligence, deep learning, machine learning, (17 more...)

2303.14151

Country: North America > United States (0.47)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

arXiv.org Machine LearningJun-15-2021

Gradient-trained Weights in Wide Neural Networks Align Layerwise to Error-scaled Input Correlations

Boopathy, Akhilan, Fiete, Ila

Recent works have examined how deep neural networks, which can solve a variety of difficult problems, incorporate the statistics of training data to achieve their success. However, existing results have been established only in limited settings. In this work, we derive the layerwise weight dynamics of infinite-width neural networks with nonlinear activations trained by gradient descent. We show theoretically that weight updates are aligned with input correlations from intermediate layers weighted by error, and demonstrate empirically that the result also holds in finite-width wide networks. The alignment result allows us to formulate backpropagation-free learning rules, named Align-zero and Align-ada, that theoretically achieve the same alignment as backpropagation. Finally, we test these learning rules on benchmark problems in feedforward and recurrent neural networks and demonstrate, in wide networks, comparable performance to backpropagation.

artificial intelligence, deep learning, neural network, (19 more...)

2106.08453

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningFeb-1-2021

Fast Training of Provably Robust Neural Networks by SingleProp

Boopathy, Akhilan, Weng, Tsui-Wei, Liu, Sijia, Chen, Pin-Yu, Zhang, Gaoyuan, Daniel, Luca

Recent works have developed several methods of defending neural networks against adversarial attacks with certified guarantees. However, these techniques can be computationally costly due to the use of certification during training. We develop a new regularizer that is both more efficient than existing certified defenses, requiring only one additional forward propagation through a network, and can be used to train networks with similar certified accuracy. Through experiments on MNIST and CIFAR-10 we demonstrate improvements in training speed and comparable certified accuracy compared to state-of-the-art certified defenses.

deep learning, ibp, neural network, (17 more...)

2102.01208

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)