Bansal, Arpit
Just How Flexible are Neural Networks in Practice?
Shwartz-Ziv, Ravid, Goldblum, Micah, Bansal, Arpit, Bruss, C. Bayan, LeCun, Yann, Wilson, Andrew Gordon
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function class, built into an architecture, shapes its loss surface and impacts the minima we find. In this work, we examine the ability of neural networks to fit data in practice. Our findings indicate that: (1) standard optimizers find minima where the model can only fit training sets with significantly fewer samples than it has parameters; (2) convolutional networks are more parameter-efficient than MLPs and ViTs, even on randomly labeled data; (3) while stochastic training is thought to have a regularizing effect, SGD actually finds minima that fit more training data than full-batch gradient descent; (4) the difference in capacity to fit correctly labeled and incorrectly labeled samples can be predictive of generalization; (5) ReLU activation functions result in finding minima that fit more data despite being designed to avoid vanishing and exploding gradients in deep architectures.
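To make the first finding concrete, here is a minimal sketch of one way to probe capacity in practice: train a fresh model on randomly labeled sets of increasing size and record the largest set the optimizer can still fit perfectly. The tiny MLP, dataset sizes, and training budget below are illustrative assumptions, not the paper's experimental protocol.

```python
import torch
import torch.nn as nn

def can_fit(model_fn, n, dim=32, classes=10, steps=500, lr=1e-2):
    # train a fresh model on n randomly labeled points; report whether it
    # reaches 100% training accuracy within the step budget
    torch.manual_seed(0)
    x = torch.randn(n, dim)
    y = torch.randint(0, classes, (n,))
    model = model_fn()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()
    return bool((model(x).argmax(1) == y).all())

# ~2.8K parameters; the largest n still fit perfectly estimates the
# capacity reachable by this optimizer, which can sit well below 2.8K
mlp = lambda: nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
for n in [100, 500, 1000, 2000, 4000]:
    print(n, can_fit(mlp, n))
```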
Transformers Can Do Arithmetic with the Right Embeddings
McLeish, Sean, Bansal, Arpit, Stein, Alex, Jain, Neel, Kirchenbauer, John, Bartoldson, Brian R., Kailkhura, Bhavya, Bhatele, Abhinav, Geiping, Jonas, Schwarzschild, Avi, Goldstein, Tom
The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit within a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that by training on only 20-digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100-digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks, including sorting and multiplication.
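The core fix is easy to sketch: alongside the usual token embedding, add a learned embedding indexed by each digit's offset from the start of the number it belongs to. The module below is a hypothetical minimal implementation of that idea; the class name, masking interface, and loop are our assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class DigitPositionEmbedding(nn.Module):
    # adds a learned embedding encoding each digit's offset from the start
    # of its number (a sketch of the idea, not the paper's implementation)
    def __init__(self, d_model, max_digits=100):
        super().__init__()
        self.emb = nn.Embedding(max_digits + 1, d_model)  # index 0 = "not a digit"

    def forward(self, token_emb, is_digit):
        # is_digit: (batch, seq) bool mask marking digit tokens
        pos = torch.zeros_like(is_digit, dtype=torch.long)
        run = torch.zeros(is_digit.size(0), dtype=torch.long)
        for t in range(is_digit.size(1)):  # count position within each digit run
            run = torch.where(is_digit[:, t], run + 1, torch.zeros_like(run))
            pos[:, t] = run
        return token_emb + self.emb(pos)

# usage: tokens "1 2 3 + 4 5 =" -> digit offsets 1 2 3 0 1 2 0
is_digit = torch.tensor([[1, 1, 1, 0, 1, 1, 0]], dtype=torch.bool)
layer = DigitPositionEmbedding(d_model=16)
out = layer(torch.zeros(1, 7, 16), is_digit)
```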
Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion
Souri, Hossein, Bansal, Arpit, Kazemi, Hamid, Fowl, Liam, Saha, Aniruddha, Geiping, Jonas, Wilson, Andrew Gordon, Chellappa, Rama, Goldstein, Tom, Goldblum, Micah
Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clean data, called base samples, and then modify those samples to craft poisons. However, some base samples may be significantly more amenable to poisoning than others. As a result, we may be able to craft more potent poisons by carefully choosing the base samples. In this work, we use guided diffusion to synthesize base samples from scratch that lead to significantly more potent poisons and backdoors than previous state-of-the-art attacks. Our Guided Diffusion Poisoning (GDP) base samples can be combined with any downstream poisoning or backdoor attack to boost its effectiveness. Our implementation code is publicly available at https://github.com/hsouri/GDP.
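Schematically, guidance-based synthesis looks like ordinary reverse diffusion with one extra term: at each step, estimate the clean image and nudge the update along the gradient of a poisoning objective. The snippet below is a toy sketch of that loop; the denoiser, noise schedule, and poison_loss are stand-ins, not the GDP models or objective.

```python
import torch

T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

def eps_model(x, t):   # toy denoiser standing in for a trained diffusion model
    return 0.1 * x

def poison_loss(x0):   # placeholder for a downstream poisoning objective
    return ((x0 - 1.0) ** 2).mean()

x = torch.randn(1, 3, 8, 8)
for t in reversed(range(T)):
    x = x.detach().requires_grad_(True)
    eps = eps_model(x, t)
    # Tweedie-style estimate of the clean image from the noisy sample
    x0_hat = (x - (1 - alpha_bar[t]).sqrt() * eps) / alpha_bar[t].sqrt()
    grad = torch.autograd.grad(poison_loss(x0_hat), x)[0]
    eps = eps + (1 - alpha_bar[t]).sqrt() * grad       # steer the step
    with torch.no_grad():
        mean = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
```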
Transfer Learning with Deep Tabular Models
Levin, Roman, Cherepanova, Valeriia, Schwarzschild, Avi, Bansal, Arpit, Bruss, C. Bayan, Goldstein, Tom, Wilson, Andrew Gordon, Goldblum, Micah
Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models is that they learn reusable features and are easily fine-tuned in new domains. This property is often exploited in computer vision and natural language applications, where transfer learning is indispensable when task-specific training data is scarce. In this work, we demonstrate that upstream data gives tabular neural networks a decisive advantage over widely used GBDT models. We propose a realistic medical diagnosis benchmark for tabular transfer learning, and we present a how-to guide for using upstream data to boost performance with a variety of tabular neural network architectures. Finally, we propose a pseudo-feature method for cases where the upstream and downstream feature sets differ, a tabular-specific problem widespread in real-world applications. Our code is available at https://github.com/LevinRoman/tabular-transfer-learning.
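The pseudo-feature idea can be sketched in a few lines: when the downstream table has a column the upstream table lacks, fit a predictor for that column on downstream rows and use it to impute pseudo-values upstream, aligning the two schemas for pretraining. Everything below (data shapes, the random-forest imputer) is an illustrative assumption, not the paper's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
upstream = rng.normal(size=(1000, 8))    # shared columns only
downstream = rng.normal(size=(200, 9))   # shared columns + 1 extra column

# fit a predictor for the extra column from the shared columns
shared_down, extra_down = downstream[:, :8], downstream[:, 8]
imputer = RandomForestRegressor(n_estimators=100, random_state=0)
imputer.fit(shared_down, extra_down)

# impute a pseudo-feature on the upstream rows so both tables share one schema
pseudo = imputer.predict(upstream)
upstream_aligned = np.hstack([upstream, pseudo[:, None]])  # now 9 columns
```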
Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries
Wen, Yuxin, Bansal, Arpit, Kazemi, Hamid, Borgnia, Eitan, Goldblum, Micah, Geiping, Jonas, Goldstein, Tom
As industrial applications are increasingly automated by machine learning models, enforcing personal data ownership and intellectual property rights requires tracing training data back to their rightful owners. Membership inference algorithms approach this problem by using statistical techniques to discern whether a target sample was included in a model's training set. However, existing methods only utilize the unaltered target sample or simple augmentations of the target to compute statistics. Such a sparse sampling of the model's behavior carries little information, leading to poor inference capabilities. In this work, we use adversarial tools to directly optimize for queries that are discriminative and diverse. Our improvements achieve significantly more accurate membership inference than existing methods, especially in offline scenarios and in the low false-positive regime that is critical in legal settings. Membership inference is also studied in the context of ML privacy, since belonging to a dataset can itself be sensitive information (e.g., a model trained on a group of people with a rare disease).
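A schematic of the adversarial-query idea: rather than scoring the raw target sample, optimize a perturbed query whose loss best separates shadow models trained with the target from shadow models trained without it. The toy shadow models and the simple loss-gap objective below are our assumptions, not the paper's exact attack.

```python
import torch
import torch.nn as nn

def make_model(seed):   # toy shadow model; real attacks train these on real data
    torch.manual_seed(seed)
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

in_models = [make_model(s) for s in range(4)]        # trained WITH the target
out_models = [make_model(s + 100) for s in range(4)] # trained WITHOUT it

target_x = torch.randn(1, 16)
target_y = torch.tensor([1])
query = target_x.clone().requires_grad_(True)
opt = torch.optim.Adam([query], lr=0.05)

def avg_loss(models, x):
    return torch.stack([nn.functional.cross_entropy(m(x), target_y)
                        for m in models]).mean()

for _ in range(200):
    opt.zero_grad()
    # discriminative objective: low loss under IN models, high under OUT models
    gap = avg_loss(in_models, query) - avg_loss(out_models, query)
    gap.backward()
    opt.step()

# at attack time, a large OUT-vs-IN loss gap on `query` suggests membership
```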
Universal Guidance for Diffusion Models
Bansal, Arpit, Chu, Hong-Min, Schwarzschild, Avi, Sengupta, Soumyadip, Goldblum, Micah, Geiping, Jonas, Goldstein, Tom
Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals.
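The mechanism can be sketched compactly: predict the clean image from the current noisy sample, evaluate an arbitrary differentiable guidance loss on that prediction, and fold its gradient into the predicted noise. The function below is a simplified sketch; the toy denoiser, schedule handling, and classifier are illustrative assumptions, not the paper's full algorithm.

```python
import torch

def universal_guidance_eps(x_t, t, eps_model, alpha_bar, guidance_loss, scale=1.0):
    # evaluate the guidance loss on the predicted clean image and fold its
    # gradient into the predicted noise (simplified guidance step)
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)
    x0_hat = (x_t - (1 - alpha_bar[t]).sqrt() * eps) / alpha_bar[t].sqrt()
    grad = torch.autograd.grad(guidance_loss(x0_hat), x_t)[0]
    return eps + scale * (1 - alpha_bar[t]).sqrt() * grad

# any differentiable criterion can guide sampling: segmentation, face identity,
# detection, or (here) a toy classifier pushing samples toward class 3
classifier = torch.nn.Linear(3 * 8 * 8, 10)
def guidance_loss(x0):
    return torch.nn.functional.cross_entropy(classifier(x0.flatten(1)),
                                             torch.tensor([3]))

alpha_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 50), dim=0)
eps = universal_guidance_eps(torch.randn(1, 3, 8, 8), t=10,
                             eps_model=lambda x, t: 0.1 * x,
                             alpha_bar=alpha_bar, guidance_loss=guidance_loss)
```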
Certified Neural Network Watermarks with Randomized Smoothing
Bansal, Arpit, Chiang, Ping-yeh, Curry, Michael, Jain, Rajiv, Wigington, Curtis, Manjunatha, Varun, Dickerson, John P, Goldstein, Tom
Watermarking is a commonly used strategy to protect creators' rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models -- in principle, the watermark should be preserved when an adversary tries to copy the model. However, in practice, watermarks can often be removed by an intelligent adversary. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these new techniques often fail in the face of new or better-tuned adversaries. In this paper, we propose a certifiable watermarking method. Using the randomized smoothing technique proposed in Chiang et al., we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain l2 threshold. In addition to being certifiable, our watermark is also empirically more robust than previous watermarking methods. Our experiments can be reproduced with code at https://github.com/arpitbansal297/Certified_Watermarks.
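The certification recipe can be sketched as follows: sample Gaussian noise around the model's parameters, count how often the trigger set is still classified with the watermark labels, and convert the vote into an l2 radius via the standard smoothing bound radius = sigma * Phi^-1(p). The toy model, trigger set, and the use of a point estimate instead of a rigorous confidence bound are all simplifications.

```python
import copy
import torch
import torch.nn as nn
from scipy.stats import norm

model = nn.Linear(8, 2)                    # toy stand-in for a watermarked model
trigger_x = torch.randn(16, 8)             # watermark trigger set
trigger_y = model(trigger_x).argmax(1)     # labels the watermark should retain

sigma, votes, n = 0.1, 0, 500
for _ in range(n):
    noisy = copy.deepcopy(model)
    with torch.no_grad():
        for p in noisy.parameters():       # Gaussian noise on the parameters
            p.add_(sigma * torch.randn_like(p))
    votes += int((noisy(trigger_x).argmax(1) == trigger_y).float().mean() > 0.5)

# point estimate; a rigorous certificate uses a lower confidence bound on p
p_hat = min(votes / n, 0.999)
radius = sigma * norm.ppf(p_hat) if p_hat > 0.5 else 0.0
print(f"watermark certified up to l2 parameter change {radius:.3f}")
```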
End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking
Bansal, Arpit, Schwarzschild, Avi, Borgnia, Eitan, Emam, Zeyad, Huang, Furong, Goldblum, Micah, Goldstein, Tom
Machine learning systems perform well on pattern matching tasks, but their ability to perform algorithmic or logical reasoning is not well understood. One important reasoning capability is logical extrapolation, in which models trained only on small/simple reasoning problems can synthesize complex algorithms that scale up to large/complex problems at test time. Logical extrapolation can be achieved through recurrent systems, which can be iterated many times to solve difficult reasoning problems. We observe that this approach fails to scale to highly complex problems because behavior degenerates when many iterations are applied -- an issue we refer to as "overthinking." We propose a recall architecture that keeps an explicit copy of the problem instance in memory so that it cannot be forgotten. We also employ a progressive training routine that prevents the model from learning behaviors that are specific to iteration number and instead pushes it to learn behaviors that can be repeated indefinitely. These innovations prevent the overthinking problem, and enable recurrent systems to solve extremely hard logical extrapolation tasks, some requiring over 100K convolutional layers, without overthinking.
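The recall mechanism is simple to sketch: a weight-tied block is iterated a variable number of times, and the original input is concatenated back in at every iteration so it can never be forgotten. The module below is a minimal illustration; the channel counts, depth, and per-pixel output head are our assumptions.

```python
import torch
import torch.nn as nn

class RecallRecurrentNet(nn.Module):
    # the recurrent block sees the original problem instance at every
    # iteration (via concatenation), so it cannot be forgotten no matter
    # how many iterations run; a sketch, not the paper's architecture
    def __init__(self, ch=32, in_ch=3):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.block = nn.Sequential(                  # weight-tied recurrent block
            nn.Conv2d(ch + in_ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(ch, 2, 3, padding=1)

    def forward(self, x, iters):
        h = self.embed(x)
        for _ in range(iters):                       # more iterations for harder inputs
            h = self.block(torch.cat([h, x], dim=1)) # recall: re-inject the input
        return self.head(h)

net = RecallRecurrentNet()
easy = net(torch.randn(1, 3, 16, 16), iters=10)
hard = net(torch.randn(1, 3, 64, 64), iters=200)     # same weights, scaled-up compute
```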
Datasets for Studying Generalization from Easy to Hard Examples
Schwarzschild, Avi, Borgnia, Eitan, Gupta, Arjun, Bansal, Arpit, Emam, Zeyad, Huang, Furong, Goldblum, Micah, Goldstein, Tom
In domains like computer vision, single and multi-agent games, and mathematical reasoning, classically trained models perform well on inputs from the same distribution used for training, but often fail to extrapolate their knowledge to more difficult tasks sampled from a different (but related) distribution. The goal of approaches like deep thinking and algorithm learning is to construct systems that achieve this extrapolation. With this in mind, we detail several datasets intended to motivate and facilitate novel research into systems that generalize from easy training data to harder test examples.
MetaBalance: High-Performance Neural Networks for Class-Imbalanced Data
Bansal, Arpit, Goldblum, Micah, Cherepanova, Valeriia, Schwarzschild, Avi, Bruss, C. Bayan, Goldstein, Tom
Class-imbalanced data, in which some classes contain far more samples than others, is ubiquitous in real-world applications. Standard techniques for handling class imbalance usually work by training on a re-weighted loss or on re-balanced data. Unfortunately, training overparameterized neural networks on such objectives causes rapid memorization of minority class data. To avoid this trap, we harness meta-learning, which uses both an "outer-loop" and an "inner-loop" loss, each of which may be balanced using different strategies. We evaluate our method, MetaBalance, on image classification, credit-card fraud detection, loan default prediction, and facial recognition tasks with severely imbalanced data, and we find that MetaBalance outperforms a wide array of popular re-sampling strategies.
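The two-loop structure can be sketched with a one-step differentiable lookahead: take an inner-loop gradient step on the imbalanced training loss, evaluate the outer-loop loss on a class-balanced batch at the stepped-ahead weights, and backpropagate through the step. The toy linear model, single inner step, and loss choices below are illustrative assumptions, not MetaBalance's exact configuration.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)
inner_lr = 0.1
outer_opt = torch.optim.SGD(model.parameters(), lr=0.01)

x_train = torch.randn(64, 8)
y_train = (torch.rand(64) < 0.9).long()   # ~90/10 class imbalance
x_bal = torch.randn(16, 8)
y_bal = torch.arange(16) % 2              # class-balanced outer-loop batch

for step in range(100):
    # inner loop: differentiable step on the imbalanced training loss
    inner_loss = nn.functional.cross_entropy(model(x_train), y_train)
    grads = torch.autograd.grad(inner_loss, list(model.parameters()),
                                create_graph=True)
    stepped = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

    # outer loop: balanced loss at the stepped-ahead weights
    logits = x_bal @ stepped[0].t() + stepped[1]
    outer_loss = nn.functional.cross_entropy(logits, y_bal)
    outer_opt.zero_grad()
    outer_loss.backward()                 # backprop through the inner step
    outer_opt.step()
```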