AITopics | Bhatt, Gantavya

COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Learning

Das, Arnav M., Bhatt, Gantavya, Kumari, Lilly, Verma, Sahil, Bilmes, Jeff

arXiv.org Artificial Intelligence | Dec-23-2024

Retrieval augmentation, the practice of retrieving additional data from large auxiliary pools, has emerged as an effective technique for enhancing model performance in the low-data regime, e.g. few-shot learning. Prior approaches have employed only nearest-neighbor based strategies for data selection, which retrieve auxiliary samples with high similarity to instances in the target task. However, these approaches are prone to selecting highly redundant samples, since they fail to incorporate any notion of diversity. In our work, we first demonstrate that data selection strategies used in prior retrieval-augmented few-shot learning settings can be generalized using a class of functions known as Combinatorial Mutual Information (CMI) measures. We then propose COBRA (COmBinatorial Retrieval Augmentation), which employs an alternative CMI measure that considers both diversity and similarity to a target dataset. COBRA consistently outperforms previous retrieval approaches across image classification tasks and few-shot learning techniques when used to retrieve samples from LAION-2B. COBRA introduces negligible computational overhead to the cost of retrieval while providing significant gains in downstream model performance.
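The contrast between nearest-neighbor retrieval and a diversity-aware selection can be illustrated with a toy similarity matrix. This is only a sketch: the coverage objective below (sum over target items of the best similarity to any selected item) is a facility-location-style stand-in chosen for illustration, not the CMI measure used by COBRA, and the function names and numbers are hypothetical.

```python
def top_k_nearest(sim, k):
    """Pick the k pool items with the highest similarity to any target item."""
    n_pool = len(sim[0])
    best = [max(row[j] for row in sim) for j in range(n_pool)]
    return sorted(range(n_pool), key=lambda j: -best[j])[:k]


def greedy_coverage(sim, k):
    """Greedily maximize the sum, over target items, of the best similarity
    to any chosen pool item (an illustrative objective, assumed here)."""
    n_pool = len(sim[0])
    covered = [0.0] * len(sim)  # best similarity reached so far per target item
    chosen = []
    for _ in range(k):
        def gain(j):
            # Marginal improvement in coverage if pool item j were added.
            return sum(max(row[j] - c, 0.0) for row, c in zip(sim, covered))

        j_best = max((j for j in range(n_pool) if j not in chosen), key=gain)
        chosen.append(j_best)
        covered = [max(c, row[j_best]) for row, c in zip(sim, covered)]
    return chosen


# Rows: 2 target items. Columns: 4 pool items. Pool items 0 and 1 are
# near-duplicates that both match the first target item.
sim = [
    [0.95, 0.94, 0.10, 0.20],
    [0.10, 0.10, 0.80, 0.30],
]

print(top_k_nearest(sim, 2))    # [0, 1]: both near-duplicates are retrieved
print(greedy_coverage(sim, 2))  # [0, 2]: the second pick covers the other target
```

Once item 0 is selected, the near-duplicate item 1 adds no coverage, so the greedy rule moves on to an item that serves the second target item.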

large language model, machine learning, natural language, (18 more...)

2412.17684

Country:

Europe (1.00)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Oregon (0.14)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization

Bansal, Hritik, Suvarna, Ashima, Bhatt, Gantavya, Peng, Nanyun, Chang, Kai-Wei, Grover, Aditya

arXiv.org Artificial Intelligence | Mar-30-2024

A common technique for aligning large language models (LLMs) relies on acquiring human preferences by comparing multiple generations conditioned on a fixed context. This only leverages the pairwise comparisons when the generations are placed in an identical context. However, such conditional rankings often fail to capture the complex and multidimensional aspects of human preferences. In this work, we revisit the traditional paradigm of preference acquisition and propose a new axis that is based on eliciting preferences jointly over the instruction-response pairs. While prior preference optimizations are designed for conditional ranking protocols (e.g., DPO), our proposed preference acquisition protocol introduces DOVE, a new preference optimization objective that upweights the joint probability of the chosen instruction-response pair over the rejected instruction-response pair. Interestingly, we find that the LLM trained with joint instruction-response preference data using DOVE outperforms the LLM trained with DPO by 5.2% and 3.3% win-rate for the summarization and open-ended dialogue datasets, respectively. Our findings reveal that joint preferences over instruction and response pairs can significantly enhance the alignment of LLMs by tapping into a broader spectrum of human preference elicitation. The data and code are available at https://github.com/Hritikbansal/dove.
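A minimal sketch of what "upweighting the joint probability of the chosen pair over the rejected pair" could look like for a single preference example. It assumes a DPO-style logistic loss applied to joint log-probabilities of (instruction, response) pairs relative to a reference model; the abstract does not spell out the exact form, so the function name, arguments, and `beta` value are illustrative assumptions rather than the released DOVE implementation.

```python
import math


def joint_preference_loss(logp_chosen, logp_rejected,
                          ref_chosen, ref_rejected, beta=0.1):
    """Logistic preference loss on joint log-probabilities (assumed form).

    Each argument is log p(instruction, response) for one pair, under the
    policy (logp_*) or a frozen reference model (ref_*). The chosen and
    rejected pairs may have different instructions.
    """
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # -log(sigmoid(margin)), written in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy equal to the reference: the loss is log(2).
baseline = joint_preference_loss(-5.0, -5.0, -5.0, -5.0)
# Policy that raises the chosen pair and lowers the rejected pair: lower loss.
improved = joint_preference_loss(-4.0, -6.0, -5.0, -5.0)
print(baseline, improved)
```

In a conditional protocol such as DPO both responses share one instruction; here the two pairs are scored as whole (instruction, response) units, which is the only structural difference this sketch is meant to show.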

large language model, machine learning, natural language, (17 more...)

2404.0053

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Deep Submodular Peripteral Networks

Bhatt, Gantavya, Das, Arnav, Bilmes, Jeff

arXiv.org Artificial Intelligence | Mar-15-2024

Submodular functions, crucial for various applications, often lack practical learning methods for their acquisition. Seemingly unrelated, learning a scaling from oracles offering graded pairwise preferences (GPC) is underexplored, despite a rich history in psychometrics. In this paper, we introduce deep submodular peripteral networks (DSPNs), a novel parametric family of submodular functions, and methods for their training using a contrastive-learning inspired GPC-ready strategy to connect and then tackle both of the above challenges. We introduce a newly devised GPC-style "peripteral" loss which leverages numerically graded relationships between pairs of objects (sets in our case). Unlike traditional contrastive learning, our method utilizes graded comparisons, extracting more nuanced information than just binary-outcome comparisons, and contrasts sets of any size (not just two). We also define a novel suite of automatic sampling strategies for training, including active-learning inspired submodular feedback. We demonstrate DSPNs' efficacy in learning submodularity from a costly target submodular function, showing superiority in downstream tasks such as experimental design and streaming applications.

artificial intelligence, machine learning, optimization problem, (15 more...)

2403.08199

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > Canada > Quebec (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models

Bhatt, Gantavya, Chen, Yifang, Das, Arnav M., Zhang, Jifan, Truong, Sang T., Mussmann, Stephen, Zhu, Yinglun, Bilmes, Jeffrey, Du, Simon S., Jamieson, Kevin, Ash, Jordan T., Nowak, Robert D.

arXiv.org Artificial Intelligence | Jan-12-2024

Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern large language models (LLMs). However, the annotation efforts required to produce high quality responses for instructions are becoming prohibitively expensive, especially as the number of tasks spanned by instruction datasets continues to increase. Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool, but its high computational cost remains a barrier to its widespread applicability in the context of LLMs. To mitigate the annotation cost of SFT and circumvent the computational bottlenecks of active learning, we propose using experimental design. Experimental design techniques select the most informative samples to label, and typically maximize some notion of uncertainty and/or diversity. In our work, we implement a framework that evaluates several existing and novel experimental design techniques and find that these methods consistently yield significant gains in label efficiency with little computational overhead. On generative tasks, our methods achieve the same generalization performance with only 50% of the annotation cost required by random sampling.
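To make "maximize some notion of diversity" concrete, here is a small sketch of one classic diversity-driven selection rule, farthest-first traversal (greedy k-center), applied once to an unlabeled pool. It is a generic technique picked for illustration and is not claimed to be one of the methods evaluated in the paper; the 2-D points stand in for hypothetical instruction embeddings.

```python
import math


def farthest_first(points, k, start=0):
    """Select k indices so that each new pick is the point farthest
    from everything selected so far (greedy k-center)."""
    chosen = [start]
    # Distance from every point to its nearest selected point.
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for p, d in zip(points, dist)]
    return chosen


# Hypothetical embeddings: two tight clusters plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
print(farthest_first(points, 3))  # [0, 3, 4]: one pick per region
```

Unlike active learning, the whole selection happens in one pass before any labels are collected, so no model is retrained between picks.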

large language model, machine learning, natural language, (16 more...)

2401.06692

Country:

North America > United States > Wisconsin (0.14)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning

Zhang, Jifan, Chen, Yifang, Canal, Gregory, Mussmann, Stephen, Das, Arnav M., Bhatt, Gantavya, Zhu, Yinglun, Bilmes, Jeffrey, Du, Simon Shaolei, Jamieson, Kevin, Nowak, Robert D.

arXiv.org Artificial Intelligence | Jan-12-2024

Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive. To mitigate this cost, machine learning methods, such as transfer learning, semi-supervised learning and active learning, aim to be label-efficient: achieving high predictive performance from relatively few labeled examples. While obtaining the best label-efficiency in practice often requires combinations of these techniques, existing benchmark and evaluation frameworks do not capture a concerted combination of all such techniques. This paper addresses this deficiency by introducing LabelBench, a new computationally-efficient framework for joint evaluation of multiple label-efficient learning techniques. As an application of LabelBench, we introduce a novel benchmark of state-of-the-art active learning methods in combination with semi-supervised learning for fine-tuning pretrained vision transformers. Our benchmark demonstrates better label-efficiencies than previously reported in active learning. LabelBench's modular codebase is open-sourced for the broader community to contribute label-efficient learning methods and benchmarks. The repository can be found at: https://github.com/EfficientTraining/LabelBench.

artificial intelligence, inductive learning, machine learning, (18 more...)

2306.0991

Country:

Europe (0.67)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > California > Riverside County > Riverside (0.14)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Accelerating Batch Active Learning Using Continual Learning Techniques

Das, Arnav, Bhatt, Gantavya, Bhalerao, Megh, Gao, Vianne, Yang, Rui, Bilmes, Jeff

arXiv.org Artificial Intelligence | Dec-12-2023

A major problem with Active Learning (AL) is high training costs since models are typically retrained from scratch after every query round. We start by demonstrating that standard AL on neural networks with warm starting fails, both to accelerate training and to avoid catastrophic forgetting when using fine-tuning over AL query rounds. We then develop a new class of techniques, circumventing this problem, by biasing further training towards previously labeled sets. We accomplish this by employing existing, and developing novel, replay-based Continual Learning (CL) algorithms that are effective at quickly learning the new without forgetting the old, especially when data comes from an evolving distribution. We call this paradigm "Continual Active Learning" (CAL). We show CAL achieves significant speedups using a plethora of replay schemes that use model distillation and that select diverse/uncertain points from the history. We conduct experiments across many data domains, including natural language, vision, medical imaging, and computational biology, each with different neural architectures and dataset sizes. CAL consistently provides a 3x reduction in training time, while retaining performance and out-of-distribution robustness, showing its wide applicability.
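The idea of biasing further training towards previously labeled sets through replay can be sketched as a batch-construction step: each minibatch mixes newly labeled points with a sample from the labeled history, so the model is fine-tuned rather than retrained from scratch. This is a simplified illustration with uniform random replay; the function name, the `replay_fraction` parameter, and the data are hypothetical, and the replay schemes in the paper (for example those based on distillation or on diverse/uncertain points) are more elaborate.

```python
import random


def replay_batch(new_items, history, batch_size, replay_fraction=0.5, rng=None):
    """Build one minibatch from newly labeled items plus replayed history."""
    rng = rng or random.Random(0)
    n_replay = min(len(history), int(batch_size * replay_fraction))
    n_new = min(len(new_items), batch_size - n_replay)
    return rng.sample(new_items, n_new) + rng.sample(history, n_replay)


# Hypothetical example IDs: 50 previously labeled points, 10 new ones.
history = list(range(50))
new_items = list(range(100, 110))

batch = replay_batch(new_items, history, batch_size=8, replay_fraction=0.5)
print(len(batch))                      # 8
print(sum(x >= 100 for x in batch))    # 4 new items, 4 replayed
```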

artificial intelligence, machine learning, survey article, (16 more...)

2305.06408

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.92)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Effective Backdoor Mitigation Depends on the Pre-training Objective

Verma, Sahil, Bhatt, Gantavya, Schwarzschild, Avi, Singhal, Soumye, Das, Arnav Mohanty, Shah, Chirag, Dickerson, John P, Bilmes, Jeff

arXiv.org Artificial Intelligence | Dec-5-2023

Despite the advanced capabilities of contemporary machine learning (ML) models, they remain vulnerable to adversarial and backdoor attacks. This vulnerability is particularly concerning in real-world deployments, where compromised models may exhibit unpredictable behavior in critical scenarios. Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for pre-training multimodal models, as these datasets may harbor backdoors. Various techniques have been proposed to mitigate the effects of backdooring in these models, such as CleanCLIP, which is the current state-of-the-art approach. In this work, we demonstrate that the efficacy of CleanCLIP in mitigating backdoors is highly dependent on the particular objective used during model pre-training. We observe that stronger pre-training objectives correlate with harder-to-remove backdoor behaviors. We show this by training multimodal models on two large datasets consisting of 3 million (CC3M) and 6 million (CC6M) datapoints, under various pre-training objectives, followed by poison removal using CleanCLIP. We find that CleanCLIP is ineffective when stronger pre-training objectives are used, even with extensive hyperparameter tuning. Our findings underscore critical considerations for ML practitioners who pre-train models using large-scale web-curated data and are concerned about potential backdoor threats. Notably, our results suggest that simpler pre-training objectives are more amenable to effective backdoor removal. This insight is pivotal for practitioners seeking to balance the trade-offs between using stronger pre-training objectives and security against backdoor attacks.

artificial intelligence, dataset, machine learning, (14 more...)

2311.14948

Country: North America > United States > Maryland > Prince George's County > College Park (0.14)

Genre:

Research Report > New Finding (1.00)
Instructional Material (1.00)
Research Report > Promising Solution (0.89)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

High Resolution Point Clouds from mmWave Radar

Prabhakara, Akarsh, Jin, Tao, Das, Arnav, Bhatt, Gantavya, Kumari, Lilly, Soltanaghaei, Elahe, Bilmes, Jeff, Kumar, Swarun, Rowe, Anthony

arXiv.org Artificial Intelligence | Jul-16-2023

This paper explores a machine learning approach for generating high resolution point clouds from a single-chip mmWave radar. Unlike lidar and vision-based systems, mmWave radar can operate in harsh environments and see through occlusions like smoke, fog, and dust. Unfortunately, current mmWave processing techniques offer poor spatial resolution compared to lidar point clouds. This paper presents RadarHD, an end-to-end neural network that constructs lidar-like point clouds from low resolution radar input. Enhancing radar images is challenging due to the presence of specular and spurious reflections. Radar data also doesn't map well to traditional image processing techniques due to the signal's sinc-like spreading pattern. We overcome these challenges by training RadarHD on a large volume of raw I/Q radar data paired with lidar point clouds across diverse indoor settings. Our experiments show the ability to generate rich point clouds even in scenes unobserved during training and in the presence of heavy smoke occlusion. Further, RadarHD's point clouds are high-quality enough to work with existing lidar odometry and mapping workflows.

artificial intelligence, machine learning, radar, (18 more...)

doi: 10.1109/ICRA48891.2023.10161429

2206.09273

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Illinois > Champaign County (0.14)

Genre: Research Report (0.40)

Industry:

Health & Medicine (0.93)
Transportation (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Systematic Generalization for Predictive Control in Multivariate Time Series

Bansal, Hritik, Bhatt, Gantavya, Malhotra, Pankaj, P, Prathosh A.

arXiv.org Artificial Intelligence | Feb-10-2021

Prior work has focused on evaluating the ability of neural networks to reason about novel combinations from known components, an intrinsic property of human cognition. In this work, we aim to study systematic generalization in predicting future state trajectories of a dynamical system, conditioned on past states' trajectory (dependent variables) and on past and future actions (control variables). In our context, systematic generalization implies that a good model should perform well on all new combinations of future actions after being trained on all of the individual actions, but only on a limited set of their combinations. For models to generalize out-of-distribution to unseen action combinations, they should reason about the states and their dependency relation with the applied actions. We conduct a rigorous study of useful inductive biases that learn to predict the trajectories up to large horizons well, and capture true dependency relations between the states and the controls through our synthetic setup, and simulated data from electric motors.

deep learning, transition model, upstream oil & gas, (20 more...)

2102.05602

Country: Asia > India > NCT (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.40)

Technology:

Information Technology > Modeling & Simulation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
