Goto

Collaborating Authors

 Bachu, Saketh


Unfair Alignment: Examining Safety Alignment Across Vision Encoder Layers in Vision-Language Models

arXiv.org Artificial Intelligence

Vision-language models (VLMs) have improved significantly in multi-modal tasks, but their more complex architecture makes their safety alignment more challenging than that of large language models (LLMs). In this paper, we reveal an unfair distribution of safety across the layers of a VLM's vision encoder, with earlier and middle layers being disproportionately vulnerable to malicious inputs compared to the more robust final layers. This 'cross-layer' vulnerability stems from the model's inability to generalize its safety training from the default architectural settings used during training to unseen or out-of-distribution scenarios, leaving certain layers exposed. We conduct a comprehensive analysis by projecting activations from various intermediate layers and demonstrate that these layers are more likely to generate harmful outputs when exposed to malicious inputs. Our experiments with LLaVA-1.5 and Llama 3.2 show discrepancies in attack success rates and toxicity scores across layers, indicating that current safety alignment strategies focused on a single default layer are insufficient.
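
The layer-wise analysis described above can be illustrated with a small probe. The sketch below uses a toy vision encoder and stand-in projector and harmfulness scorer (no LLaVA-1.5 or Llama 3.2 weights are loaded); the loop that scores every intermediate layer, rather than only the default final layer, is the part that mirrors the cross-layer evaluation.

```python
# Minimal sketch of a cross-layer safety probe with a toy vision encoder.
# The `project` module and `is_harmful` scorer are illustrative stand-ins,
# not the paper's actual components.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "vision encoder": a stack of small blocks standing in for transformer layers.
encoder = nn.ModuleList(
    [nn.Sequential(nn.Linear(64, 64), nn.GELU()) for _ in range(12)]
)
project = nn.Linear(64, 32)      # stand-in for the vision-to-LLM projector
safety_head = nn.Linear(32, 1)   # stand-in for "did the model produce harmful text?"

def is_harmful(projected: torch.Tensor) -> torch.Tensor:
    """Placeholder harmfulness score in [0, 1]; a real study would run the LLM
    and score its generations (attack success rate, toxicity)."""
    return torch.sigmoid(safety_head(projected)).mean()

images = torch.randn(8, 64)      # pretend these are malicious image inputs

# Evaluate every intermediate layer, not just the default (late) layer.
h = images
for idx, block in enumerate(encoder):
    h = block(h)
    score = is_harmful(project(h)).item()
    print(f"layer {idx:2d}: harmful-output score = {score:.3f}")
```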


Towards Learning and Explaining Indirect Causal Effects in Neural Networks

arXiv.org Artificial Intelligence

Recently, there has been a growing interest in learning and explaining causal effects within Neural Network (NN) models. Owing to the nature of NN architectures, previous approaches consider only direct and total causal effects, assuming independence among input variables. We view an NN as a structural causal model (SCM) and extend our focus to include indirect causal effects by introducing feedforward connections among input neurons. We propose an ante-hoc method that captures and maintains direct, indirect, and total causal effects during NN model training. We also propose an algorithm for quantifying learned causal effects in an NN model and efficient approximation strategies for quantifying causal effects in high-dimensional data. Extensive experiments conducted on synthetic and real-world datasets demonstrate that the causal effects learned by our ante-hoc method better approximate the ground truth effects compared to existing methods.
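
The decomposition into direct, indirect, and total effects can be made concrete with a tiny hand-written structural causal model, where one input influences the outcome both directly and through another input. The structural equations below are illustrative assumptions, not the paper's ante-hoc NN training procedure.

```python
# Toy SCM: x1 affects y directly and through the mediator x2, mirroring the
# feedforward connections among input neurons described above.
import numpy as np

def f_x2(x1):       # structural equation for the mediator x2
    return 2.0 * x1

def f_y(x1, x2):    # structural equation for the outcome y
    return 0.5 * x1 + 1.5 * x2

x1_base, x1_new = 0.0, 1.0

# Total effect: intervene on x1 and let x2 respond.
total = f_y(x1_new, f_x2(x1_new)) - f_y(x1_base, f_x2(x1_base))

# Direct effect: intervene on x1 while holding x2 at its baseline value.
direct = f_y(x1_new, f_x2(x1_base)) - f_y(x1_base, f_x2(x1_base))

# Indirect effect: what remains, i.e., the influence routed through x2.
indirect = total - direct

print(f"total={total:.2f}, direct={direct:.2f}, indirect={indirect:.2f}")
# total=3.50, direct=0.50, indirect=3.00
```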


On Counterfactual Data Augmentation Under Confounding

arXiv.org Machine Learning

Counterfactual data augmentation has recently emerged as a method to mitigate confounding biases in the training data. These biases, such as spurious correlations, arise due to various observed and unobserved confounding variables in the data generation process. In this paper, we formally analyze how confounding biases impact downstream classifiers and present a causal viewpoint to the solutions based on counterfactual data augmentation. We explore how removing confounding biases serves as a means to learn invariant features, ultimately aiding in generalization beyond the observed data distribution. Additionally, we present a straightforward yet powerful algorithm for generating counterfactual images, which effectively mitigates the influence of confounding effects on downstream classifiers. Through experiments on MNIST variants and the CelebA datasets, we demonstrate how our simple augmentation method helps existing state-of-the-art methods achieve good results.
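
A minimal sketch of the augmentation idea, assuming a colored-MNIST-style setup where color is the confounder spuriously tied to the label: the counterfactual image keeps the digit content but resamples the color independently of the label. The recoloring function is a toy stand-in, not the paper's counterfactual image generator.

```python
# Recoloring as counterfactual augmentation: sever the confounder -> label link.
import numpy as np

rng = np.random.default_rng(0)
PALETTE = np.array([[1.0, 0.2, 0.2], [0.2, 0.2, 1.0]])  # red, blue

def colorize(gray: np.ndarray, color: np.ndarray) -> np.ndarray:
    """Turn a (H, W) grayscale digit into an (H, W, 3) colored image."""
    return gray[..., None] * color

def confounded_sample(gray: np.ndarray, label: int) -> np.ndarray:
    """Biased training data: label 0 -> almost always red, label 1 -> almost always blue."""
    color_idx = label if rng.random() < 0.95 else 1 - label
    return colorize(gray, PALETTE[color_idx])

def counterfactual_augment(gray: np.ndarray) -> np.ndarray:
    """Counterfactual image: same digit content, color resampled independently
    of the label, breaking the spurious correlation."""
    return colorize(gray, PALETTE[rng.integers(len(PALETTE))])

digit = rng.random((28, 28))           # stand-in for an MNIST digit
x_biased = confounded_sample(digit, label=0)
x_cf = counterfactual_augment(digit)
print(x_biased.shape, x_cf.shape)      # (28, 28, 3) (28, 28, 3)
```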


Causal Inference Using LLM-Guided Discovery

arXiv.org Artificial Intelligence

At the core of causal inference lies the challenge of determining reliable causal graphs solely based on observational data. Since the well-known backdoor criterion depends on the graph, any errors in the graph can propagate downstream to causal effect inference. In this work, we first show that complete graph information is not necessary for causal effect inference; the topological order over graph variables (causal order) alone suffices. Further, given a node pair, causal order is easier to elicit from domain experts than graph edges, since determining the existence of an edge can depend extensively on other variables. Interestingly, we find that the same principle holds for Large Language Models (LLMs) such as GPT-3.5-turbo and GPT-4, motivating an automated method to obtain causal order (and hence causal effects) with LLMs acting as virtual domain experts. To this end, we employ different prompting strategies and contextual cues to propose a robust technique for obtaining causal order from LLMs. Acknowledging LLMs' limitations, we also study possible techniques to integrate LLMs with established causal discovery algorithms, including constraint-based and score-based methods, to enhance their performance. Extensive experiments demonstrate that our approach significantly improves causal ordering accuracy compared to discovery algorithms, highlighting the potential of LLMs to enhance causal inference across diverse fields.
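
The ordering step can be sketched as follows: query an LLM for pairwise causal judgments, treat each "yes" as a precedence constraint, and topologically sort. The prompt wording and the ask_llm function below are placeholders (the LLM call is mocked by a lookup table), not the paper's exact prompting strategy.

```python
# Eliciting a causal (topological) order from pairwise judgments.
from graphlib import TopologicalSorter

VARIABLES = ["smoking", "tar", "cancer"]

# Pretend answers a real LLM might give to a prompt such as:
# "Does changing {a} plausibly cause a change in {b}? Answer yes or no."
MOCK_ANSWERS = {
    ("smoking", "tar"): "yes",
    ("tar", "cancer"): "yes",
    ("smoking", "cancer"): "yes",
}

def ask_llm(a: str, b: str) -> str:
    return MOCK_ANSWERS.get((a, b), "no")

# Build precedence constraints: a must precede b whenever the LLM says a causes b.
predecessors = {v: set() for v in VARIABLES}
for a in VARIABLES:
    for b in VARIABLES:
        if a != b and ask_llm(a, b) == "yes":
            predecessors[b].add(a)

causal_order = list(TopologicalSorter(predecessors).static_order())
print(causal_order)   # ['smoking', 'tar', 'cancer']
```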


Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach

arXiv.org Artificial Intelligence

Estimating the transferability of publicly available pretrained models to a target task has become important for transfer learning in recent years. Existing efforts propose metrics that allow a user to choose one model from a pool of pre-trained models without having to fine-tune each model individually. With the growth in the number of available pre-trained models and the popularity of model ensembles, it also becomes essential to study the transferability of multiple source models to a given target task. The few existing efforts that study transferability in such multi-source ensemble settings use only the outputs of the classification layer and neglect possible domain or task mismatch. Moreover, they overlook the most important factor when selecting source models, viz., the cohesiveness among them, which can affect the performance of and confidence in the ensemble's predictions. To address these gaps, we propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task. OSBORN collectively accounts for image domain difference, task difference, and cohesiveness of models in the ensemble to provide reliable estimates of transferability. We gauge the performance of OSBORN on both image classification and semantic segmentation tasks. Our setup includes 28 source datasets, 11 target datasets, 5 model architectures, and 2 pre-training methods. We benchmark our method against the current state-of-the-art metrics MS-LEEP and E-LEEP and consistently outperform them.
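
A rough sketch of how a submodular-style score can drive greedy ensemble selection is given below. The domain-gap and cohesiveness terms are crude feature-mean proxies assumed for illustration, standing in for OSBORN's optimal-transport-based terms, so this shows the generic greedy marginal-gain recipe rather than the actual metric.

```python
# Greedy source-model selection under a simplified ensemble score.
import numpy as np

rng = np.random.default_rng(0)
n_sources, dim = 6, 16
source_feats = rng.normal(size=(n_sources, dim))  # per-source summary of target features
target_feat = rng.normal(size=dim)

def domain_gap(i: int) -> float:                   # smaller is better
    return float(np.linalg.norm(source_feats[i] - target_feat))

def cohesiveness_penalty(selected: list) -> float: # penalize pairwise disagreement
    if len(selected) < 2:
        return 0.0
    pairs = [(i, j) for i in selected for j in selected if i < j]
    return float(np.mean([np.linalg.norm(source_feats[i] - source_feats[j]) for i, j in pairs]))

def score(selected: list) -> float:                # higher = better estimated ensemble
    return -sum(domain_gap(i) for i in selected) - cohesiveness_penalty(selected)

# Greedy maximization: repeatedly add the source model with the largest marginal gain.
selected, budget = [], 3
for _ in range(budget):
    gains = {i: score(selected + [i]) - score(selected)
             for i in range(n_sources) if i not in selected}
    selected.append(max(gains, key=gains.get))

print("selected source models:", selected)
```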


Towards Estimating Transferability using Hard Subsets

arXiv.org Artificial Intelligence

As transfer learning techniques are increasingly used to transfer knowledge from a source model to a target task, it becomes important to quantify which source models are suitable for a given target task without performing computationally expensive fine-tuning. By leveraging the model's internal and output representations, we introduce two techniques, one class-agnostic and another class-specific, to identify harder subsets of the target data, and show that the resulting hard-subset-based strategy (HASTE) can be used with existing transferability metrics to improve their reliability.

Transfer learning (Pan & Yang, 2009; Torrey & Shavlik, 2010; Weiss et al., 2016) aims to improve the performance of models on target tasks by utilizing the knowledge from source tasks. With the increasing development of large-scale pre-trained models (Devlin et al., 2019; Chen et al., 2020a;b; Radford et al., 2021b) and the availability of multiple model choices for transfer learning (e.g., the model hubs of PyTorch, TensorFlow, and Hugging Face), it is critical to estimate their transferability without training on the target task and to determine how effectively transfer learning algorithms will transfer knowledge from the source to the target task. To this end, transferability estimation metrics (Zamir et al., 2018b; Achille et al., 2019; Tran et al., 2019b; Pándy et al., 2022; Nguyen et al., 2020) have recently been proposed to quantify how easily the knowledge learned by these models can be reused with minimal to no additional training on the target dataset. Given multiple pre-trained source models and target datasets, estimating transferability is essential because it is non-trivial to determine which source model transfers best to a target dataset, and training models on all source-target combinations can be computationally expensive. However, existing methods often require performing the transfer learning task itself for parameter optimization (Achille et al., 2019; Zamir et al., 2018b) or make strong assumptions about the source and target datasets (Tran et al., 2019b; Zamir et al., 2018b).
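
As a concrete illustration of the hard-subset idea, the sketch below selects the target samples on which a (random, stand-in) source model is least confident and evaluates an existing transferability metric, LEEP (Nguyen et al., 2020), on that subset versus the full set. The entropy-based hardness criterion is a simple class-agnostic proxy assumed for illustration, not necessarily the exact criterion used in the paper.

```python
# LEEP on the full target set versus a "hard" (low-confidence) subset.
import numpy as np

rng = np.random.default_rng(0)
n, n_src_classes, n_tgt_classes = 500, 10, 5

# Stand-ins: source-model softmax outputs on target data, and target labels.
logits = rng.normal(size=(n, n_src_classes))
theta = np.exp(logits)
theta /= theta.sum(1, keepdims=True)
y = rng.integers(n_tgt_classes, size=n)

def leep(theta: np.ndarray, y: np.ndarray, n_tgt_classes: int) -> float:
    """LEEP score (Nguyen et al., 2020) from source softmax outputs and target labels."""
    joint = np.zeros((n_tgt_classes, theta.shape[1]))
    for yi, ti in zip(y, theta):
        joint[yi] += ti
    joint /= len(y)                               # empirical P(y, z)
    cond = joint / joint.sum(0, keepdims=True)    # empirical P(y | z)
    return float(np.mean(np.log((theta * cond[y]).sum(1))))

# Hard subset: samples where the source model is least confident
# (highest predictive entropy) -- a simple class-agnostic hardness proxy.
entropy = -(theta * np.log(theta + 1e-12)).sum(1)
hard_idx = np.argsort(entropy)[-(len(y) // 3):]   # hardest third

print("LEEP on full target set :", round(leep(theta, y, n_tgt_classes), 4))
print("LEEP on hard subset only:", round(leep(theta[hard_idx], y[hard_idx], n_tgt_classes), 4))
```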