Collaborating Authors

 Vasudeva, Bhavya


Simplicity Bias of Transformers to Learn Low Sensitivity Functions

arXiv.org Machine Learning

Transformers achieve state-of-the-art accuracy and robustness across many tasks, but an understanding of their inductive biases, and of how those biases differ from other neural network architectures, remains elusive. Various neural network architectures, such as fully connected networks, have been found to have a simplicity bias towards simple functions of the data; one version of this simplicity bias is a spectral bias to learn simple functions in the Fourier space. In this work, we identify the sensitivity of the model to random changes in the input as a notion of simplicity bias that provides a unified metric to explain the simplicity and spectral bias of transformers across different data modalities. We show that transformers have lower sensitivity than alternative architectures, such as LSTMs, MLPs and CNNs, across both vision and language tasks. We also show that the low-sensitivity bias correlates with improved robustness; furthermore, it can be used as an efficient intervention to further improve the robustness of transformers.
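To make the sensitivity measure concrete, the following is a minimal sketch (not the paper's exact protocol) of how one might probe a sequence classifier's sensitivity to random token flips. The model interface, the perturbation scheme, and the averaging are all illustrative assumptions.

import torch

def token_flip_sensitivity(model, inputs, vocab_size, num_samples=32):
    """Average L1 change in the model's output distribution when one randomly
    chosen input token per example is replaced by a uniformly random token."""
    model.eval()
    batch, seq_len = inputs.shape
    with torch.no_grad():
        base = torch.softmax(model(inputs), dim=-1)  # unperturbed predictions
        total = 0.0
        for _ in range(num_samples):
            perturbed = inputs.clone()
            pos = torch.randint(0, seq_len, (batch,))        # one position per example
            new_tok = torch.randint(0, vocab_size, (batch,)) # random replacement token
            perturbed[torch.arange(batch), pos] = new_tok
            out = torch.softmax(model(perturbed), dim=-1)
            total += (out - base).abs().sum(dim=-1).mean().item()
    return total / num_samples

Under this kind of probe, a lower score indicates that the model's predictions change less under random input perturbations, i.e. it has learned a lower-sensitivity function.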


Implicit Bias and Fast Convergence Rates for Self-attention

arXiv.org Artificial Intelligence

Self-attention serves as the fundamental building block of transformers, distinguishing them from traditional neural networks (Vaswani et al., 2017) and driving their outstanding performance across various applications, including natural language processing and generation (Devlin et al., 2019; Brown et al., 2020; Raffel et al., 2020), as well as computer vision (Dosovitskiy et al., 2021; Radford et al., 2021; Touvron et al., 2021). With transformers established as the de-facto deep-learning architecture, driving advancements in applications integrated into society's daily life at an unprecedented pace (OpenAI, 2022), there has been a surge of recent interest in the mathematical study of the fundamental optimization and statistical principles of the self-attention mechanism; see Section 6 on related work for an overview.

In pursuit of this objective, Tarzanagh et al. (2023b,a) initiated an investigation into the implicit bias of gradient descent (GD) in training a self-attention layer with a fixed linear decoder in a binary classification task. Concretely, the implicit-bias paradigm seeks to characterize structural properties of the weights learned by GD when the training objective has multiple solutions. The prototypical instance of this paradigm is GD training of linear logistic regression on separable data: among the infinitely many solutions to logistic-loss minimization (each linear separator defines one), GD learns weights that converge in direction to the (unique) max-margin separator (Soudry et al., 2018; Ji and Telgarsky, 2018). Notably, convergence is global, holding irrespective of the initial weights' direction, and comes with explicit rates characterizing its speed in the number of iterations.

Drawing an analogy to this prototypical instance, when training self-attention with a linear decoder in a binary classification task, Tarzanagh et al. (2023a) define a hard-margin SVM problem (W-SVM) that separates, with maximal margin, optimal input tokens from non-optimal ones based on their respective softmax logits.
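The prototypical logistic-regression instance referenced above can be checked numerically. The sketch below is illustrative only (synthetic data, a hand-tuned step size, and a large-C linear SVM standing in for the hard-margin problem): after many GD steps on the unregularized logistic loss over separable data, the GD iterate and the max-margin separator should point in nearly the same direction.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.3, (50, 2)),
               rng.normal([-2, -2], 0.3, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])  # linearly separable labels

# Gradient descent on the (unregularized) logistic loss.
w = np.zeros(2)
for _ in range(200_000):
    margins = y * (X @ w)
    grad = -(X * (y / (1 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= 0.1 * grad

# Hard-margin SVM direction (a very large C approximates the hard-margin problem).
svm = SVC(kernel="linear", C=1e6).fit(X, y)
w_svm = svm.coef_.ravel()

print(w / np.linalg.norm(w))          # GD direction
print(w_svm / np.linalg.norm(w_svm))  # max-margin direction; should be nearly identical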


Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness

arXiv.org Machine Learning

Neural networks (NNs) are known to exhibit simplicity bias: they tend to prefer learning 'simple' features over more 'complex' ones, even when the latter may be more informative. Simplicity bias can lead the model to make biased predictions that generalize poorly out of distribution (OOD). To address this, we propose a framework that encourages the model to use a more diverse set of features when making predictions. We first train a simple model, and then regularize the conditional mutual information with respect to it to obtain the final model. We demonstrate the effectiveness of this framework in various problem settings and real-world applications, showing that it effectively addresses simplicity bias, leads to more features being used, enhances OOD generalization, and improves subgroup robustness and fairness. We complement these results with theoretical analyses of the effect of the regularization and of its OOD generalization properties.
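The two-stage recipe can be sketched as follows. This is an illustrative PyTorch sketch only: the function and variable names (cmi_proxy, train_final, lambda_reg) are hypothetical, and the penalty here is a crude per-class correlation proxy for the conditional dependence between the two models' predictions given the label; the paper's actual conditional-mutual-information estimator may differ.

import torch
import torch.nn.functional as F

def cmi_proxy(final_logits, simple_logits, labels, num_classes):
    """Crude stand-in for I(final; simple | Y): within each label class,
    penalize squared correlation between the two models' predicted
    probabilities for that class."""
    p_final = F.softmax(final_logits, dim=1)
    p_simple = F.softmax(simple_logits, dim=1)
    penalty = final_logits.new_zeros(())
    for c in range(num_classes):
        mask = labels == c
        if mask.sum() < 2:
            continue
        a = p_final[mask, c] - p_final[mask, c].mean()
        b = p_simple[mask, c] - p_simple[mask, c].mean()
        corr = (a * b).mean() / (a.std() * b.std() + 1e-8)
        penalty = penalty + corr ** 2
    return penalty / num_classes

def train_final(final_model, simple_model, loader, num_classes, lambda_reg=1.0, epochs=5):
    # Stage 1 (training simple_model) is assumed done; Stage 2 fits the final
    # model with cross-entropy plus the conditional-dependence penalty.
    opt = torch.optim.Adam(final_model.parameters(), lr=1e-3)
    simple_model.eval()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                simple_logits = simple_model(x)
            final_logits = final_model(x)
            loss = F.cross_entropy(final_logits, y) \
                   + lambda_reg * cmi_proxy(final_logits, simple_logits, y, num_classes)
            opt.zero_grad()
            loss.backward()
            opt.step()

The intent of the penalty, as described above, is to push the final model to rely on information beyond what the simple model already captures, so that a more diverse set of features contributes to its predictions.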