AITopics | Ghoshdastidar, Debarghya

Collaborating Authors

Ghoshdastidar, Debarghya

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Generalization Certificates for Adversarially Robust Bayesian Linear Regression

Sabanayagam, Mahalakshmi, Tsuchida, Russell, Ong, Cheng Soon, Ghoshdastidar, Debarghya

arXiv.org Machine LearningFeb-20-2025

Adversarial robustness of machine learning models is critical to ensuring reliable performance under data perturbations. Recent progress has been on point estimators, and this paper considers distributional predictors. First, using the link between exponential families and Bregman divergences, we formulate an adversarial Bregman divergence loss as an adversarial negative log-likelihood. Using the geometric properties of Bregman divergences, we compute the adversarial perturbation for such models in closed-form. Second, under such losses, we introduce \emph{adversarially robust posteriors}, by exploiting the optimization-centric view of generalized Bayesian inference. Third, we derive the \emph{first} rigorous generalization certificates in the context of an adversarial extension of Bayesian linear regression by leveraging the PAC-Bayesian framework. Finally, experiments on real and synthetic datasets demonstrate the superior robustness of the derived adversarially robust posterior over Bayes posterior, and also validate our theoretical guarantees.

artificial intelligence, machine learning, posterior, (17 more...)

arXiv.org Machine Learning

2502.14298

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Attention Learning is Needed to Efficiently Learn Parity Function

Han, Yaomengxi, Ghoshdastidar, Debarghya

arXiv.org Artificial IntelligenceFeb-11-2025

Transformers, with their attention mechanisms, have emerged as the state-of-the-art architectures of sequential modeling and empirically outperform feed-forward neural networks (FFNNs) across many fields, such as natural language processing and computer vision. However, their generalization ability, particularly for low-sensitivity functions, remains less studied. We bridge this gap by analyzing transformers on the $k$-parity problem. Daniely and Malach (NeurIPS 2020) show that FFNNs with one hidden layer and $O(nk^7 \log k)$ parameters can learn $k$-parity, where the input length $n$ is typically much larger than $k$. In this paper, we prove that FFNNs require at least $\Omega(n)$ parameters to learn $k$-parity, while transformers require only $O(k)$ parameters, surpassing the theoretical lower bound needed by FFNNs. We further prove that this parameter efficiency cannot be achieved with fixed attention heads. Our work establishes transformers as theoretically superior to FFNNs in learning parity function, showing how their attention mechanisms enable parameter-efficient generalization in functions with low sensitivity.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2502.07553

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Recovering Imbalanced Clusters via Gradient-Based Projection Pursuit

Eppert, Martin, Mukherjee, Satyaki, Ghoshdastidar, Debarghya

arXiv.org Artificial IntelligenceFeb-4-2025

Projection Pursuit is a classic exploratory technique for finding interesting projections of a dataset. We propose a method for recovering projections containing either Imbalanced Clusters or a Bernoulli-Rademacher distribution using a gradient-based technique to optimize the projection index. As sample complexity is a major limiting factor in Projection Pursuit, we analyze our algorithm's sample complexity within a Planted Vector setting where we can observe that Imbalanced Clusters can be recovered more easily than balanced ones. Additionally, we give a generalized result that works for a variety of data distributions and projection indices. We compare these results to computational lower bounds in the Low-Degree-Polynomial Framework. Finally, we experimentally evaluate our method's applicability to real-world data using FashionMNIST and the Human Activity Recognition Dataset, where our algorithm outperforms others when only a few samples are available.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.02668

Country:

North America > United States (0.28)
Europe > Germany (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)

Add feedback

A Probabilistic Model for Self-Supervised Learning

Fleissner, Maximilian, Esser, Pascal, Ghoshdastidar, Debarghya

arXiv.org Artificial IntelligenceJan-22-2025

Self-supervised learning (SSL) aims to find meaningful representations from unlabeled data by encoding semantic similarities through data augmentations. Despite its current popularity, theoretical insights about SSL are still scarce. For example, it is not yet known whether commonly used SSL loss functions can be related to a statistical model, much in the same as OLS, generalized linear models or PCA naturally emerge as maximum likelihood estimates of an underlying generative process. In this short paper, we consider a latent variable statistical model for SSL that exhibits an interesting property: Depending on the informativeness of the data augmentations, the MLE of the model either reduces to PCA, or approaches a simple non-contrastive loss. We analyze the model and also empirically illustrate our findings.

artificial intelligence, inductive learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.13031

Country: Europe > Germany (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Add feedback

Data Augmentations Go Beyond Encoding Invariances: A Theoretical Study on Self-Supervised Learning

Feigin, Shlomo Libo, Fleissner, Maximilian, Ghoshdastidar, Debarghya

arXiv.org Artificial IntelligenceDec-15-2024

Understanding the role of data augmentations is critical for applying Self-Supervised Learning (SSL) methods in new domains. Data augmentations are commonly understood as encoding invariances into the learned representations. This interpretation suggests that SSL would require diverse augmentations that resemble the original data. However, in practice, augmentations do not need to be similar to the original data nor be diverse, and can be neither at the same time. We provide a theoretical insight into this phenomenon. We show that for different SSL losses, any non-redundant representation can be learned with a single suitable augmentation. We provide an algorithm to reconstruct such augmentations and give insights into augmentation choices in SSL.

artificial intelligence, inductive learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2411.01767

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.61)

Add feedback

Tight PAC-Bayesian Risk Certificates for Contrastive Learning

Van Elst, Anna, Ghoshdastidar, Debarghya

arXiv.org Machine LearningDec-5-2024

A key driving force behind the rapid advances in foundation models is the availability and exploitation of massive amounts of unlabeled data. Broadly, one learns meaningful representations from unlabeled data, reducing the demand for labeled samples when training (downstream) predictive models. In recent years, there has been a strong focus on self-supervised approaches to representation learning, which learn neural network-based embedding maps from carefully constructed augmentations of unlabeled data, such as image cropping, rotations, color distortion, Gaussian blur, etc. [3, 7, 9, 15]. Contrastive representation learning is a popular form of self-supervised learning where one aims to learn a mapping of the data to a Euclidean space such that semantically similar data, obtained via augmentations, are embedded closer than independent samples [45, 19, 24]. This technique gained widespread attention with the introduction of SimCLR, an abbreviation for simple framework for contrastive learning of representations [7]. The SimCLR framework employs a carefully designed contrastive loss to maximize the similarity between the representations of augmented views of the same sample while minimizing the similarity between the representations from different samples [39, 7]. Although SimCLR remains one of the most practically used contrastive models, theoretical analysis of SimCLR's performance and generalization abilities is still limited [4, 32]. The study of generalization error in self-supervised models is mostly based on two distinct frameworks [2, 17], both introduced in the context of contrastive learning. The contrastive unsupervised representation learning (CURL) framework, introduced by Arora et al. [2], assumes access to tuples z

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

2412.03486

Country:

North America > United States > Hawaii (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Exact Certification of (Graph) Neural Networks Against Label Poisoning

Sabanayagam, Mahalakshmi, Gosch, Lukas, Günnemann, Stephan, Ghoshdastidar, Debarghya

arXiv.org Artificial IntelligenceNov-30-2024

Machine learning models are highly vulnerable to label flipping, i.e., the adversarial modification (poisoning) of training labels to compromise performance. Thus, deriving robustness certificates is important to guarantee that test predictions remain unaffected and to understand worst-case robustness behavior. However, for Graph Neural Networks (GNNs), the problem of certifying label flipping has so far been unsolved. We change this by introducing an exact certification method, deriving both sample-wise and collective certificates. Our method leverages the Neural Tangent Kernel (NTK) to capture the training dynamics of wide networks enabling us to reformulate the bilevel optimization problem representing label flipping into a Mixed-Integer Linear Program (MILP). We apply our method to certify a broad range of GNN architectures in node classification tasks. Thereby, concerning the worst-case robustness to label flipping: $(i)$ we establish hierarchies of GNNs on different benchmark graphs; $(ii)$ quantify the effect of architectural choices such as activations, depth and skip-connections; and surprisingly, $(iii)$ uncover a novel phenomenon of the robustness plateauing for intermediate perturbation budgets across all investigated datasets and architectures. While we focus on GNNs, our certificates are applicable to sufficiently wide NNs in general through their NTK. Thus, our work presents the first exact certificate to a poisoning attack ever derived for neural networks, which could be of independent interest.

artificial intelligence, certificate, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2412.00537

Country:

Europe > Germany (0.28)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Government (0.67)
Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Infinite Width Limits of Self Supervised Neural Networks

Fleissner, Maximilian, Anil, Gautham Govind, Ghoshdastidar, Debarghya

arXiv.org Artificial IntelligenceNov-17-2024

The NTK is a widely used tool in the theoretical analysis of deep learning, allowing us to look at supervised deep neural networks through the lenses of kernel regression. Recently, several works have investigated kernel models for self-supervised learning, hypothesizing that these also shed light on the behaviour of wide neural networks by virtue of the NTK. However, it remains an open question to what extent this connection is mathematically sound -- it is a commonly encountered misbelief that the kernel behaviour of wide neural networks emerges irrespective of the loss function it is trained on. In this paper, we bridge the gap between the NTK and self-supervised learning, focusing on two-layer neural networks trained under the Barlow Twins loss. We prove that the NTK of Barlow Twins indeed becomes constant as the width of the network approaches infinity. Our analysis technique is different from previous works on the NTK and may be of independent interest. Overall, our work provides a first rigorous justification for the use of classic kernel theory to understand self-supervised learning of wide neural networks. Building on this result, we derive generalization error bounds for kernelized Barlow Twins and connect them to neural networks of finite width.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

2411.11176

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

Decision Trees for Interpretable Clusters in Mixture Models and Deep Representations

Fleissner, Maximilian, Zarvandi, Maedeh, Ghoshdastidar, Debarghya

arXiv.org Artificial IntelligenceNov-3-2024

Decision Trees are one of the backbones of explainable machine learning, and often serve as interpretable alternatives to black-box models. Traditionally utilized in the supervised setting, there has recently also been a surge of interest in decision trees for unsupervised learning. While several works with worst-case guarantees on the clustering cost have appeared, these results are distribution-agnostic, and do not give insight into when decision trees can actually recover the underlying distribution of the data (up to some small error). In this paper, we therefore introduce the notion of an explainability-to-noise ratio for mixture models, formalizing the intuition that well-clustered data can indeed be explained well using a decision tree. We propose an algorithm that takes as input a mixture model and constructs a suitable tree in data-independent time. Assuming sub-Gaussianity of the mixture components, we prove upper and lower bounds on the error rate of the resulting decision tree. In addition, we demonstrate how concept activation vectors can be used to extend explainable clustering to neural networks. We empirically demonstrate the efficacy of our approach on standard tabular and image datasets.

artificial intelligence, decision tree learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2411.01576

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks

Gosch, Lukas, Sabanayagam, Mahalakshmi, Ghoshdastidar, Debarghya, Günnemann, Stephan

arXiv.org Artificial IntelligenceJul-15-2024

Generalization of machine learning models can be severely compromised by data poisoning, where adversarial changes are applied to the training data, as well as backdoor attacks that additionally manipulate the test data. These vulnerabilities have led to interest in certifying (i.e., proving) that such changes up to a certain magnitude do not affect test predictions. We, for the first time, certify Graph Neural Networks (GNNs) against poisoning and backdoor attacks targeting the node features of a given graph. Our certificates are white-box and based upon $(i)$ the neural tangent kernel, which characterizes the training dynamics of sufficiently wide networks; and $(ii)$ a novel reformulation of the bilevel optimization problem describing poisoning as a mixed-integer linear program. Consequently, we leverage our framework to provide fundamental insights into the role of graph structure and its connectivity on the worst-case robustness behavior of convolution-based and PageRank-based GNNs. We note that our framework is more general and constitutes the first approach to derive white-box poisoning certificates for NNs, which can be of independent interest beyond graph-related tasks.

artificial intelligence, machine learning, robustness, (15 more...)

arXiv.org Artificial Intelligence

2407.10867

Country: Europe > Germany (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Add feedback