Sulam, Jeremias
Conformal Risk Control for Semantic Uncertainty Quantification in Computed Tomography
Teneggi, Jacopo, Stayman, J Webster, Sulam, Jeremias
Uncertainty quantification is necessary for developers, physicians, and regulatory agencies to build trust in machine learning predictors and improve patient care. Beyond measuring uncertainty, it is crucial to express it in clinically meaningful terms that provide actionable insights. This work introduces a conformal risk control (CRC) procedure for organ-dependent uncertainty estimation, ensuring high-probability coverage of the ground-truth image. We first present a high-dimensional CRC procedure that leverages recent ideas of length minimization. We then make this procedure semantically adaptive to each patient's anatomy and organ positioning. Our method, sem-CRC, provides tighter uncertainty intervals with valid coverage on real-world computed tomography (CT) data while communicating uncertainty in terms of clinically relevant features.
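As a rough illustration of the calibration step behind conformal risk control (not the sem-CRC procedure itself), the sketch below picks the smallest interval scale whose calibration-set miscoverage satisfies the CRC condition; the reconstruction outputs and the heuristic per-pixel uncertainty arrays are assumed inputs.

```python
# A minimal sketch of conformal risk control (CRC) calibration for pixel-wise
# intervals [pred - lam * sigma, pred + lam * sigma]; NOT the paper's sem-CRC.
# `preds`, `sigmas`, `targets` are hypothetical calibration arrays of shape (n, H, W).
import numpy as np

def calibrate_lambda(preds, sigmas, targets, alpha=0.1,
                     lambdas=np.linspace(0.0, 10.0, 201)):
    """Return the smallest interval scale satisfying the CRC condition at level alpha."""
    n = preds.shape[0]
    for lam in lambdas:  # lambdas must be sorted in increasing order
        inside = np.abs(targets - preds) <= lam * sigmas
        risk = 1.0 - inside.mean(axis=(1, 2))          # per-image pixel miscoverage in [0, 1]
        # CRC condition for a monotone loss bounded by B = 1:
        # (n * mean_risk + B) / (n + 1) <= alpha
        if (n * risk.mean() + 1.0) / (n + 1) <= alpha:
            return float(lam)
    return float(lambdas[-1])

# Usage: lam = calibrate_lambda(preds_cal, sigmas_cal, gt_cal, alpha=0.1); test-time
# intervals [pred - lam * sigma, pred + lam * sigma] then control the expected
# pixel miscoverage rate at level alpha.
```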
Disentangling Safe and Unsafe Corruptions via Anisotropy and Locality
Muthukumar, Ramchandran, Pal, Ambar, Sulam, Jeremias, Vidal, Rene
State-of-the-art machine learning systems are vulnerable to small perturbations to their input, where ``small'' is defined according to a threat model that assigns a positive threat to each perturbation. Most prior works define a task-agnostic, isotropic, and global threat, like the $\ell_p$ norm, where the magnitude of the perturbation fully determines the degree of the threat and neither the direction of the attack nor its position in space matters. However, common corruptions in computer vision, such as blur, compression, or occlusions, are not well captured by such threat models. This paper proposes a novel threat model called \texttt{Projected Displacement} (PD) to study robustness beyond existing isotropic and global threat models. The proposed threat model measures the threat of a perturbation via its alignment with \textit{unsafe directions}, defined as directions in the input space along which a perturbation of sufficient magnitude changes the ground-truth class label. Unsafe directions are identified locally for each input based on the observed training data. In this way, the PD threat model exhibits anisotropy and locality. Experiments on ImageNet-1k data indicate that, for any input, the set of perturbations with small PD threat includes \textit{safe} perturbations of large $\ell_p$ norm that preserve the true label, such as noise, blur, and compression, while simultaneously excluding \textit{unsafe} perturbations that alter the true label. Unlike perceptual threat models based on embeddings of large vision models, the PD threat model can be readily computed for arbitrary classification tasks without pre-training or fine-tuning. Furthermore, additional task annotations, such as sensitivity to image regions or concept hierarchies, can be easily integrated into the assessment of threat; thus, the PD threat model presents practitioners with a flexible, task-driven threat specification.
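The sketch below gives one plausible, hedged reading of an anisotropic and local threat measure in this spirit: the threat of a perturbation is its largest projection onto directions pointing from the input toward nearby training samples of a different class. The neighborhood size `k` and this particular direction construction are illustrative assumptions, not the paper's exact definition.

```python
# A hedged sketch of an anisotropic, local threat measure in the spirit of the
# Projected Displacement model: the threat of a perturbation delta at input x is
# its largest nonnegative projection onto unit "unsafe directions" pointing from
# x toward nearby training samples of a different class.
import numpy as np

def unsafe_directions(x, X_train, y_train, label, k=50):
    """Unit vectors from x toward the k nearest training points with a different label."""
    diffs = X_train[y_train != label] - x
    nearest = diffs[np.argsort(np.linalg.norm(diffs, axis=1))[:k]]
    return nearest / (np.linalg.norm(nearest, axis=1, keepdims=True) + 1e-12)

def pd_threat(delta, directions):
    """Largest nonnegative alignment of the perturbation with any unsafe direction."""
    return float(np.maximum(directions @ delta, 0.0).max())

# A blur or compression residual that is nearly orthogonal to all unsafe
# directions receives a small threat even when its l2 norm is large.
```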
Sufficient and Necessary Explanations (and What Lies in Between)
Bharti, Beepul, Yi, Paul, Sulam, Jeremias
As complex machine learning models continue to find applications in high-stakes decision-making scenarios, it is crucial that we can explain and understand their predictions. Post-hoc explanation methods provide useful insights by identifying important features in an input $\mathbf{x}$ with respect to the model output $f(\mathbf{x})$. In this work, we formalize and study two precise notions of feature importance for general machine learning models: sufficiency and necessity. We demonstrate how these two types of explanations, albeit intuitive and simple, can fall short in providing a complete picture of which features a model finds important. To this end, we propose a unified notion of importance that circumvents these limitations by exploring a continuum along a necessity-sufficiency axis. Our unified notion, we show, has strong ties to other popular definitions of feature importance, like those based on conditional independence and game-theoretic quantities like Shapley values. Crucially, we demonstrate how a unified perspective allows us to detect important features that could be missed by either of the previous approaches alone.
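A hedged Monte Carlo sketch of sufficiency- and necessity-style quantities for a feature subset `S` follows; it uses independent resampling from reference data as a crude stand-in for the conditional distributions required by the formal definitions, and assumes `model` maps a batch of inputs to class scores.

```python
# Toy Monte Carlo estimates of probability-of-sufficiency / probability-of-necessity
# style quantities for a feature subset S. `model` is assumed to map a batch of
# inputs (n, d) to class scores (n, n_classes); X_ref is reference data used as a
# crude substitute for conditional resampling. Illustrative only.
import numpy as np

def prob_sufficient(model, x, S, X_ref, n_samples=500, rng=None):
    """Estimate P(prediction is unchanged when only the features in S are fixed to x's values)."""
    rng = rng or np.random.default_rng(0)
    y0 = model(x[None])[0].argmax()
    samples = X_ref[rng.integers(0, len(X_ref), n_samples)].copy()
    samples[:, S] = x[S]                                # clamp S, resample everything else
    return float((model(samples).argmax(1) == y0).mean())

def prob_necessary(model, x, S, X_ref, n_samples=500, rng=None):
    """Estimate P(prediction changes when the features in S are resampled and the rest are kept)."""
    rng = rng or np.random.default_rng(0)
    y0 = model(x[None])[0].argmax()
    samples = np.repeat(x[None], n_samples, axis=0)
    samples[:, S] = X_ref[rng.integers(0, len(X_ref), n_samples)][:, S]   # perturb only S
    return float((model(samples).argmax(1) != y0).mean())
```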
Pivotal Auto-Encoder via Self-Normalizing ReLU
Goldenstein, Nelson, Sulam, Jeremias, Romano, Yaniv
Sparse auto-encoders are useful for extracting low-dimensional representations from high-dimensional data. However, their performance degrades sharply when the input noise at test time differs from the noise employed during training. This limitation hinders the applicability of auto-encoders in real-world scenarios where the level of noise in the input is unpredictable. In this paper, we formalize single hidden layer sparse auto-encoders as a transform learning problem. Leveraging the transform modeling interpretation, we propose an optimization problem that leads to a predictive model invariant to the noise level at test time. In other words, the same pre-trained model is able to generalize to different noise levels. The proposed optimization algorithm, derived from the square root lasso, is translated into a new, computationally efficient auto-encoding architecture. After proving that our new method is invariant to the noise level, we evaluate our approach by training networks using the proposed architecture for denoising tasks. Our experimental results demonstrate that the trained models yield a significant improvement in stability against varying types of noise compared to commonly used architectures.
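The square root lasso connection can be illustrated with a toy encoding step in which the effective soft-threshold adapts to the residual norm, so a single regularization parameter behaves consistently across noise levels. This is an illustrative proximal-gradient sketch, not the paper's self-normalizing ReLU architecture.

```python
# An illustrative proximal-gradient sketch for the square root lasso,
#   min_z ||y - D z||_2 + lam * ||z||_1,
# whose data term makes the effective soft-threshold scale with the residual norm.
# The step size and iteration count are ad hoc; the data term is not globally
# smooth, so this is a sketch rather than a rigorous solver.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sqrt_lasso_encode(y, D, lam=0.1, n_iter=100, step=None):
    step = step if step is not None else 1.0 / np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        r = y - D @ z
        res_norm = np.linalg.norm(r) + 1e-12            # gradient of ||r||_2 is -D.T r / ||r||_2
        z = soft_threshold(z + step * (D.T @ r) / res_norm, step * lam)
    return z
```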
I Bet You Did Not Mean That: Testing Semantic Importance via Betting
Teneggi, Jacopo, Sulam, Jeremias
Recent works have extended notions of feature importance to \emph{semantic concepts} that are inherently interpretable to the users interacting with a black-box predictive model. Yet, precise statistical guarantees, such as false positive rate control, are needed to communicate findings transparently and to avoid unintended consequences in real-world scenarios. In this paper, we formalize the global (i.e., over a population) and local (i.e., for a sample) statistical importance of semantic concepts for the predictions of opaque models, by means of conditional independence, which allows for rigorous testing. We use recent ideas of sequential kernelized testing (SKIT) to induce a rank of importance across concepts, and showcase the effectiveness and flexibility of our framework on synthetic datasets as well as on image classification tasks using vision-language models such as CLIP.
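A minimal testing-by-betting sketch conveys the mechanism: a wealth process grown from bounded payoffs with zero conditional mean under the null is a nonnegative martingale, so rejecting once it exceeds $1/\alpha$ gives a level-$\alpha$ sequential test by Ville's inequality. The payoff stream (e.g., built from kernel statistics comparing predictions with and without a concept) is assumed given.

```python
# A minimal testing-by-betting sketch: wealth grows multiplicatively from bounded
# payoffs with zero conditional mean under the null, forming a nonnegative
# martingale; rejecting when wealth exceeds 1/alpha is a valid level-alpha
# sequential test by Ville's inequality. The payoff stream is assumed given.
import numpy as np

def sequential_betting_test(payoffs, alpha=0.05, bet_fraction=0.2):
    """Return (reject, stopping_step) for payoffs in [-1, 1] with E[payoff | past] = 0 under H0."""
    assert 0.0 <= bet_fraction < 1.0                   # keeps wealth nonnegative
    wealth = 1.0
    for t, g in enumerate(payoffs, start=1):
        wealth *= 1.0 + bet_fraction * float(g)        # bet a fixed fraction of wealth
        if wealth >= 1.0 / alpha:
            return True, t                             # reject H0: concept is important
    return False, len(payoffs)
```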
Certified Robustness against Sparse Adversarial Perturbations via Data Localization
Pal, Ambar, Vidal, René, Sulam, Jeremias
Recent work in adversarial robustness suggests that natural data distributions are localized, i.e., they place high probability in small volume regions of the input space, and that this property can be utilized for designing classifiers with improved robustness guarantees for $\ell_2$-bounded perturbations. Yet, it is still unclear if this observation holds true for more general metrics. In this work, we extend this theory to $\ell_0$-bounded adversarial perturbations, where the attacker can modify a few pixels of the image but is unrestricted in the magnitude of perturbation, and we show necessary and sufficient conditions for the existence of $\ell_0$-robust classifiers. Theoretical certification approaches in this regime essentially employ voting over a large ensemble of classifiers. Such procedures are combinatorial and expensive or require complicated certification techniques. In contrast, a simple classifier emerges from our theory, dubbed Box-NN, which naturally incorporates the geometry of the problem and improves upon the current state-of-the-art in certified robustness against sparse attacks for the MNIST and Fashion-MNIST datasets.
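As a toy illustration only (and not the Box-NN construction), the sketch below shows how coordinate-wise box scores yield a trivial $\ell_0$ certificate: changing one pixel moves each class score by at most one, so the prediction cannot flip unless the attacker changes more than half of the score gap.

```python
# A toy coordinate-wise "box score" classifier with a trivial l0 certificate;
# it is NOT the Box-NN construction from the paper. Per-class boxes are taken
# as coordinate-wise (q, 1 - q) quantile intervals of the training data, a
# hypothetical choice made only for illustration.
import numpy as np

def fit_boxes(X_train, y_train, q=0.05):
    """Per-class, per-coordinate intervals [lo_c, hi_c] from training quantiles."""
    classes = np.unique(y_train)
    lo = np.stack([np.quantile(X_train[y_train == c], q, axis=0) for c in classes])
    hi = np.stack([np.quantile(X_train[y_train == c], 1.0 - q, axis=0) for c in classes])
    return classes, lo, hi

def predict_and_certify(x, classes, lo, hi):
    """Predict the class whose box contains the most coordinates of x. Changing a
    single coordinate moves each class score by at most 1, so the prediction is
    stable against any l0 perturbation of at most (gap - 1) // 2 coordinates."""
    scores = ((x >= lo) & (x <= hi)).sum(axis=1)
    order = np.argsort(scores)
    best, runner_up = order[-1], order[-2]
    radius = max(int(scores[best] - scores[runner_up] - 1) // 2, 0)
    return classes[best], radius
```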
SHAP-XRT: The Shapley Value Meets Conditional Independence Testing
Teneggi, Jacopo, Bharti, Beepul, Romano, Yaniv, Sulam, Jeremias
The complex nature of artificial neural networks raises concerns about their reliability, trustworthiness, and fairness in real-world scenarios. The Shapley value -- a solution concept from game theory -- is one of the most popular explanation methods for machine learning models. More traditionally, from a statistical perspective, feature importance is defined in terms of conditional independence. So far, these two approaches to interpretability and feature importance have been considered separate and distinct. In this work, we show that Shapley-based explanation methods and conditional independence testing are closely related. We introduce the SHAPley EXplanation Randomization Test (SHAP-XRT), a testing procedure inspired by the Conditional Randomization Test (CRT) for a specific notion of local (i.e., on a sample) conditional independence. With it, we prove that, for binary classification problems, the marginal contributions in the Shapley value provide lower and upper bounds on the expected $p$-values of their respective tests. Furthermore, we show that the Shapley value itself provides an upper bound on the expected $p$-value of a global (i.e., overall) null hypothesis. As a result, we further our understanding of Shapley-based explanation methods from a novel perspective and characterize the conditions under which one can make statistically valid claims about feature importance via the Shapley value.
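A hedged sketch of a CRT-style local test for a single feature follows: under the local null, resampling that feature (here, crudely from its marginal in reference data rather than the required conditional) should not systematically change the model output. It assumes `model` maps a batch of inputs to a one-dimensional array of scores; it is illustrative, not the exact SHAP-XRT procedure.

```python
# A hedged sketch of a CRT-style local test for feature j. `model` is assumed to
# map a batch (n, d) to a 1-D array of n scores (e.g., class-1 probabilities),
# and X_ref provides marginal resampling as a crude stand-in for the conditional
# distribution the CRT requires. Illustrative, not the exact SHAP-XRT.
import numpy as np

def local_crt_pvalue(model, x, j, X_ref, K=500, rng=None):
    """One-sided Monte Carlo p-value: is the observed output unusually large
    compared to outputs where feature j has been resampled?"""
    rng = rng or np.random.default_rng(0)
    t_obs = float(model(x[None])[0])
    samples = np.repeat(x[None], K, axis=0)
    samples[:, j] = X_ref[rng.integers(0, len(X_ref), K), j]
    t_null = np.asarray(model(samples)).ravel()
    return float((1.0 + np.sum(t_null >= t_obs)) / (K + 1.0))
```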
What's in a Prior? Learned Proximal Networks for Inverse Problems
Fang, Zhenghan, Buchanan, Sam, Sulam, Jeremias
Proximal operators are ubiquitous in inverse problems, commonly appearing as part of algorithmic strategies to regularize problems that are otherwise ill-posed. Modern deep learning models have been brought to bear for these tasks too, as in the framework of plug-and-play or deep unrolling, where they loosely resemble proximal operators. Yet, something essential is lost in employing these purely data-driven approaches: there is no guarantee that a general deep network represents the proximal operator of any function, nor is there any characterization of the function for which the network might provide an approximate proximal operator. This not only makes guaranteeing convergence of iterative schemes challenging but, more fundamentally, complicates the analysis of what has been learned by these networks about their training data. Herein we provide a framework to develop learned proximal networks (LPN), prove that they provide exact proximal operators for a data-driven nonconvex regularizer, and show how a new training strategy, dubbed proximal matching, provably promotes the recovery of the log-prior of the true data distribution. Such LPNs provide general, unsupervised, expressive proximal operators that can be used for general inverse problems with convergence guarantees. We illustrate our results in a series of cases of increasing complexity, demonstrating that these models not only result in state-of-the-art performance, but also provide a window into the priors learned from data.
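To illustrate how an exact learned proximal operator slots into a convergent iterative scheme, the sketch below runs plain proximal gradient descent for a linear inverse problem, accessing the regularizer only through a callable `prox` (e.g., a trained LPN); the interface and step-size handling are assumptions of the sketch.

```python
# A minimal sketch of proximal gradient descent for a linear inverse problem
# y = A x + noise, accessing the regularizer only through a callable `prox`
# (e.g., a trained learned proximal network). Strictly, `prox` should be the
# proximal operator of step * R; that scaling is folded into the operator here.
import numpy as np

def proximal_gradient(y, A, prox, step=None, n_iter=100):
    """Approximately minimize 0.5 * ||A x - y||^2 + R(x)."""
    step = step if step is not None else 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / L
    x = A.T @ y                                      # simple initialization
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                     # gradient of the data-fidelity term
        x = prox(x - step * grad)                    # learned proximal step
    return x

# Usage (hypothetical): x_hat = proximal_gradient(y, A, prox=lpn_model)
```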
Adversarial Examples Might be Avoidable: The Role of Data Concentration in Adversarial Robustness
Pal, Ambar, Sulam, Jeremias, Vidal, René
The susceptibility of modern machine learning classifiers to adversarial examples has motivated theoretical results suggesting that these might be unavoidable. However, these results can be too general to be applicable to natural data distributions. Indeed, humans are quite robust for tasks involving vision. This apparent conflict motivates a deeper dive into the question: Are adversarial examples truly unavoidable? In this work, we theoretically demonstrate that a key property of the data distribution -- concentration on small-volume subsets of the input space -- determines whether a robust classifier exists. We further demonstrate that, for a data distribution concentrated on a union of low-dimensional linear subspaces, exploiting data structure naturally leads to classifiers that enjoy good robustness guarantees, improving upon methods for provable certification in certain regimes.
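As a toy illustration of how concentration on a union of low-dimensional linear subspaces can yield robustness certificates (not the paper's construction), the sketch below classifies by nearest subspace and certifies an $\ell_2$ radius of half the gap between the two smallest distances, using the fact that the distance to a set is 1-Lipschitz.

```python
# A toy nearest-subspace classifier with a simple certified l2 radius. `bases`
# is a list of orthonormal basis matrices U_c of shape (d, k_c), one per class;
# this is an illustration under the union-of-subspaces assumption, not the
# paper's certification method.
import numpy as np

def subspace_distances(x, bases):
    """Distance from x to each linear subspace spanned by the columns of U_c."""
    return np.array([np.linalg.norm(x - U @ (U.T @ x)) for U in bases])

def predict_with_certificate(x, bases):
    """Predict the nearest subspace; since distances to sets are 1-Lipschitz,
    no l2 perturbation smaller than half the gap between the two smallest
    distances can change the prediction."""
    d = subspace_distances(x, bases)
    order = np.argsort(d)
    label, runner_up = int(order[0]), int(order[1])
    radius = float((d[runner_up] - d[label]) / 2.0)
    return label, radius
```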
Sparsity-aware generalization theory for deep neural networks
Muthukumar, Ramchandran, Sulam, Jeremias
Statistical learning theory seeks to characterize the generalization ability of machine learning models, obtained from finite training data, to unseen test data. The field is by now relatively mature, and several tools exist to provide upper bounds on the generalization error, $R(h)$. Often the upper bounds depend on the empirical risk, $\hat{R}(h)$, and different characterizations of complexity of the hypothesis class as well as potentially specific data-dependent properties. The renewed interest in deep artificial neural network models has demonstrated important limitations of existing tools. For example, VC dimension often simply relates to the number of model parameters and is hence insufficient to explain generalization of overparameterized models (Bartlett et al., 2019). Traditional measures based on Rademacher complexity are also often vacuous, as these networks can indeed be trained to fit random noise (Zhang et al., 2017). Margin bounds have been adapted to deep non-linear networks (Bartlett et al., 2017; Golowich et al., 2018; Neyshabur et al., 2015, 2018), albeit still unable to provide practically informative results. An increasing number of studies advocate for non-uniform data-dependent measures to explain generalization in deep learning (Nagarajan and Kolter, 2019a; Pérez and Louis, 2020; Wei and Ma, 2019).