
Collaborating Authors

 Piantanida, Pablo


Statistical Deficiency for Task Inclusion Estimation

arXiv.org Artificial Intelligence

[...] for which annotated datasets exist, and it is commonly accepted that the summarization task, at least in the news domain, requires NER skills to be performed effectively. As a consequence, studying generated summaries from the perspective of retained named entities is a relevant evaluation angle (Pagnoni et al., 2021; Berezin and Batura, 2022; Akani et al., 2023). According to this principle, a more general hypothesis is that multi-task [...] While we theoretically show the shortcomings of naively measuring cross-task performance by directly applying each model to each other task, the contributions of the paper are threefold. First, a theoretical framework for task definition and inclusion: based on information theory concepts, we propose a clear definition of a task and candidate notions of inclusion (independent of the notion of model).
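Since the excerpt invokes information-theoretic notions of task inclusion, a toy illustration may help. The sketch below is not the paper's deficiency-based definition; it shows only one naive candidate notion, checking whether one task's labels are nearly a deterministic function of another's via the empirical conditional entropy of co-annotated labels, and every name in it is hypothetical.

import numpy as np
from collections import Counter

def conditional_entropy(y_task1, y_task2):
    """H(Y1 | Y2) in bits, estimated from co-annotated labels.
    A value near zero suggests task 1's label is (almost) determined by task 2's,
    one crude signal that task 1 might be 'included' in task 2."""
    n = len(y_task1)
    joint = Counter(zip(y_task1, y_task2))
    marginal2 = Counter(y_task2)
    h = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_b = marginal2[b] / n
        h -= p_ab * np.log2(p_ab / p_b)  # -sum p(a,b) log2 p(a|b)
    return h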


Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study

arXiv.org Machine Learning

Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance levels comparable to the original models. In this work, we investigate the impact of quantization procedures on the privacy of data-driven models, specifically focusing on their vulnerability to membership inference attacks. We derive an asymptotic theoretical analysis of Membership Inference Security (MIS), characterizing the privacy implications of quantized algorithm weights against the most powerful (and possibly unknown) attacks. Building on these theoretical insights, we propose a novel methodology to empirically assess and rank the privacy levels of various quantization procedures. Using synthetic datasets, we demonstrate the effectiveness of our approach in assessing the MIS of different quantizers. Furthermore, we explore the trade-off between privacy and performance using real-world data and models in the context of molecular modeling.
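As a hedged illustration of the kind of empirical comparison the abstract alludes to, and not the authors' MIS estimator or asymptotic analysis, one could probe how distinguishable members are from non-members under a simple threshold attack on the per-sample losses of a quantized model. The loss arrays, the threshold sweep, and the resulting proxy score are assumptions for illustration only.

import numpy as np

def empirical_privacy_proxy(member_losses, nonmember_losses):
    """Sweep a loss threshold and report one minus the best attack advantage
    (TPR - FPR); values closer to 1.0 mean membership is harder to infer."""
    thresholds = np.unique(np.concatenate([member_losses, nonmember_losses]))
    best_advantage = 0.0
    for t in thresholds:
        tpr = np.mean(member_losses <= t)      # members tend to have lower loss
        fpr = np.mean(nonmember_losses <= t)
        best_advantage = max(best_advantage, tpr - fpr)
    return 1.0 - best_advantage

# Usage idea: compute this proxy for each quantization procedure and rank them.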


BayesAdapter: enhanced uncertainty estimation in CLIP few-shot adaptation

arXiv.org Artificial Intelligence

The emergence of large pre-trained vision-language models (VLMs) represents a paradigm shift in machine learning, with unprecedented results in a broad span of visual recognition tasks. CLIP, one of the most popular VLMs, has exhibited remarkable zero-shot and transfer learning capabilities in classification. To transfer CLIP to downstream tasks, adapters constitute a parameter-efficient approach that avoids backpropagation through the large model (unlike related prompt learning methods). However, CLIP adapters have been developed to target discriminative performance, and the quality of their uncertainty estimates has been overlooked. In this work we show that the discriminative performance of state-of-the-art CLIP adapters does not always correlate with their uncertainty estimation capabilities, which are essential for safe deployment in real-world scenarios. We also demonstrate that one such adapter is obtained through MAP inference from a more general probabilistic framework. Based on this observation we introduce BayesAdapter, which leverages Bayesian inference to estimate a full probability distribution instead of a single point estimate, better capturing the variability inherent in the parameter space. In a comprehensive empirical evaluation we show that our approach obtains high-quality uncertainty estimates in its predictions, standing out in calibration and selective classification. Our code will be publicly available upon acceptance of the paper.
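A minimal sketch of the general idea, assuming a factorized Gaussian posterior over a linear adapter on frozen CLIP features (this is not the BayesAdapter implementation, and the variational parameters below are hypothetical): rather than a single MAP point estimate, Monte Carlo weight samples are averaged so that the predictive distribution reflects parameter uncertainty.

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def bayesian_adapter_predict(features, w_mean, w_logvar, n_samples=32, rng=None):
    """features: (n, d) frozen CLIP embeddings; w_mean, w_logvar: (d, k)
    variational parameters of a Gaussian over the linear adapter weights."""
    rng = rng or np.random.default_rng(0)
    std = np.exp(0.5 * w_logvar)
    probs = np.zeros((features.shape[0], w_mean.shape[1]))
    for _ in range(n_samples):
        w = w_mean + std * rng.standard_normal(w_mean.shape)  # reparameterized sample
        probs += softmax(features @ w)
    return probs / n_samples  # MC-averaged predictive distribution, not a MAP point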


Combine and Conquer: A Meta-Analysis on Data Shift and Out-of-Distribution Detection

arXiv.org Machine Learning

This paper introduces a universal approach to seamlessly combine out-of-distribution (OOD) detection scores. These scores encompass a wide range of techniques that leverage the self-confidence of deep learning models and the anomalous behavior of features in the latent space. Not surprisingly, combining such a varied population using simple statistics proves inadequate. To overcome this challenge, we propose a quantile normalization to map these scores into p-values, effectively framing the problem as a multivariate hypothesis test. Then, we combine these tests using established meta-analysis tools, resulting in a more effective detector with consolidated decision boundaries. Furthermore, we create an interpretable probabilistic criterion by mapping the final statistic onto a distribution with known parameters. Through empirical investigation, we explore different types of shifts, each exerting varying degrees of impact on data. Our results demonstrate that our approach significantly improves overall robustness and performance across diverse OOD detection scenarios. Notably, our framework is easily extensible to future developments in detection scores and stands as the first to combine decision boundaries in this context. The code and artifacts associated with this work are publicly available at https://github.com/edadaltocg/detectors.
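The combination recipe described above can be illustrated with a short, hedged sketch (this is not the released detectors package): each detector's score is quantile-normalized into a p-value against in-distribution calibration scores, and per-sample p-values are fused with Fisher's method, whose statistic follows a chi-square distribution with known degrees of freedom under the in-distribution hypothesis. The calibration data, the convention that larger scores are more anomalous, and the toy numbers are assumptions.

import numpy as np
from scipy import stats

def to_p_values(calibration_scores, test_scores):
    """Empirical quantile normalization: p-value = fraction of in-distribution
    calibration scores at least as large as the test score."""
    calibration_scores = np.sort(calibration_scores)
    ranks = np.searchsorted(calibration_scores, test_scores, side="right")
    return 1.0 - ranks / (len(calibration_scores) + 1)

def fisher_combine(p_matrix):
    """Fisher's method: -2 * sum(log p_i) ~ chi-square with 2k degrees of freedom
    under the in-distribution (null) hypothesis; returns a combined p-value."""
    k = p_matrix.shape[1]
    statistic = -2.0 * np.log(np.clip(p_matrix, 1e-12, 1.0)).sum(axis=1)
    return stats.chi2.sf(statistic, df=2 * k)

# Toy usage with two hypothetical detectors (e.g., a softmax score and a latent score)
rng = np.random.default_rng(0)
calib = {"msp": rng.normal(0, 1, 5000), "latent": rng.normal(0, 1, 5000)}
test = {"msp": rng.normal(1.5, 1, 10), "latent": rng.normal(1.2, 1, 10)}
p = np.column_stack([to_p_values(calib[k], test[k]) for k in calib])
combined = fisher_combine(p)   # small combined p-values flag likely OOD samples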


Predicting Probabilities of Error to Combine Quantization and Early Exiting: QuEE

arXiv.org Artificial Intelligence

Machine learning models can solve complex tasks but often require significant computational resources during inference. This has led to the development of various post-training computation reduction methods that tackle this issue in different ways, such as quantization, which reduces the precision of weights and arithmetic operations, and dynamic networks, which adapt computation to the sample at hand. In this work, we propose QuEE, a more general dynamic network that combines both quantization and early exiting. Our algorithm can be seen as a form of soft early exiting or input-dependent compression. Rather than a binary decision between exiting or continuing, we introduce the possibility of continuing with reduced computation. This complicates the traditionally considered early exiting problem, which we solve through a principled formulation. The crucial factor in our approach is accurate prediction of the potential accuracy improvement achievable through further computation. We demonstrate the effectiveness of our method through empirical evaluation on 4 classification datasets and explore the conditions for its success.
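A hedged sketch of the decision rule described above, not QuEE's actual gain predictor: at an exit point, compare the predicted accuracy improvement of each way of continuing, at various precisions, against its additional compute cost, and pick the best trade-off. The options, predicted gains, and cost weighting below are hypothetical.

from dataclasses import dataclass

@dataclass
class Option:
    name: str               # e.g. "exit now", "continue at 8-bit", "continue at fp32"
    predicted_gain: float   # predicted increase in probability of a correct prediction
    extra_cost: float       # additional compute (e.g., normalized FLOPs)

def choose(options, cost_weight=0.5):
    """Pick the option maximizing predicted gain minus weighted compute cost."""
    return max(options, key=lambda o: o.predicted_gain - cost_weight * o.extra_cost)

decision = choose([
    Option("exit now", 0.00, 0.0),
    Option("continue at 8-bit", 0.04, 0.05),
    Option("continue at fp32", 0.06, 0.20),
])
print(decision.name)  # -> "continue at 8-bit" with these toy numbers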


GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews

arXiv.org Artificial Intelligence

Scientific peer review is essential for the quality of academic publications. However, the increasing number of paper submissions to conferences has strained the reviewing process. This surge poses a burden on area chairs, who have to carefully read an ever-growing volume of reviews and discern each reviewer's main arguments as part of their decision process. In this paper, we introduce GLIMPSE, a summarization method designed to offer a concise yet comprehensive overview of scholarly reviews. Unlike traditional consensus-based methods, GLIMPSE extracts both common and unique opinions from the reviews. We introduce novel uniqueness scores based on the Rational Speech Act framework to identify relevant sentences in the reviews. Our method aims to provide a pragmatic glimpse into all reviews, offering a balanced perspective on their opinions. Our experimental results with both automatic metrics and human evaluation show that GLIMPSE generates summaries that human evaluators judge to be more discriminative than those of baseline methods, while achieving comparable performance on automatic metrics.
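For intuition, here is a deliberately simplified, hedged sketch of an RSA-flavoured uniqueness score, not GLIMPSE's exact formulation: given a matrix of how well each review sentence fits each review, a literal listener normalizes over reviews, and a sentence counts as "unique" when the listener confidently recovers that sentence's own review from it. The likelihood matrix is assumed to come from some sentence-to-review scoring model.

import numpy as np

def rsa_uniqueness(likelihood, source_review):
    """likelihood: (n_sentences, n_reviews) non-negative fit scores;
    source_review: index of each sentence's originating review.
    Returns the literal listener's probability of the correct review."""
    listener = likelihood / likelihood.sum(axis=1, keepdims=True)
    return listener[np.arange(len(source_review)), source_review]

# Toy example: sentence 0 is specific to review 0, sentence 1 is generic
lik = np.array([[0.90, 0.05, 0.05],
                [0.34, 0.33, 0.33]])
print(rsa_uniqueness(lik, source_review=np.array([0, 1])))  # -> [0.9, 0.33]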


Beyond the Norms: Detecting Prediction Errors in Regression Models

arXiv.org Artificial Intelligence

This paper tackles the challenge of detecting unreliable behavior in regression algorithms, which may arise from intrinsic variability (e.g., aleatoric uncertainty) or modeling errors (e.g., model uncertainty). First, we formally introduce the notion of unreliability in regression, i.e., when the discrepancy (or error) between the regressor's output and the true value exceeds a specified threshold. Then, using powerful tools for probabilistic modeling, we estimate the discrepancy density and measure its statistical diversity using our proposed metric for statistical dissimilarity. In turn, this allows us to derive a data-driven score that expresses the uncertainty of the regression outcome. We show empirical improvements in error detection for multiple regression tasks, consistently outperforming popular baseline approaches, and contributing to the broader field of uncertainty quantification and safe machine learning systems. Our code is available at https://zenodo.org/records/11281964.
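To make the notion of unreliability concrete, here is a minimal sketch under a strong simplifying assumption (a Gaussian predictive distribution per input); it is not the paper's dissimilarity-based score. Each prediction is scored by the probability that its discrepancy exceeds a chosen tolerance epsilon, which is also an assumption.

import numpy as np
from scipy.stats import norm

def unreliability_score(pred_std, epsilon):
    """P(|Y - y_hat| > epsilon) when Y ~ N(y_hat, pred_std**2); the mean cancels out."""
    return 2.0 * norm.sf(epsilon / np.asarray(pred_std))

# Example: flag predictions whose error is likely to exceed epsilon = 0.5
scores = unreliability_score(pred_std=np.array([0.1, 0.4, 1.0]), epsilon=0.5)
# -> low score for sharp predictive distributions, high score for diffuse ones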


Is Meta-training Really Necessary for Molecular Few-Shot Learning?

arXiv.org Artificial Intelligence

Few-shot learning has recently attracted significant interest in drug discovery, with a fast-growing literature mostly involving convoluted meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data, and propose a regularized quadratic-probe loss based on the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer, which avoids the degenerate solutions of our loss. Interestingly, our simple fine-tuning approach achieves highly competitive performance compared to state-of-the-art methods, while being applicable to black-box settings and removing the need for specific episodic pre-training strategies. Furthermore, we introduce a new benchmark to assess the robustness of the competing methods to domain shifts. In this setting, our fine-tuning baseline obtains consistently better results than meta-learning methods.
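As a hedged illustration of the Mahalanobis-distance idea behind a quadratic probe (not the paper's regularized loss or its block-coordinate optimizer), one can classify a query embedding by its Mahalanobis distance to class means under a shared, shrinkage-regularized covariance estimated from the few support molecules; the shrinkage value is an assumption.

import numpy as np

def fit_mahalanobis_probe(support_x, support_y, shrinkage=0.1):
    """support_x: (n, d) embeddings; support_y: (n,) class labels."""
    classes = np.unique(support_y)
    means = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    centered = support_x - means[np.searchsorted(classes, support_y)]
    cov = centered.T @ centered / len(support_x)
    cov += shrinkage * np.trace(cov) / cov.shape[0] * np.eye(cov.shape[0])  # regularize
    return classes, means, np.linalg.inv(cov)

def predict(query_x, classes, means, precision):
    diff = query_x[:, None, :] - means[None, :, :]              # (m, k, d)
    dist = np.einsum("nkd,de,nke->nk", diff, precision, diff)   # squared Mahalanobis
    return classes[dist.argmin(axis=1)]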


COSMIC: Mutual Information for Task-Agnostic Summarization Evaluation

arXiv.org Artificial Intelligence

Assessing the quality of summarizers poses significant challenges. In response, we propose a novel task-oriented evaluation approach that assesses summarizers based on their capacity to produce summaries that are useful for downstream tasks, while preserving task outcomes. We theoretically establish a direct relationship between the resulting error probability of these tasks and the mutual information between source texts and generated summaries. We introduce COSMIC as a practical implementation of this metric, demonstrating its strong correlation with human judgment-based metrics and its effectiveness in predicting downstream task performance. Comparative analyses against established metrics like BERTScore and ROUGE highlight the competitive performance of COSMIC.
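As a hedged illustration of the quantity COSMIC builds on, and not the paper's estimator: under a joint-Gaussian assumption on source-text and summary embedding vectors, the mutual information has a closed form in terms of covariance log-determinants. The embeddings, the ridge term, and the Gaussian assumption are for illustration only.

import numpy as np

def gaussian_mutual_information(x_emb, y_emb, ridge=1e-6):
    """x_emb: (n, dx) source embeddings; y_emb: (n, dy) summary embeddings.
    Returns 0.5 * (log det Cxx + log det Cyy - log det C) for the joint covariance C."""
    joint = np.hstack([x_emb, y_emb])
    c = np.cov(joint, rowvar=False) + ridge * np.eye(joint.shape[1])
    dx = x_emb.shape[1]
    cxx, cyy = c[:dx, :dx], c[dx:, dx:]
    logdet = lambda m: np.linalg.slogdet(m)[1]
    return 0.5 * (logdet(cxx) + logdet(cyy) - logdet(c))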


Optimal Zero-Shot Detector for Multi-Armed Attacks

arXiv.org Artificial Intelligence

This paper explores a scenario in which a malicious actor employs a multi-armed attack strategy to manipulate data samples, offering them various avenues to introduce noise into the dataset. Our central objective is to protect the data by detecting any alterations to the input. We approach this defensive strategy with utmost caution, operating in an environment where the defender possesses significantly less information compared to the attacker. Specifically, the defender is unable to utilize any data samples for training a defense model or verifying the integrity of the channel. Instead, the defender relies exclusively on a set of pre-existing detectors [...]

Defending signal communication from attackers is a fundamental problem in information theory (Karlof and Wagner, 2003; Perrig et al., 2004). Notably, some attacks are aimed at the physical layer of the communication channel, which is responsible for transmitting the signal. The goal of such attacks is to generate a denial of service (DoS), which involves disrupting legitimate communication by causing intentional malfunction of the communication channel (Grover et al., 2014). In a typical input perturbation scenario, a malicious actor is allowed to detect and alter the signal before it reaches the communication channel (Sadeghi and Larsson, 2019; Tian et al., 2022). The interest in such attacks has been exacerbated by the growing popularity of machine learning (ML) models, which are known to be vulnerable to adversarial attacks (Goodfellow et al., 2014).
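To ground the setting, here is a deliberately simple, hedged sketch of aggregating pre-existing detectors without any training data. It is a plain soft-voting baseline, not the optimal zero-shot aggregation rule derived in the paper, and the scores and threshold are assumptions.

import numpy as np

def aggregate_detectors(probabilities, threshold=0.5):
    """probabilities: (n_detectors,) soft scores, each the probability assigned by
    one pre-existing detector that the input has been altered. Flag the input
    if the average score crosses the threshold (no training data required)."""
    return float(np.mean(probabilities)) >= threshold

# e.g., three off-the-shelf detectors disagreeing on one input
flag = aggregate_detectors(np.array([0.9, 0.2, 0.7]))   # -> True with a 0.5 threshold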