
Collaborating Authors

Liguria




Toward Scalable and Valid Conditional Independence Testing with Spectral Representations

Frohlich, Alek, Kostic, Vladimir, Lounici, Karim, Perazzo, Daniel, Pontil, Massimiliano

arXiv.org Machine Learning

Conditional independence (CI) is central to causal inference, feature selection, and graphical modeling, yet it is untestable in many settings without additional assumptions. Existing CI tests often rely on restrictive structural conditions, limiting their validity on real-world data. Kernel methods using the partial covariance operator offer a more principled approach but suffer from limited adaptivity, slow convergence, and poor scalability. In this work, we explore whether representation learning can help address these limitations. Specifically, we focus on representations derived from the singular value decomposition of the partial covariance operator and use them to construct a simple test statistic, reminiscent of the Hilbert-Schmidt Independence Criterion (HSIC). We also introduce a practical bi-level contrastive algorithm to learn these representations. Our theory links representation learning error to test performance and establishes asymptotic validity and power guarantees. Preliminary experiments suggest that this approach offers a practical and statistically grounded path toward scalable CI testing, bridging kernel-based theory with modern representation learning.
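To make the comparison with HSIC concrete, here is a sketch of the standard biased empirical HSIC, (1/n²)·tr(KHLH), computed between two one-dimensional samples with Gaussian kernels. It shows only the marginal independence criterion that the proposed statistic is said to resemble. It is not the authors' conditional test, which is built from learned representations of the partial covariance operator. The Gaussian kernel, the fixed bandwidth of 1.0, the synthetic data, and the function names are all illustrative assumptions.

```python
import math
import random


def rbf_gram(xs, bandwidth=1.0):
    """Gaussian (RBF) Gram matrix for a list of scalar samples (assumed kernel)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs] for a in xs]


def double_center(K):
    """Return H K H, where H = I - (1/n) 1 1^T is the centering matrix."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
            for i in range(n)]


def hsic_biased(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC: (1/n^2) * trace(K H L H)."""
    n = len(xs)
    Kc = double_center(rbf_gram(xs, bandwidth))
    L = rbf_gram(ys, bandwidth)
    # trace(K H L H) = trace(Kc L) = sum_ij Kc[i][j] * L[j][i]; L is symmetric.
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / n ** 2


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(100)]
y_dep = [xi + 0.1 * random.gauss(0.0, 1.0) for xi in x]  # strongly depends on x
y_ind = [random.gauss(0.0, 1.0) for _ in range(100)]     # independent of x
print(hsic_biased(x, y_dep), hsic_biased(x, y_ind))
```

The dependent pair yields a much larger value than the independent pair. In a real test, a p-value is typically obtained by recomputing the statistic on permuted samples; that step is omitted here.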


Consensus dimension reduction via multi-view learning

An, Bingxue, Tang, Tiffany M.

arXiv.org Machine Learning

Dimension reduction methods are a fundamental class of techniques in data analysis, which aim to find a lower-dimensional representation of higher-dimensional data while preserving as much of the original information as possible. These methods are extensively used in practice, including in exploratory data analysis to visualize data, arguably one of the first and most vital steps in any data analysis (Ray et al., 2021). Notably, in genomics, dimension reduction methods are ubiquitously applied to visualize high-dimensional single-cell RNA sequencing data in two dimensions (Becht et al., 2019). Beyond visualization, dimension reduction methods are also frequently employed to mitigate the curse of dimensionality (Bellman, 1957), engineer new features to improve downstream tasks like prediction (e.g., Massy, 1965), and enable scientific discovery in unsupervised learning settings (Chang et al., 2025). For example, many researchers have used dimension reduction in conjunction with clustering to discover new cell types and cell states (Wu et al., 2021), new cancer subtypes (Northcott et al., 2017), and other substantively meaningful structure in a variety of domains (Bergen et al., 2019; Traven et al., 2017). Given this widespread use, numerous dimension reduction techniques have been developed. Popular techniques include, but are not limited to, principal component analysis (PCA) (Pearson, 1901; Hotelling, 1933), multidimensional scaling (MDS) (Torgerson, 1952; Kruskal, 1964a), Isomap (Tenenbaum et al., 2000), locally linear embedding (LLE) (Roweis and Saul, 2000), t-distributed stochastic neighbor embedding (t-SNE) (van der
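As a minimal illustration of the dimension-reduction idea described above, the following sketch computes the first principal component of 2-D data by power iteration on the sample covariance matrix. It then projects each point onto that component, reducing two dimensions to one. This is plain PCA, one of the techniques listed in the abstract, and not the consensus multi-view method the paper proposes. The function names, the synthetic data, and the iteration count are assumptions made for illustration.

```python
import math
import random


def first_principal_component(rows, iters=100, seed=0):
    """Leading eigenvector of the sample covariance matrix, via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # random positive start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(rows, means, v):
    """1-D scores: each centred row projected onto the unit vector v."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]


random.seed(1)
data = []
for _ in range(200):
    t = random.gauss(0.0, 3.0)  # large spread along the (1, 1) direction
    s = random.gauss(0.0, 0.3)  # small spread along the (1, -1) direction
    data.append([t + s, t - s])

means, pc1 = first_principal_component(data)
scores = project(data, means, pc1)  # 2-D points reduced to 1-D
print(pc1)
```

The printed vector should be close to (0.707, 0.707) up to sign, which is the direction of largest variance. Library implementations usually compute all components at once through an SVD of the centred data matrix rather than by power iteration.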


Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition

Sarcinelli, João Lucas Luz Lima, Silva, Diego Furtado

arXiv.org Artificial Intelligence

Large Language Models (LLMs) excel in many Natural Language Processing (NLP) tasks through in-context learning but often underperform in Named Entity Recognition (NER), especially for lower-resource languages like Portuguese. While open-weight LLMs enable local deployment, no single model dominates all tasks, motivating ensemble approaches. However, existing LLM ensembles focus on text generation or classification, leaving NER underexplored. In this context, this work proposes a novel three-step ensemble pipeline for zero-shot NER using similarly capable, locally run LLMs. Our method outperforms individual LLMs in four out of five Portuguese NER datasets by leveraging a heuristic to select optimal model combinations with minimal annotated data. Moreover, we show that ensembles obtained on different source datasets generally outperform individual LLMs in cross-dataset configurations, potentially eliminating the need for annotated data for the current task.


DaLA: Danish Linguistic Acceptability Evaluation Guided by Real World Errors

Barmina, Gianluca, Norman, Nathalie Carmen Hau, Schneider-Kamp, Peter, Poech, Lukas Galke

arXiv.org Artificial Intelligence

We present an enhanced benchmark for evaluating linguistic acceptability in Danish. We first analyze the most common errors found in written Danish. Based on this analysis, we introduce a set of fourteen corruption functions that generate incorrect sentences by systematically introducing errors into existing correct Danish sentences. To ensure the accuracy of these corruptions, we assess their validity using both manual and automatic methods. The results are then used as a benchmark for evaluating Large Language Models on a linguistic acceptability judgement task. Our findings demonstrate that this extension is both broader and more comprehensive than the current state of the art. By incorporating a greater variety of corruption types, our benchmark provides a more rigorous assessment of linguistic acceptability and increases task difficulty, as evidenced by the lower performance of LLMs on our benchmark compared to existing ones. Our results also suggest that our benchmark has higher discriminatory power, which allows it to better distinguish well-performing models from low-performing ones.


Adversarial Robustness of Traffic Classification under Resource Constraints: Input Structure Matters

Chehade, Adel, Ragusa, Edoardo, Gastaldo, Paolo, Zunino, Rodolfo

arXiv.org Artificial Intelligence

Traffic classification (TC) plays a critical role in cybersecurity, particularly in IoT and embedded contexts, where inspection must often occur locally under tight hardware constraints. We use hardware-aware neural architecture search (HW-NAS) to derive lightweight TC models that are accurate, efficient, and deployable on edge platforms. Two input formats are considered: a flattened byte sequence and a 2D packet-wise time series; we examine how input structure affects adversarial vulnerability when using resource-constrained models. Robustness is assessed against white-box attacks, specifically Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). On USTC-TFC2016, both HW-NAS models achieve over 99% clean-data accuracy while remaining within 65k parameters and 2M FLOPs. Yet under perturbations of strength 0.1, their robustness diverges: the flat model retains over 85% accuracy, while the time-series variant drops below 35%. Adversarial fine-tuning delivers robust gains, with flat-input accuracy exceeding 96% and the time-series variant recovering over 60 percentage points in robustness, all without compromising efficiency. The results underscore how input structure influences adversarial vulnerability, and show that even compact, resource-efficient models can attain strong robustness, supporting their practical deployment in secure edge-based TC.
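The Fast Gradient Sign Method perturbs an input x in the direction that increases the loss, x' = x + ε·sign(∇ₓL(x, y)). PGD repeats such steps and projects the result back onto the ε-ball around the original input. The sketch below applies a single FGSM step to a toy logistic-regression classifier. It assumes inputs normalized to [0, 1] and uses ε = 0.1 to mirror the perturbation strength quoted in the abstract, although how that strength is defined in the paper is an assumption here. The model, its weights, and the function names are hypothetical and are not the HW-NAS models from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Probability of class 1 under a toy logistic-regression model (hypothetical)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(x, y, w, b, eps):
    """One FGSM step: move each feature by eps in the sign of the loss gradient."""
    p = predict(x, w, b)
    # Gradient of binary cross-entropy w.r.t. the input features: (p - y) * w
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    # Take the step, then clip to the assumed valid input range [0, 1]
    return [min(1.0, max(0.0, xi + eps * s)) for xi, s in zip(x, sign)]


w, b = [2.0, -1.0, 0.5], -0.2  # hypothetical trained weights
x, y = [0.6, 0.3, 0.5], 1      # a correctly classified class-1 input
x_adv = fgsm(x, y, w, b, eps=0.1)
print(predict(x, w, b), predict(x_adv, w, b))
```

With these toy values, the predicted probability of the true class drops from about 0.72 to about 0.65. A PGD attack would repeat the step with a smaller step size, clipping back to the ε-ball around the original x after each iteration.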


Developing a Comprehensive Framework for Sentiment Analysis in Turkish

Aydin, Cem Rifki

arXiv.org Artificial Intelligence

In this thesis, we developed a comprehensive framework for sentiment analysis that takes its many aspects into account, mainly for Turkish. We have also proposed several approaches specific to sentiment analysis in English. We have accordingly made five major and three minor contributions. We generated a novel and effective feature set by combining unsupervised, semi-supervised, and supervised metrics. We then fed these features into classical machine learning methods, which outperformed neural network models on datasets of different genres in both Turkish and English. We created a polarity lexicon with a semi-supervised, domain-specific method, the first such approach applied to Turkish corpora. We performed a fine-grained morphological analysis for sentiment classification in Turkish by determining the polarities of morphemes; this analysis can be adapted to other morphologically rich or agglutinative languages as well. We built a novel neural network architecture that combines recurrent and recursive neural network models for English. We built novel word embeddings that exploit sentiment, syntactic, semantic, and lexical characteristics for both Turkish and English. We also redefined context windows as subclauses in modelling word representations in English; this idea can also be applied to other linguistic fields and natural language processing tasks. We have achieved state-of-the-art and significant results for all these original approaches. Our minor contributions include methods related to aspect-based sentiment in Turkish, parameter redefinition in the semi-supervised approach, and aspect term extraction techniques for English. This thesis can be considered the most detailed and comprehensive study of sentiment analysis in Turkish as of July 2020. Our work has also contributed to the opinion classification problem in English.


Robot-mediated physical Human-Human Interaction in Neurorehabilitation: a position paper

Vianello, Lorenzo, Short, Matthew, Manczurowsky, Julia, Küçüktabak, Emek Barış, Di Tommaso, Francesco, Noccaro, Alessia, Bandini, Laura, Clark, Shoshana, Fiorenza, Alaina, Lunardini, Francesca, Canton, Alberto, Gandolla, Marta, Pedrocchi, Alessandra L. G., Ambrosini, Emilia, Murie-Fernandez, Manuel, Roman, Carmen B., Tornero, Jesus, Leon, Natacha, Sawers, Andrew, Patton, Jim, Formica, Domenico, Tagliamonte, Nevio Luigi, Rauter, Georg, Baur, Kilian, Just, Fabian, Hasson, Christopher J., Novak, Vesna D., Pons, Jose L.

arXiv.org Artificial Intelligence

Neurorehabilitation conventionally relies on the interaction between a patient and a physical therapist. Robotic systems can improve and enrich the physical feedback provided to patients after neurological injury, but they underutilize the adaptability and clinical expertise of trained therapists. In this position paper, we advocate for a novel approach that integrates the therapist's clinical expertise and nuanced decision-making with the strength, accuracy, and repeatability of robotics: Robot-mediated physical Human-Human Interaction. This framework, which enables two individuals to physically interact through robotic devices, has been studied across diverse research groups and has recently emerged as a promising link between conventional manual therapy and rehabilitation robotics, harmonizing the strengths of both approaches. This paper presents the rationale of a multidisciplinary team (including engineers, doctors, and physical therapists) for conducting research that uses a unified taxonomy to describe robot-mediated rehabilitation, a framework of interaction based on social psychology, and a technological approach that makes robotic systems seamless facilitators of natural human-human interaction.


RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems

Li, Mengfan, Shi, Xuanhua, Deng, Yang

arXiv.org Artificial Intelligence

Large Language Models are revolutionizing conversational recommender systems through their impressive capabilities in instruction comprehension, reasoning, and human interaction. A core factor underlying effective recommendation dialogue is the ability to infer and reason about users' mental states (such as desire, intention, and belief), a cognitive capacity commonly referred to as Theory of Mind (ToM). Despite growing interest in evaluating ToM in LLMs, current benchmarks predominantly rely on synthetic narratives inspired by the Sally-Anne test, which emphasize physical perception and fail to capture the complexity of mental state inference in realistic conversational settings. Moreover, existing benchmarks often overlook a critical component of human ToM: behavioral prediction, the ability to use inferred mental states to guide strategic decision-making and select appropriate conversational actions for future interactions. To better align LLM-based ToM evaluation with human-like social reasoning, we propose RecToM, a novel benchmark for evaluating ToM abilities in recommendation dialogues. RecToM focuses on two complementary dimensions: Cognitive Inference and Behavioral Prediction. The former focuses on understanding what has been communicated by inferring the underlying mental states. The latter emphasizes what should be done next, evaluating whether LLMs can leverage these inferred mental states to predict, select, and assess appropriate dialogue strategies. Extensive experiments on state-of-the-art LLMs demonstrate that RecToM poses a significant challenge. While the models exhibit partial competence in recognizing mental states, they struggle to maintain coherent, strategic ToM reasoning throughout dynamic recommendation dialogues, particularly in tracking evolving intentions and aligning conversational strategies with inferred mental states.