Inductive Learning
Beyond Instance Consistency: Investigating View Diversity in Self-supervised Learning
Qin, Huaiyuan, Yang, Muli, Hu, Siyuan, Hu, Peng, Zhang, Yu, Gong, Chen, Zhu, Hongyuan
Self-supervised learning (SSL) conventionally relies on the instance consistency paradigm, assuming that different views of the same image can be treated as positive pairs. However, this assumption breaks down for non-iconic data, where different views may contain distinct objects or semantic information. In this paper, we investigate the effectiveness of SSL when instance consistency is not guaranteed. Through extensive ablation studies, we demonstrate that SSL can still learn meaningful representations even when positive pairs lack strict instance consistency. Furthermore, our analysis further reveals that increasing view diversity, by enforcing zero overlapping or using smaller crop scales, can enhance downstream performance on classification and dense prediction tasks. However, excessive diversity is found to reduce effectiveness, suggesting an optimal range for view diversity. To quantify this, we adopt the Earth Mover's Distance (EMD) as an estimator to measure mutual information between views, finding that moderate EMD values correlate with improved SSL learning, providing insights for future SSL framework design. We validate our findings across a range of settings, highlighting their robustness and applicability on diverse data sources.
Least-Ambiguous Multi-Label Classifier
Hagos, Misgina Tsighe, Lundstrรถm, Claes
Abstract--Multi-label learning often requires identifying all relevant labels for training instances, but collecting full label annotations is costly and labor-intensive. In many datasets, only a single positive label is annotated per training instance, despite the presence of multiple relevant labels. This setting, known as single-positive multi-label learning (SPMLL), presents a significant challenge due to its extreme form of partial supervision. We propose a model-agnostic approach to SPMLL that draws on conformal prediction to produce calibrated set-valued outputs, enabling reliable multi-label predictions at test time. We evaluate our approach on 12 benchmark datasets, demonstrating consistent improvements over existing baselines and practical applicability.
Data-Efficient Psychiatric Disorder Detection via Self-supervised Learning on Frequency-enhanced Brain Networks
Liu, Mujie, Zhu, Mengchu, Dong, Qichao, Dang, Ting, Ma, Jiangang, Ren, Jing, Xia, Feng
Psychiatric disorders involve complex neural activity changes, with functional magnetic resonance imaging (fMRI) data serving as key diagnostic evidence. However, data scarcity and the diverse nature of fMRI information pose significant challenges. While graph-based self-supervised learning (SSL) methods have shown promise in brain network analysis, they primarily focus on time-domain representations, often overlooking the rich information embedded in the frequency domain. To overcome these limitations, we propose Frequency-Enhanced Network (FENet), a novel SSL framework specially designed for fMRI data that integrates time-domain and frequency-domain information to improve psychiatric disorder detection in small-sample datasets. FENet constructs multi-view brain networks based on the inherent properties of fMRI data, explicitly incorporating frequency information into the learning process of representation. Additionally, it employs domain-specific encoders to capture temporal-spectral characteristics, including an efficient frequency-domain encoder that highlights disease-relevant frequency features. Finally, FENet introduces a domain consistency-guided learning objective, which balances the utilization of diverse information and generates frequency-enhanced brain graph representations. Experiments on two real-world medical datasets demonstrate that FENet outperforms state-of-the-art methods while maintaining strong performance in minimal data conditions. Furthermore, we analyze the correlation between various frequency-domain features and psychiatric disorders, emphasizing the critical role of high-frequency information in disorder detection.
SSL-AD: Spatiotemporal Self-Supervised Learning for Generalizability and Adaptability Across Alzheimer's Prediction Tasks and Datasets
Kaczmarek, Emily, Szeto, Justin, Nichyporuk, Brennan, Arbel, Tal
Alzheimer's disease is a progressive, neurodegenerative disorder that causes memory loss and cognitive decline. While there has been extensive research in applying deep learning models to Alzheimer's prediction tasks, these models remain limited by lack of available labeled data, poor generalization across datasets, and inflexibility to varying numbers of input scans and time intervals between scans. In this study, we adapt three state-of-the-art temporal self-supervised learning (SSL) approaches for 3D brain MRI analysis, and add novel extensions designed to handle variable-length inputs and learn robust spatial features. We aggregate four publicly available datasets comprising 3,161 patients for pre-training, and show the performance of our model across multiple Alzheimer's prediction tasks including diagnosis classification, conversion detection, and future conversion prediction. Importantly, our SSL model implemented with temporal order prediction and contrastive learning outperforms supervised learning on six out of seven downstream tasks. It demonstrates adaptability and generalizability across tasks and number of input images with varying time intervals, highlighting its capacity for robust performance across clinical applications.
Anti-Money Laundering Machine Learning Pipelines; A Technical Analysis on Identifying High-risk Bank Clients with Supervised Learning
Namdar, Khashayar, Wang, Pin-Chien, Raju, Tushar, Zheng, Steven, Li, Fiona, Khan, Safwat Tahmin
Anti - money laundering (AML) actions and measurements are among the priorities of financial institutions, for which machine learning (ML) has shown to have a high potential. In this paper, we propose a comprehensive and systematic approach for developing ML pipelines to identify high - risk bank clients in a dataset curated for Task 1 of the University of Toro nto 2023 - 2024 Institute for Management and Innovation (IMI) Big Data and Artificial Intelligence Competition. The dataset included 195,789 customer IDs, and we employed a 16 - step design and statistical analysis to ensure the final pipeline was robust. We also framed the data in a SQLite database, developed SQL - based feature engineering algorithms, connected our pre - trained model to the database, and made i t inference - ready, and provided explainable artificial intelligence (XAI) modules to derive feature importance. Our pipeline achieved a mean area under the receiver operating characteristic curve (AUROC) of 0.961 with a standard deviation (SD) of 0.005. Th e proposed pipeline achieved second place in the competition. Introduction In the contemporary financial landscape, money laundering represents a formidable challenge, compelling both financial institutions and regulatory bodies to seek innovative solutions. The integration of machine learning (ML) into anti - money laundering (AML) efforts has emerged as a promising avenue to enhance the detection and prevention of illicit financial activities. This paper investigates the technical considerations in employing supervised learning techniques to accurately identify high - risk bank clie nts, a critical component in the battle against money laundering. The utilization of ML for detecting money laundering transactions has shown significant promise. Jullum et al. developed an ML model that outperforms traditional systems by prioritizing transactions for manual investigation, using historic data from Norway ' s largest bank, DNB [1] .
On Synthesis of Timed Regular Expressions
Wang, Ziran, An, Jie, Zhan, Naijun, Zhang, Miaomiao, Zhang, Zhenya
Timed regular expressions serve as a formalism for specifying real-time behaviors of Cyber-Physical Systems. In this paper, we consider the synthesis of timed regular expressions, focusing on generating a timed regular expression consistent with a given set of system behaviors including positive and negative examples, i.e., accepting all positive examples and rejecting all negative examples. We first prove the decidability of the synthesis problem through an exploration of simple timed regular expressions. Subsequently, we propose our method of generating a consistent timed regular expression with minimal length, which unfolds in two steps. The first step is to enumerate and prune candidate parametric timed regular expressions. In the second step, we encode the requirement that a candidate generated by the first step is consistent with the given set into a Satisfiability Modulo Theories (SMT) formula, which is consequently solved to determine a solution to parametric time constraints. Finally, we evaluate our approach on benchmarks, including randomly generated behaviors from target timed models and a case study.
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions
Wang, Chung-Chun, Lin, Jhen-Ke, Lu, Hao-Chien, Lin, Hong-Yun, Chen, Berlin
Automated speaking assessment (ASA) on opinion expressions is often hampered by the scarcity of labeled recordings, which restricts prompt diversity and undermines scoring reliability. To address this challenge, we propose a novel training paradigm that leverages a large language models (LLM) to generate diverse responses of a given proficiency level, converts responses into synthesized speech via speaker-aware text-to-speech synthesis, and employs a dynamic importance loss to adaptively reweight training instances based on feature distribution differences between synthesized and real speech. Subsequently, a multimodal large language model integrates aligned textual features with speech signals to predict proficiency scores directly. Experiments conducted on the L TTC dataset show that our approach outperforms methods relying on real data or conventional augmentation, effectively mitigating low-resource constraints and enabling ASA on opinion expressions with cross-modal information. Index T erms: Automated speaking assessment, Opinion Expression, Data Augmentation 1. Introduction In recent years, technologies for computer-assisted language learning (CALL), such as automated speaking assessment (ASA), have made significant strides to meet the growing demand for scalable and objective evaluation of second-language (L2) speaking proficiency in both academic and professional contexts [1, 2, 3].
M-BRe: Discovering Training Samples for Relation Extraction from Unlabeled Texts with Large Language Models
Li, Zexuan, Dai, Hongliang, Li, Piji
For Relation Extraction (RE), the manual annotation of training data may be prohibitively expensive, since the sentences that contain the target relations in texts can be very scarce and difficult to find. It is therefore beneficial to develop an efficient method that can automatically extract training instances from unlabeled texts for training RE models. Recently, large language models (LLMs) have been adopted in various natural language processing tasks, with RE also benefiting from their advances. However, when leveraging LLMs for RE with predefined relation categories, two key challenges arise. First, in a multi-class classification setting, LLMs often struggle to comprehensively capture the semantics of every relation, leading to suboptimal results. Second, although employing binary classification for each relation individually can mitigate this issue, it introduces significant computational overhead, resulting in impractical time complexity for real-world applications. Therefore, this paper proposes a framework called M-BRe to extract training instances from unlabeled texts for RE. It utilizes three modules to combine the advantages of both of the above classification approaches: Relation Grouping, Relation Extraction, and Label Decision. Extensive experiments confirm its superior capability in discovering high-quality training samples from unlabeled texts for RE.
Ensemble Distribution Distillation for Self-Supervised Human Activity Recognition
Nolan, Matthew, Yao, Lina, Davidson, Robert
Human Activity Recognition (HAR) has seen significant advancements with the adoption of deep learning techniques, yet challenges remain in terms of data requirements, reliability and robustness. This paper explores a novel application of Ensemble Distribution Distillation (EDD) within a self-supervised learning framework for HAR aimed at overcoming these challenges. By leveraging unlabeled data and a partially supervised training strategy, our approach yields an increase in predictive accuracy, robust estimates of uncertainty, and substantial increases in robustness against adversarial perturbation; thereby significantly improving reliability in real-world scenarios without increasing computational complexity at inference. We demonstrate this with an evaluation on several publicly available datasets. The contributions of this work include the development of a self-supervised EDD framework, an innovative data augmentation technique designed for HAR, and empirical validation of the proposed method's effectiveness in increasing robustness and reliability.
Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space
Sepanj, M. Hadi, Ghojogh, Benyamin, Fieguth, Paul
Self-supervised learning (SSL) has emerged as a powerful paradigm for representation learning by optimizing geometric objectives--such as invariance to augmentations, variance preservation, and feature decorrelation--without requiring labels. However, most existing methods operate in Euclidean space, limiting their ability to capture nonlinear dependencies and geometric structures. In this work, we propose Kernel VICReg, a novel self-supervised learning framework that lifts the VICReg objective into a Reproducing Kernel Hilbert Space (RKHS). By kernelizing each term of the loss-variance, invariance, and covariance--we obtain a general formulation that operates on double-centered kernel matrices and Hilbert-Schmidt norms, enabling nonlinear feature learning without explicit mappings. We demonstrate that Kernel VICReg not only avoids representational collapse but also improves performance on tasks with complex or small-scale data. Empirical evaluations across MNIST, CIFAR-10, STL-10, TinyImageNet, and ImageNet100 show consistent gains over Euclidean VICReg, with particularly strong improvements on datasets where nonlinear structures are prominent. UMAP visualizations further confirm that kernel-based embeddings exhibit better isometry and class separation. Our results suggest that kernelizing SSL objectives is a promising direction for bridging classical kernel methods with modern representation learning.