augmentation
Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity
Knowledge Distillation (KD) aims to train a lightweight student model by transferring knowledge from a large, high-capacity teacher. Recent studies have shown that leveraging diverse teacher perspectives can significantly improve distillation performance; however, achieving such diversity typically requires multiple teacher networks, leading to high computational costs. In this work, we propose a novel cost-efficient knowledge augmentation method for KD that generates diverse multiviews by attaching multiple branches to a single teacher. To ensure meaningful semantic variation across multi-views, we introduce two angular diversity objectives: 1) constrained inter-angle diversify loss, which maximizes angles between augmented views while preserving proximity to the original teacher output, and 2) intra-angle diversify loss, which encourages an even distribution of views around the original output. The ensembled knowledge from these angularly diverse views, along with the original teacher, is distilled into the student. We further theoretically demonstrate that our objectives increase the diversity among ensemble members and thereby reduce the upper bound of the ensemble's expected loss, leading to more effective distillation. Experimental results show that our method surpasses an existing knowledge augmentation method across diverse configurations. Moreover, the proposed method is compatible with other KD frameworks in a plug-and-play fashion, providing consistent improvements in generalization performance.
CaliGCL: Calibrated Graph Contrastive Learning via Partitioned Similarity and Consistency Discrimination
Graph contrastive learning (GCL) aims to learn self-supervised representations by distinguishing positive and negative sample pairs generated from multiple augmented graph views. Despite showing promising performance, GCL still suffers from two critical biases: (1) Similarity estimation bias arises when feature elements that support positive pair alignment are suppressed by conflicting components within the representation, causing truly positive pairs to appear less similar.
Cross-Domain Graph Data Scaling: AShowcase with Diffusion Models
Models for natural language and images benefit from data scaling behavior: the more data fed into the model, the better they perform. This'better with more' phenomenon enables the effectiveness of large-scale pre-training on vast amounts of data. However, current graph pre-training methods struggle to scale up data due to heterogeneity across graphs. To achieve effective data scaling, we aim to develop a general model that is able to capture diverse data patterns of graphs and can be utilized to adaptively help the downstream tasks. To this end, we propose UniAug, a universal graph structure augmentor built on a diffusion model. We first pre-train a discrete diffusion model on thousands of graphs across domains to learn the graph structural patterns. In the downstream phase, we provide adaptive enhancement by conducting graph structure augmentation with the help of the pre-trained diffusion model via guided generation. By leveraging the pre-trained diffusion model for structure augmentation, we consistently achieve performance improvements across various downstream tasks in a plug-and-play manner. To the best of our knowledge, this study represents the first demonstration of a data-scaling graph structure augmentor on graphs across domains.
Adv-SSL: Adversarial Self-Supervised Representation Learning with Theoretical Guarantees
Learning transferable data representations from abundant unlabeled data remains a central challenge in machine learning. Although numerous self-supervised learning methods have been proposed to address this challenge, a significant class of these approaches aligns the covariance or correlation matrix with the identity matrix. Despite impressive performance across various downstream tasks, these methods often suffer from biased sample risk, leading to substantial optimization shifts in mini-batch settings and complicating theoretical analysis. In this paper, we introduce a novel Adversarial Self-Supervised Representation Learning (AdvSSL) for unbiased transfer learning with no additional cost compared to its biased counterparts. Our approach not only outperforms the existing methods across multiple benchmark datasets but is also supported by comprehensive end-to-end theoretical guarantees. Our analysis reveals that the minimax optimization in AdvSSL encourages representations to form well-separated clusters in the embedding space, provided there is sufficient upstream unlabeled data. As a result, our method achieves strong classification performance even with limited downstream labels, shedding new light on few-shot learning.
Learning Without Augmenting: Unsupervised Time Series Representation Learning via Frame Projections
Self-supervised learning (SSL) has emerged as a powerful paradigm for learning representations without labeled data. Most SSL approaches rely on strong, well-established, handcrafted data augmentations to generate diverse views for representation learning. However, designing such augmentations requires domainspecific knowledge and implicitly imposes representational invariances on the model, which can limit generalization. In this work, we propose an unsupervised representation learning method that replaces augmentations by generating views using orthonormal bases and overcomplete frames. We show that embeddings learned from orthonormal and overcomplete spaces reside on distinct manifolds, shaped by the geometric biases introduced by representing samples in different spaces. By jointly leveraging the complementary geometry of these distinct manifolds, our approach achieves superior performance without artificially increasing data diversity through strong augmentations. We demonstrate the effectiveness of our method on nine datasets across five temporal sequence tasks, where signalspecific characteristics make data augmentations particularly challenging. Without relying on augmentation-induced diversity, our method achieves performance gains of up to 15-20% over existing self-supervised approaches.
aa5642fb7d78a1bca9ceba3d8bd564f4-Paper-Conference.pdf
The application of machine learning (ML) to electroencephalography (EEG) has great potential to advance both neuroscientific research and clinical applications. However, the generalisability and robustness of EEG-based ML models often hinge on the amount and diversity of training data. It is common practice to split EEG recordings into small segments, thereby increasing the number of samples substantially compared to the number of individual recordings or participants. We conceptualise this as a multi-level data generation process and investigate the scaling behaviour of model performance with respect to the overall sample size and the participant diversity through large-scale empirical studies. We then use the same framework to investigate the effectiveness of different ML strategies designed to address limited data problems: data augmentations and self-supervised learning. Our findings show that model performance scaling can be severely constrained by participant distribution shifts and provide actionable guidance for data collection and ML research. The code for our experiments is publicly available online.1
Safety Depth in Large Language Models: AMarkov Chain Perspective
Large Language Models (LLMs) are increasingly adopted in high-stakes scenarios, yet their safety mechanisms often remain fragile. Simple jailbreak prompts or even benign fine-tuning can bypass internal safeguards, underscoring the need to understand the failure modes of current safety strategies. Recent findings suggest that vulnerabilities emerge when alignment is confined to only the initial output tokens. To address this, we introduce the notion of safety depth, a designated output position where the model refuses to generate harmful content. While deeper alignment appears promising, identifying the optimal safety depth remains an open and underexplored challenge.
Hybrid-Collaborative Augmentation and Contrastive Sample Adaptive-Differential Awareness for Robust Attributed Graph Clustering
Due to its powerful capability of self-supervised representation learning and clustering, contrastive attributed graph clustering (CAGC) has achieved great success, which mainly depends on effective data augmentation and contrastive objective setting. However, most CAGC methods utilize edges as auxiliary information to obtain node-level embedding representation and only focus on node-level embedding augmentation. This approach overlooks edge-level embedding augmentation and the interactions between node-level and edge-level embedding augmentations across various granularity. Moreover, they often treat all contrastive sample pairs equally, neglecting the significant differences between hard and easy positivenegative sample pairs, which ultimately limits their discriminative capability. To tackle these issues, a novel robust attributed graph clustering (RAGC), incorporating hybrid-collaborative augmentation (HCA) and contrastive sample adaptivedifferential awareness (CSADA), is proposed. First, node-level and edge-level embedding representations and augmentations are simultaneously executed to establish a more comprehensive similarity measurement criterion for subsequent contrastive learning.
Understanding Contrastive Learning via Gaussian Mixture Models
Contrastive learning involves learning representations via a loss function that encourages each (unlabeled) sample to be far from other samples, but close to its own augmentation. In this paper, we aim to understand why this simple idea performs remarkably well, by theoretically analyzing it for a simple, natural problem setting: dimensionality reduction in Gaussian Mixture Models (GMMs). Note that the standard GMM setup lacks the concept of augmentations. We study an intuitive extension: we define the pair of data sample and its augmentation as a coupled random draw from the GMM such that the marginal over the "noisy" augmentation is biased towards the component of the data sample. For this setup, we show that vanilla contrastive loss, e.g., InfoNCE, is able to find the optimal lower-dimensional subspace even when the Gaussian components are non-isotropic. In particular, we show that InfoNCE can match the performance of a fully supervised algorithm, e.g., LDA, (where each data point is labeled with the mixture component it comes from) even when the augmentations are "noisy". We further extend our setup to the multi-modal case, and develop a GMM-like setting to study the contrastive CLIP loss. We corroborate our theory with experiments on CIFAR100; representations learned by InfoNCE loss match the performance of LDA on clustering metrics.
Puzzles: Unbounded Video-Depth Augmentation for Scalable End-to-End 3DReconstruction
Multi-view 3D reconstruction remains a core challenge in computer vision. Recent methods, such as DUSt3R and its successors, directly regress pointmaps from image pairs without relying on known scene geometry or camera parameters. However, the performance of these models is constrained by the diversity and scale of available training data. In this work, we introduce Puzzles, a data augmentation strategy that synthesizes an unbounded volume of high-quality, posed video-depth data from just a single image or video clip. By simulating diverse camera trajectories and realistic scene geometry through targeted image transformations, Puzzles enhances data variety. Extensive experiments show that integrating Puzzles into existing video-based 3D reconstruction pipelines consistently boosts performance, all without modifying the underlying network architecture. Notably, models trained on only 10%of the original data, augmented with Puzzles, still achieve accuracy comparable to those trained on the full dataset.[Project